[Feature Request] Need to support auto-regressive VLMs

htlou commented 4 months ago

Required prerequisites

[X] I have searched the Issue Tracker and Discussions that this hasn't already been reported. (+1 or comment there if it has.)
[X] Consider asking first in a Discussion.

Motivation

Current VLM support mainly targets encoder-decoder style models, which encode and decode multimodal information as a hidden-state tensor. We also need to support auto-regressive VLMs such as Chameleon and Anole, which encode and decode multimodal information as tokens.

Solution

No response

Alternatives

No response

Additional context

No response

htlou commented 4 months ago

This is now being worked on in https://github.com/PKU-Alignment/align-anything/pull/36

htlou commented 3 months ago

https://github.com/PKU-Alignment/align-anything/pull/36 is now merged, so SFT support is done.

PKU-Alignment / align-anything

[Feature Request] Need to support auto-regressive VLMs #13
