Luodian / Otter

🦦 Otter, a multi-modal model based on OpenFlamingo (an open-source version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning abilities.
https://otter-ntu.github.io/
MIT License

Question: Training and inference resources, embedding outputs, LlamaV2 #233


kochhar commented 1 year ago

Hello, thanks a lot for sharing your work here with the community. This is an incredible resource for us to work with.

I would like to use this model in a project I'm working on, a multi-modal virtual assistant, and I wanted to understand a few details about the model to see whether it can work.

  1. What kind of GPU resources are required to run inference with this model? Would it be possible to run inference on a single GPU instance with 24GB of memory, or would this require additional resources?
  2. If I wanted to fine-tune the model in a quantised low-bit environment on a dataset of image/text pairs, would it be possible to train on a single GPU with 24GB of memory?
  3. Is it possible to have the model output the image and text embeddings before the final generation step?
  4. Finally, is it possible to substitute LlamaV2 as the text model in Otter?

Thanks a lot for your attention!

Luodian commented 1 year ago
  1. Inference can be hosted on a 24GB GPU; in fact, 16GB is enough in bf16 mode.
  2. It could work, but we haven't tested it. You could also try the smaller version with MPT-1B as the LLM: https://huggingface.co/luodian/OTTER-MPT1B-RPJama-Init. Just pass it as the init model_path (see the loading sketch after this list).
  4. We have a Flamingo variant with Llama 2 as the LLM, but it does not perform as well as our current MPT-7B version: https://huggingface.co/luodian/Flamingo-Llama2-Chat7B-CC3M. You could still use it as the init model_path and fine-tune on top of it.
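
For reference, here is a minimal sketch of loading one of these checkpoints in bf16 on a single GPU. It assumes the OtterForConditionalGeneration class from otter/modeling_otter.py and a standard Hugging Face-style from_pretrained interface; the exact import path and checkpoint name may differ in your checkout.

```python
# Minimal sketch: load an Otter checkpoint in bf16 on a single 24GB (or 16GB) GPU.
# Assumes OtterForConditionalGeneration from otter/modeling_otter.py and a
# Hugging Face-style from_pretrained interface; adjust names to your setup.
import torch
from otter.modeling_otter import OtterForConditionalGeneration

model = OtterForConditionalGeneration.from_pretrained(
    "luodian/OTTER-MPT1B-RPJama-Init",   # or luodian/Flamingo-Llama2-Chat7B-CC3M as the init model_path
    torch_dtype=torch.bfloat16,          # bf16 roughly halves memory vs. fp32
    device_map="auto",                   # place the weights on the available GPU
)
model.eval()
```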
Luodian commented 1 year ago
  3. For image embeddings, you could hack this line: https://github.com/Luodian/Otter/blob/9b34a4467581869c67dae7ea2b970f8e6b201d3c/otter/modeling_otter.py#L732 As for text embeddings, they are computed in the forward function of modeling_llama.py or modeling_mpt.py; modeling_llama.py lives in the transformers source files, and modeling_mpt.py is in our repo.
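
If you would rather not edit modeling_otter.py directly, a forward hook is a non-invasive way to capture those intermediate outputs. The sketch below is an assumption-laden example: the attribute names (model.vision_encoder, model.perceiver, model.lang_encoder) and the forward arguments (vision_x, lang_x) follow the OpenFlamingo-style layout and may differ in your version of the model; vision_x, lang_x, and attention_mask stand for inputs you have already preprocessed.

```python
# Sketch: capture image and text embeddings via forward hooks instead of
# editing modeling_otter.py. Module names below are assumptions based on the
# OpenFlamingo-style architecture; check the actual attributes on your model.
import torch

captured = {}

def save_output(name):
    def hook(module, args, output):
        # Detach tensors so the captured values do not keep the autograd graph alive.
        captured[name] = output.detach() if torch.is_tensor(output) else output
    return hook

handles = [
    model.vision_encoder.register_forward_hook(save_output("vision_features")),
    model.perceiver.register_forward_hook(save_output("resampled_image_tokens")),
    model.lang_encoder.get_input_embeddings().register_forward_hook(save_output("text_token_embeddings")),
]

# vision_x, lang_x, attention_mask: preprocessed image/text inputs
# (see the repo's inference examples for how to build them).
with torch.no_grad():
    model(vision_x=vision_x, lang_x=lang_x, attention_mask=attention_mask)

print({k: tuple(v.shape) for k, v in captured.items() if torch.is_tensor(v)})

for h in handles:
    h.remove()
```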