TRI-ML / prismatic-vlms

A flexible and efficient codebase for training visually-conditioned language models (VLMs)
MIT License

Adding support for other base LLMs #6

Closed: shikhar-srivastava closed this issue 2 months ago

shikhar-srivastava commented 4 months ago

Hi! Thanks for your great work and a great library to build on top of!

Have you guys considered supporting other base LLMs? An example addition (say, for a smaller base LLM like Phi-2 or Gemma) would be useful as a reference. (I know from working with MiniGPT-4 that adding support for Phi-2 versus Pythia models, for example, looks very different.)

I'm happy to build this out if you have any suggestions on how to design it. Something like the sketch below is roughly what I have in mind.
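A minimal sketch, assuming the new LLM is available through Hugging Face transformers; the class and method names here are hypothetical placeholders, not the actual prismatic API:

```python
# Hypothetical sketch of a generic HF-backed LLM backbone; class and
# method names are illustrative, not prismatic's actual interface.
import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM, AutoTokenizer


class HFCausalLMBackbone(nn.Module):
    """Wraps a Hugging Face causal LM so a VLM can splice in visual embeddings."""

    def __init__(self, hf_model_id: str) -> None:
        super().__init__()
        self.tokenizer = AutoTokenizer.from_pretrained(hf_model_id)
        self.llm = AutoModelForCausalLM.from_pretrained(hf_model_id)

    @property
    def embed_dim(self) -> int:
        # Hidden size differs per model (e.g. 2560 for Phi-2, 4096 for
        # Mistral-7B), so the vision projector should read it from here.
        return self.llm.config.hidden_size

    def embed_tokens(self, input_ids: torch.LongTensor) -> torch.Tensor:
        # Embed text tokens explicitly so projected image patch embeddings
        # can be concatenated with them before the transformer forward pass.
        return self.llm.get_input_embeddings()(input_ids)

    def forward(self, inputs_embeds, attention_mask, labels=None):
        return self.llm(
            inputs_embeds=inputs_embeds,
            attention_mask=attention_mask,
            labels=labels,
        )
```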

Thanks again for your great work!

TobiasLee commented 4 months ago

I also wonder whether the current training code supports multi-node training for larger LLMs. Any example code would be helpful.

zeyuanyin commented 4 months ago

Same here; I'd also like support for other LLMs, or an example showing how to swap in a replacement LLM.

siddk commented 3 months ago

Hey folks - my apologies for being slow to respond the last few weeks (COVID); I'll be adding an example for Mistral and Phi-2 within the week!

@TobiasLee - the code supports multi-node training out of the box via torchrun (see here); let me know if you have difficulties getting things to run!
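For reference, a typical two-node launch looks roughly like the following; the script path and rendezvous settings are illustrative, so check the README for the exact entry point and flags:

```bash
# Run this on every node; ranks are resolved via the c10d rendezvous.
# MASTER_ADDR should point at the node hosting the rendezvous endpoint.
torchrun --nnodes 2 --nproc_per_node 8 \
  --rdzv_backend c10d --rdzv_endpoint "${MASTER_ADDR}:29500" \
  scripts/pretrain.py
```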

zeyuanyin commented 3 months ago

> Hey folks - my apologies for being slow to respond the last few weeks (COVID); I'll be adding an example for Mistral and Phi-2 within the week!
>
> @TobiasLee - the code supports multi-node training out of the box via torchrun (see here); let me know if you have difficulties getting things to run!

Hello @siddk, thanks for your great work. I'm writing to ask about the release plan for other LLMs. I have implemented a replacement example for Phi-1.5. However, I ran into some problems during evaluation with vlm-evaluation: the model output consists of the answer followed by unrelated long sentences (attached below), which leads to all-zero evaluation scores. So I'm looking forward to your upcoming examples so I can correct my code. I'm also happy to share my implementation; if you have time to look at my code or offer some suggestions, I would deeply appreciate it. Thanks a lot.

The current (incorrect) output is:

{
  "422700016": {
    "question_id": 422700016,
    "question": "What is the boy listening to?",
    "model_output": "Mother\nThe fireman was able to put out the fire quickly because he had the right equipment.\n\nThe fireman was able to put out the fire quickly because he had the right equipment.\n\nThe fireman was able to put out the fire quickly because he had the right equipment.\n\nThe fireman was able to put out the fire quickly because he had the right equipment.\n\nThe fireman was able to put out the fire quickly because he had the right equipment.\n\nThe fireman was able to put out the fire quickly because he had the right equipment.",
    "ground_truth_answer": "parents"
  },
  "78838001": {
    "question_id": 78838001,
    "question": "Did the majority of players get dropped off by their parents?",
    "model_output": "No\nThe fireman was able to put out the fire quickly because he had the right equipment.\n\nThe fireman was able to put out the fire quickly because he had the right equipment.\n\nThe fireman was able to put out the fire quickly because he had the right equipment.\n\nThe fireman was able to put out the fire quickly because he had the right equipment.\n\nThe fireman was able to put out the fire quickly because he had the right equipment.\n\nThe fireman was able to put out the fire quickly because he had the right equipment.",
    "ground_truth_answer": "no"
  },
  "375409001": {
    "question_id": 375409001,
    "question": "How many road cones are in the picture?",
    "model_output": "2\nThe fireman was able to put out the fire quickly because he had the right equipment.\n\nThe fireman was able to put out the fire quickly because he had the right equipment.\n\nThe fireman was able to put out the fire quickly because he had the right equipment.\n\nThe fireman was able to put out the fire quickly because he had the right equipment.\n\nThe fireman was able to put out the fire quickly because he had the right equipment.\n\nThe fireman was able to put out the fire quickly because he had the right equipment.",
    "ground_truth_answer": "0"
  },

It should look like this:

{
  "422700016": {
    "question_id": 422700016,
    "question": "What is the boy listening to?",
    "model_output": "Mother",
    "ground_truth_answer": "parents"
  },
  "78838001": {
    "question_id": 78838001,
    "question": "Did the majority of players get dropped off by their parents?",
    "model_output": "No",
    "ground_truth_answer": "no"
  },
  "375409001": {
    "question_id": 375409001,
    "question": "How many road cones are in the picture?",
    "model_output": "2",
    "ground_truth_answer": "0"
  },
shikhar-srivastava commented 3 months ago

Hi @siddk @siddk-tri, it would be good to hear about the Mistral/Phi-2 example.

> Hey folks - my apologies for being slow to respond the last few weeks (COVID); I'll be adding an example for Mistral and Phi-2 within the week!
>
> @TobiasLee - the code supports multi-node training out of the box via torchrun (see here); let me know if you have difficulties getting things to run!

> Hello @siddk, thanks for your great work. I'm writing to ask about the release plan for other LLMs. I have implemented a replacement example for Phi-1.5. However, I ran into some problems during evaluation with vlm-evaluation: the model output consists of the answer followed by unrelated long sentences, which leads to all-zero evaluation scores.

@zeyuanyin Could you share the Phi-1.5 LLM replacement code you've tried? We can debug the error you're seeing; I'm not sure I see the same error.
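One thing worth checking while we debug: output like yours (a correct short answer followed by runaway repeated text) often means generation is not stopping at the EOS token. A rough sketch of the kind of fixes to try, shown here with a plain text-only Hugging Face model for simplicity (prismatic's VLM generate path also takes an image, and the details may differ for Phi):

```python
# Hedged sketch, not prismatic's actual eval code: force EOS-terminated
# generation, then trim the decoded answer at the first newline.
from transformers import PreTrainedModel, PreTrainedTokenizer


def generate_short_answer(
    model: PreTrainedModel, tokenizer: PreTrainedTokenizer, prompt: str
) -> str:
    # Phi's tokenizer may ship without a pad token; reuse EOS for padding.
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(
        **inputs,
        max_new_tokens=64,
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=tokenizer.eos_token_id,
    )
    # Decode only the newly generated tokens; if EOS is never emitted
    # (e.g. it was masked out of the labels during fine-tuning), cut the
    # string at the first newline as a fallback.
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    text = tokenizer.decode(new_tokens, skip_special_tokens=True)
    return text.split("\n")[0].strip()
```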

zeyuanyin commented 3 months ago

> @zeyuanyin Could you share the Phi-1.5 LLM replacement code you've tried? We can debug the error you're seeing; I'm not sure I see the same error.

Hi @shikhar-srivastava and @siddk, thanks in advance for your help. You can find my code at https://github.com/zeyuanyin/prismatic-vlms-phi-dev. The current status: I switched the LLM from Phi-1.5 to Phi-2, and the fine-tuned VLM now generates answers in the correct form, without the unrelated long sentences. However, the evaluation results are still pretty low, especially on text-vqa:

| model | vqa-v2 | gqa | vizwiz | text-vqa-ocr | text-vqa |
| --- | --- | --- | --- | --- | --- |
| phi-v2-2.7b (slim eval set) | 51.62 | 42.48 | 16.76 | 0.331 | 0.117 |
siddk commented 2 months ago

Hey all -- I'm so sorry for the extended delay. I just made a PR demonstrating how to add various new LLM backbones (Mistral, Llama-2 Chat, Phi-2), with full evaluation results.

In general, my Phi-2 results (full fine-tuning, no LoRA) are pretty bad (but on par with @zeyuanyin's). This could just be because Phi-2 isn't that great a model... but I'd definitely love it if y'all could double-check my PR!

Hoping that this PR can also serve as a template for adding new LLMs; a rough sketch of the overall shape is below. Closing this issue for now, but please feel free to open new issues with specific concerns!
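In codebases structured like this one, the change usually reduces to subclassing a backbone wrapper and registering a model ID, along the lines of this hypothetical sketch (reusing the HFCausalLMBackbone sketch from earlier in the thread; the registry name and keys are illustrative, and the PR is the authoritative reference):

```python
# Hypothetical registry hookup, building on the HFCausalLMBackbone sketch
# from earlier in this thread; the real mapping lives in the PR above.
LLM_BACKBONES = {
    "mistral-7b": "mistralai/Mistral-7B-v0.1",
    "llama2-7b-chat": "meta-llama/Llama-2-7b-chat-hf",
    "phi-2": "microsoft/phi-2",
}


def get_llm_backbone(backbone_id: str) -> "HFCausalLMBackbone":
    # A new LLM then becomes a one-line registry addition, plus whatever
    # prompt-format and special-token handling that model family needs.
    return HFCausalLMBackbone(LLM_BACKBONES[backbone_id])
```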