Closed — babla9 closed this issue 5 months ago
Hi, does your V100 have 32 GB or 16 GB of GPU memory? 32 GB works with the current code, though you may need to compress the image input to about 500 pixels in width and height. If it's 16 GB, you may need to fine-tune with the int4 version and wait for the LoRA fine-tuning code to be released.
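For reference, here is a minimal sketch of capping an image at roughly 500 pixels per side with Pillow (the 500-pixel target comes from the comment above; the file paths are placeholders):

```python
from PIL import Image

def shrink_image(path, max_side=500):
    """Downscale an image so neither side exceeds max_side pixels,
    preserving the aspect ratio. thumbnail() never upscales."""
    img = Image.open(path).convert("RGB")
    img.thumbnail((max_side, max_side), Image.LANCZOS)
    return img

# Placeholder paths for illustration
img = shrink_image("example.jpg")
img.save("example_small.jpg")
```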
Thanks! It's a 16 GB V100. Would a larger cluster help, e.g., 8x V100? I've also seen elsewhere that Llama 3 deteriorates significantly with quantization; do you know if you'll be releasing any benchmark measurements for these new versions?
For 16 GB, you may need to fine-tune only the LLM parameters, use gradient_accumulation_steps with a batch size of 1, or enable gradient checkpointing. We do not have the environment to test this, but you can give it a try!
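A rough sketch of those three memory-saving options with the Hugging Face Trainer API. The checkpoint name and the `vpm`/`resampler` module prefixes are assumptions based on the MiniCPM-V repo, so verify them against `model.named_parameters()` before relying on this:

```python
from transformers import AutoModel, TrainingArguments

# Assumed checkpoint name for the model discussed here; verify on the Hub.
model = AutoModel.from_pretrained(
    "openbmb/MiniCPM-Llama3-V-2_5", trust_remote_code=True
)

# Option 1: train only the LLM parameters by freezing the vision side.
# NOTE: "vpm" and "resampler" as vision-module prefixes are assumptions;
# inspect model.named_parameters() for the real names.
for name, param in model.named_parameters():
    if name.startswith(("vpm", "resampler")):
        param.requires_grad = False

args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=1,  # Option 2: batch size of 1 ...
    gradient_accumulation_steps=8,  # ... accumulated to a larger effective batch
    gradient_checkpointing=True,    # Option 3: trade compute for memory
    fp16=True,                      # V100 supports fp16 but not bf16
)
```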
As for the int4 version, we see about a 1% drop in accuracy, which is within a reasonable range.
Thanks for your work on this! What is the minimum GPU memory requirement for full-parameter fine-tuning of MiniCPM-Llama3-V 2.5? I cannot access A100s; would it be possible to run on 4x V100 or NVIDIA T4 GPUs?