PKU-YuanGroup / MoE-LLaVA

Mixture-of-Experts for Large Vision-Language Models
https://arxiv.org/abs/2401.15947
Apache License 2.0

Wrong dependencies, why deepspeed dependency for inference, better transformers integration #35

Open sujitvasanth opened 7 months ago

sujitvasanth commented 7 months ago

Hi - thanks for a great repo and model! I got it working today, but there seem to be many unnecessary hurdles:

  1. The minimum Python version in the readme is wrong - I have run it natively on Python 3.8.10, so please correct the readme.
  2. Why is there a dependency on deepspeed in the inference code? The model does not need deepspeed for inference.
  3. Perhaps I misunderstand, but torch weights (.bin) are loaded as part of inference even after the safetensors - all other multimodal models load the tensors directly. If it has to be done this way, where is the .bin model located so I can save it locally rather than store it in the cache?
  4. Rather than having to clone the repo, you can set trust_remote_code=True and keep the custom code in the Hugging Face repo; vikhyat has done this for his custom model, see here https://github.com/vikhyat/moondream/issues/18 (a loading sketch follows this list).
  5. Running inference.py directly works exactly the same as deepspeed inference.py, so why run it this way?
  6. Also regarding quantisation - there seems to be a fault with the transformers/accelerate/deepspeed integration and versioning, so I suspect quantisation won't be usable until those libraries have corrected things.
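
For illustration of point 4, here is a minimal sketch, assuming the custom modelling code were shipped inside the Hugging Face model repo; the repo id below is an assumption for the example, not a confirmed working checkpoint:

```python
# Minimal sketch of point 4: with the custom modelling code inside the model
# repo, transformers can load it with no GitHub clone.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "LanguageBind/MoE-LLaVA-Phi2-2.7B-4e"  # illustrative repo id (assumption)

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,  # executes the modelling code stored in the repo
)
```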
LinB203 commented 7 months ago
  1. We have not tested lower Python versions; we just follow LLaVA in using Python 3.10.
  2. Why would the model not need deepspeed for inference? Have you tested it, and does it work well? I use deepspeed because the MoE layers need deepspeed's initialization (see the sketch after this list).
  3. The checkpoint-saving pipeline follows LLaVA.
  4. Because we need to modify some lines to adapt the MoE layers to the foundation LLM.
  5. Same as 2.
  6. We also have trouble with it...
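
For context on point 2, the expert-parallel MoE layer class itself lives in deepspeed, which is one reason inference still pulls it in. A minimal sketch, assuming deepspeed's documented MoE API (the sizes are illustrative, not MoE-LLaVA's actual config):

```python
# Minimal sketch of why the MoE layers depend on deepspeed: the layer class
# comes from deepspeed and expects its distributed backend. Run under
# `deepspeed` or `torchrun` so the process-group env vars exist.
import torch
import deepspeed
from deepspeed.moe.layer import MoE

deepspeed.init_distributed()  # MoE routing uses deepspeed's process groups

expert = torch.nn.Linear(1024, 1024)  # a single expert network (illustrative)
moe = MoE(hidden_size=1024, expert=expert, num_experts=4, k=2)  # top-2 routing

x = torch.randn(1, 8, 1024)  # (batch, seq, hidden)
out, aux_loss, exp_counts = moe(x)  # returns (output, l_aux, exp_counts)
```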
sujitvasanth commented 7 months ago

thanks @LinB203, you answered all the questions quickly. Thank you!

  1. Please consider changing your readme to say "tested on Python 3.10" rather than listing it as a requirement.
  2. Yes, I think you are right - I tried direct inference on Hugging Face and it failed. But ultimately it is just a model - I can't see why it can't run inference outside of deepspeed.
  3. Other LLaVA-based models directly combine the tensors, see here https://huggingface.co/YouLiXiya (a direct-loading sketch follows this list).
  4. See the moondream1 model, which is a customised model - the custom code can live within the model repo, so you don't have to clone the GitHub repo, and then it will work on Hugging Face.
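
On point 3, here is a minimal sketch of loading sharded safetensors weights directly into a state dict, with no intermediate .bin round trip; the shard file names are assumptions (real repos list theirs in model.safetensors.index.json):

```python
# Minimal sketch of point 3: direct safetensors loading, skipping any
# pytorch_model.bin step. Shard names below are illustrative assumptions.
from safetensors.torch import load_file

state_dict = {}
for shard in ("model-00001-of-00002.safetensors",
              "model-00002-of-00002.safetensors"):
    state_dict.update(load_file(shard, device="cpu"))

# model.load_state_dict(state_dict)  # assumes a matching model instance
```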
LinB203 commented 7 months ago

For 1 & 4, we will consider your suggestions! Thanks. For 3, that is strange to me, so I will need to take time to test it.