SJTU-IPADS / PowerInfer

High-speed Large Language Model Serving on PCs with Consumer-grade GPUs

How to obtain 'predictor weights'? #116

Open harikrishnaapc opened 10 months ago

harikrishnaapc commented 10 months ago

I have a fine-tuned Vicuna 7B model. I tried to convert it into the PowerInfer format using the 'LLaMA(ReLU)-2-7B' predictor, but the inference output is not right. Is this because the predictor is different from one trained for the fine-tuned model? How can I obtain these predictor weights?

In the Todo section I see that 'Release core code of PowerInfer, supporting Llama-2, Falcon-40B.' is marked as done.

Can we use PowerInfer with fine-tuned Vicuna/Llama models?

Thanks


krsh-37 commented 10 months ago

Hi Team,

I read through the paper as well; great work.

  1. If there is enough space in VRAM to load the entire model, will these optimizations still help?
  2. What percentage of the training data is suggested for finding the DejaVu 'predictors'?
  3. How do we obtain predictors for custom-trained models? Should we again run inference as in DejaVu, or is there an alternate method?

Thanks

YixinSong-e commented 9 months ago

Hello, thank you for your interest.

  1. Yes, when we have enough space in VRAM, we fall back to Deja Vu. But currently our code has not been optimized for complete offloading; we will support this feature.
  2. Actually, I used about 1M data points for predictor training.
  3. For training predictors, we will open-source a tool. For now, you can refer to the predictor-training implementation in DejaVu (a minimal sketch follows below).
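
For readers unfamiliar with DejaVu-style predictors, here is a minimal sketch of what that training setup looks like: a small low-rank MLP per FFN layer learns to predict, from the layer's input hidden state, which neurons will be non-zero after ReLU. The shapes (Llama-7B's 4096/11008), the rank of 1024, and all names below are illustrative assumptions, not PowerInfer's or DejaVu's actual code:

```python
# Sketch of DejaVu-style activation-predictor training (illustrative only).
# Training pairs are collected by running the base model over a corpus
# (~1M data points, per the answer above) and recording, for each FFN
# layer, the layer input and a 0/1 mask of which neurons fired after ReLU.
import torch
import torch.nn as nn

HIDDEN, FFN, RANK = 4096, 11008, 1024   # assumed Llama-7B shapes

predictor = nn.Sequential(              # one small predictor per layer
    nn.Linear(HIDDEN, RANK),
    nn.ReLU(),
    nn.Linear(RANK, FFN),
)
opt = torch.optim.Adam(predictor.parameters(), lr=1e-4)
loss_fn = nn.BCEWithLogitsLoss()        # per-neuron "will it activate?"

def train_step(hidden: torch.Tensor, active_mask: torch.Tensor) -> float:
    """hidden: [batch, HIDDEN] recorded FFN inputs.
    active_mask: [batch, FFN], 1.0 where the ReLU output was non-zero."""
    logits = predictor(hidden)
    loss = loss_fn(logits, active_mask)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# At inference time, neurons whose predicted activation probability
# exceeds a threshold are treated as hot and computed; the rest are skipped.
```
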
YixinSong-e commented 9 months ago

Thank you for your interest. First, for now we only support ReLU-based models, and every model has its own predictor. We currently do not support fine-tuned Vicuna/Llama models because they are not ReLU-based. By the way, we will release a Mistral-based model in the future, and we will fine-tune it with SFT and DPO.
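
To make the ReLU requirement concrete: ReLU outputs exact zeros for all negative pre-activations, so a correct predictor lets you provably skip those neurons, whereas SiLU (used by stock Llama/Vicuna) is non-zero almost everywhere, so no neuron can be skipped without changing the output. A toy measurement (not PowerInfer code) illustrating the difference:

```python
# Toy illustration: fraction of exactly-zero activations under ReLU vs. SiLU.
import torch
import torch.nn.functional as F

x = torch.randn(1024, 11008)                    # fake FFN pre-activations
relu_zeros = (torch.relu(x) == 0).float().mean()
silu_zeros = (F.silu(x) == 0).float().mean()

print(f"ReLU exact zeros: {relu_zeros:.1%}")    # ~50% -> skippable neurons
print(f"SiLU exact zeros: {silu_zeros:.1%}")    # ~0%  -> nothing to skip
```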

jet-yangqs commented 6 months ago

Dear Team,

I hope you're doing well. I'm following up on the discussion above about optimizing for complete offloading and the fallback to Deja Vu.

Could you kindly provide any updates on the progress of this feature?

Thank you for your time.