NolanoOrg / cformers

SoTA Transformers with C-backend for fast inference on your CPU.
MIT License

Custom Finetuned Models? #32

Open mallorbc opened 1 year ago

mallorbc commented 1 year ago

Looking at the code, it appears that when loading a model, the code downloads preprocessed models that were uploaded to Hugging Face and then checks their SHA-256 hashes to make sure they match. It does not seem that the code currently allows loading a model from a local path.
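For reference, this is roughly the kind of integrity check being described; a minimal sketch, assuming a streaming hash over the downloaded weight file (the function name and placeholder digest here are mine, not cformers' actual code):

import hashlib

EXPECTED_SHA256 = "0123abc..."  # placeholder; the real digest would come from the repo's model registry

def sha256_of_file(path: str) -> str:
    # Hash the file in 1 MiB chunks so large weight files don't need to fit in memory.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

if sha256_of_file("model.bin") != EXPECTED_SHA256:
    raise ValueError("Downloaded model does not match the expected SHA-256 digest")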

I have converted GPT-J myself into the GGML format (which I am almost certain this project is built on; correct me if I'm wrong).

I am interested in finetuning a model, converting it to GGML, quantizing it to 4 bits, and then serving it through an API.
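For intuition, the 4-bit step amounts to something like the following symmetric quantization sketch in NumPy. This is a toy illustration, not the actual GGML int4_fixed_zero kernel, which works block-wise (one scale per small block of weights) rather than per-tensor:

import numpy as np

def quantize_int4_symmetric(w: np.ndarray):
    # Map the largest magnitude to +/-7 with a fixed zero-point at 0,
    # a simplified stand-in for the "int4_fixed_zero" scheme.
    scale = np.abs(w).max() / 7.0
    q = np.clip(np.round(w / scale), -7, 7).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(4, 8).astype(np.float32)
q, scale = quantize_int4_symmetric(w)
print(np.abs(w - dequantize(q, scale)).max())  # worst-case quantization error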

I believe that if this repo supported custom fine-tuned models, it would be great for this use case.

Ayushk4 commented 1 year ago

Hi, you should be able to convert your custom GPT-J-based models.

If you want to add new models, then follow these steps:

After these steps, you should be able to load the model via Python.

mallorbc commented 1 year ago

I do not want to upload the model to Hugging Face because the models are private (although perhaps one can have private models on Hugging Face?).

However, if the method you described works, I believe I could add support for loading models locally. I will likely explore adding this feature.

Thanks for your insight.

Ayushk4 commented 1 year ago

Yes. If you don't want to upload, a hack would be to put the int4_fixed_zero file at ~/.cformers/models/myUserName/myModel/int4_fixed_zero.

Then you can use it like so:

from interface import AutoInference as AI

# Resolves 'myUserName/myModel' to ~/.cformers/models/myUserName/myModel/
ai = AI('myUserName/myModel')

# Generate up to 500 tokens; the result dict includes the decoded text.
x = ai.generate('Some Prompt', num_tokens_to_generate=500)
print(x['token_str'])

For example, myUserName/myModel could be EleutherAI/gpt-j-6B for the GPT-J model.
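Building on that hack, here is a small helper for staging a locally quantized file into the cache. The cache layout is taken from the comment above; the helper name and the assumption that the destination file must be named int4_fixed_zero are mine:

import shutil
from pathlib import Path

def register_local_model(ggml_file: str, org: str, name: str) -> Path:
    # Copy a locally quantized GGML file to where cformers looks for it:
    # ~/.cformers/models/<org>/<name>/int4_fixed_zero
    dest_dir = Path.home() / ".cformers" / "models" / org / name
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / "int4_fixed_zero"
    shutil.copyfile(ggml_file, dest)
    return dest

register_local_model("my-finetune-q4.bin", "myUserName", "myModel")
# Afterwards, AI('myUserName/myModel') should pick the file up, per the snippet above.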

sann3 commented 1 year ago

Does the same work for the hivemind/gpt-j-6B-8bit model?

Ayushk4 commented 1 year ago

You could load it at 4 bits. For now, 8-bit isn't supported.

mallorbc commented 1 year ago

My PR #38 should close this issue. I will close it when merged.