NolanoOrg / cformers

SoTA Transformers with C-backend for fast inference on your CPU.
MIT License
311 stars 29 forks

Converted codegen-16B model but got an error using it for inference. #33

Open prof-schacht opened 1 year ago

prof-schacht commented 1 year ago

Hi,

I converted the codegen-16B model using the following commands:

```shell
python3 convert_gptj_to_ggml.py sourceforge/codgen-16b ./codgen-16b 0
./quantize_gptj ./codgen-16b/cogen-16b.bin 1
```

For inference I used the following command:

```shell
./main gptj -m converters/codegen-16b/codgen16b-q4.bin --prompt "def palindrom(word):" -t 8
```

But I got the following error:

```
gptj_model_load: loading model from 'converters/codegen-16b/codgen16b-q4.bin' - please wait ...
gptj_model_load: valid model file 'converters/codegen-16b/codgen16b-q4.bin' (good magic)
gptj_model_load: n_vocab = 51200
gptj_model_load: n_ctx   = 512
gptj_model_load: n_embd  = 6144
gptj_model_load: n_head  = 24
gptj_model_load: n_layer = 34
gptj_model_load: n_rot   = 64
gptj_model_load: f16     = 2
gptj_model_load: ggml ctx size = 10376.90 MB
gptj_model_load: memory_size =   816.00 MB, n_mem = 17408
gptj_model_load: ........................................... done
gptj_model_load: model size =  9560.82 MB / num tensors = 345
libc++abi: terminating with uncaught exception of type std::invalid_argument: stoi: no conversion
zsh: abort      ./main gptj -m converters/codegen-16b/codgen16b-q4.bin --prompt -t 8
```

Any ideas?

HCBlackFox commented 1 year ago

It doesn't work like that; you need to use the interface, which converts your prompt from a string to token ids.

https://github.com/NolanoOrg/cformers#usage
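For context: the `stoi: no conversion` abort suggests `main` tried to parse the `--prompt` argument as integer token ids, and a raw string like `def palindrom(word):` cannot be converted. A minimal Python sketch of that failure mode (the `parse_ids` helper is hypothetical, purely illustrative of the thread's diagnosis):

```python
def parse_ids(prompt: str) -> list[int]:
    # Illustrates parsing a prompt as whitespace-separated token ids.
    return [int(tok) for tok in prompt.split()]

# Numeric token ids parse fine:
print(parse_ids("318 1125 7"))  # → [318, 1125, 7]

# A raw text prompt does not:
try:
    parse_ids("def palindrom(word):")
except ValueError as err:
    # Python's analogue of std::stoi throwing std::invalid_argument
    print("parse failed:", err)
```

This is why the suggested workflow goes through the Python interface, which tokenizes the prompt before handing it to the C backend.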

Ayushk4 commented 1 year ago

Try doing this instead.

1. Move your codegen model to the lookup path: `mv converters/codegen-16b/codgen16b-q4.bin ~/.cformers/models/Salesforce/codegen-16B-mono/int4_fixed_zero`. You may need to create the directories before moving, though.

2. Run the following code in Python:

```python
from interface import AutoInference as AI

ai = AI('Salesforce/codegen-16B-mono')
x = ai.generate('def palindrom(word):', num_tokens_to_generate=500)
print(x['token_str'])
```
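Step 1 above can also be scripted so the directories are created first. A sketch, assuming the source and destination paths from this thread; it only moves the file if it actually exists:

```python
import os
import shutil

# Paths taken from the thread; adjust to your layout.
src = "converters/codegen-16b/codgen16b-q4.bin"
dst_dir = os.path.expanduser("~/.cformers/models/Salesforce/codegen-16B-mono")
dst = os.path.join(dst_dir, "int4_fixed_zero")

os.makedirs(dst_dir, exist_ok=True)  # create the lookup directories first
if os.path.exists(src):
    shutil.move(src, dst)
    print("moved model to", dst)
else:
    print("source model not found:", src)
```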