NolanoOrg / cformers

SoTA Transformers with C-backend for fast inference on your CPU.

Response is being stopped before finishing #40

Open vashat opened 1 year ago

vashat commented 1 year ago

Hi!

It seems that long results are cut off before they finish. I'm using the Pythia model and have set the number of tokens to a high value, but it does not seem to help. Any ideas if I'm doing something wrong? Here are the parameters used:

argv[6] = --seed
argv[7] = 42
argv[8] = --threads
argv[9] = 4
argv[10] = --n_predict
argv[11] = 2048
argv[12] = --top_k
argv[13] = 20
argv[14] = --top_p
argv[15] = 0.95
argv[16] = --temp
argv[17] = 0.85
argv[18] = --repeat_last_n
argv[19] = 64
argv[20] = --repeat_penalty
argv[21] = 1.3
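
For reference, the flags in the dump above add up to an invocation along the lines of the sketch below. This is only a sketch: the binary name and the leading arguments (argv[1] through argv[5], which are not shown in the dump) are hypothetical placeholders.

```python
# Sketch only: launch the C backend with the same sampling flags as in the
# argv dump above. "./main" and the omitted leading arguments
# (argv[1]..argv[5], not shown in the dump) are hypothetical placeholders.
import subprocess

cmd = [
    "./main",              # hypothetical binary name
    # ... argv[1]..argv[5] (model path, prompt, etc.) go here ...
    "--seed", "42",
    "--threads", "4",
    "--n_predict", "2048",
    "--top_k", "20",
    "--top_p", "0.95",
    "--temp", "0.85",
    "--repeat_last_n", "64",
    "--repeat_penalty", "1.3",
]
subprocess.run(cmd, check=True)
```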
main: seed = 42
model_type: gptneox
gptneox_model_load: loading model from '/Users/admin/.cformers/models/OpenAssistant/oasst-sft-1-pythia-12b/int4_fixed_zero' - please wait ...
gptneox_model_load: n_vocab = 50288
gptneox_model_load: n_ctx   = 512
gptneox_model_load: n_embd  = 5120
gptneox_model_load: n_head  = 40
gptneox_model_load: n_layer = 36
gptneox_model_load: n_rot   = 32
gptneox_model_load: use_parallel_residual = 1
gptneox_model_load: f16     = 2
gptneox_model_load: ggml ctx size = 7786.26 MB
gptneox_model_load: memory_size =   720.00 MB, n_mem = 18432
gptneox_model_load: ........................................................................ done
gptneox_model_load: model size =  7066.11 MB / num tensors = 580
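
One detail from the load log that may matter here: n_ctx is 512, while --n_predict asks for 2048 tokens. If the backend caps generation at the context window (which is how the upstream ggml example mains typically behave, though that is an assumption about this fork), the run can never produce more than roughly 512 tokens minus the prompt length. A small arithmetic sketch:

```python
# Rough sketch, assuming generation is capped by the context window.
n_ctx = 512           # from the load log above
n_predict = 2048      # requested via --n_predict
prompt_tokens = 100   # hypothetical prompt length; the real value isn't shown

# Effective budget for new tokens under that assumption:
max_new_tokens = min(n_predict, n_ctx - prompt_tokens)
print(max_new_tokens)  # 412 here, far below the requested 2048
```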