wesleysanjose opened this issue 1 year ago
```
transformer.h.21.input_layernorm.bias -> layers.21.attention_norm.bias
layers.21.attention_norm.bias 1 (4096,)
transformer.h.21.self_attention.query_key_value.weight -> layers.21.attention.query_key_value.weight
Killed
```
I only have 16 GB of RAM, so I tried the local-memory parameter. The model loaded and I could see the conversion start, but at the end it still says "Killed". A ~20 GB model file was generated. Is that considered a success?
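One quick sanity check I was thinking of running on the output (just a sketch, not part of the repo's tooling) is to read the file size and the magic number at the start of the file, assuming the usual early-ggml layout where the file begins with the 32-bit magic `0x67676d6c`. The file name below is hypothetical; substitute the actual output path. Even if the magic looks fine, a process killed mid-write can still leave a truncated file, so the size matters too.

```python
# Sketch: check the size and leading magic of a generated ggml file.
# Assumes the file starts with the 32-bit little-endian magic 0x67676d6c;
# "ggml-model-f16.bin" is a hypothetical name, use your actual output path.
import os
import struct

path = "ggml-model-f16.bin"

size_gb = os.path.getsize(path) / 1e9
with open(path, "rb") as f:
    (magic,) = struct.unpack("<I", f.read(4))

print(f"file size: {size_gb:.1f} GB")
print("magic looks ok" if magic == 0x67676D6C else f"unexpected magic: {magic:#x}")
```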
Also, I was trying to convert a finetuned BLOOM model (https://huggingface.co/BelleGroup/BELLE-7B-2M/tree/main). It was finetuned from the 7B model, but the weights appear to be stored as fp32 instead of fp16, so the checkpoint is double the size. Do I need to supply any additional parameter when converting it to ggml? I ask because after the conversion the output becomes nonsense and weird characters.
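In case the fp32 weights are part of the problem (which is only a guess), one thing I considered trying is re-saving the checkpoint in fp16 with transformers first and then pointing the convert script at the new directory. This is just a sketch; the output directory name is arbitrary and it needs enough free disk space for another copy of the model.

```python
# Sketch: load the fp32 BELLE checkpoint cast to fp16 and re-save it,
# then run the ggml convert script on the fp16 copy.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "BelleGroup/BELLE-7B-2M"
out_dir = "BELLE-7B-2M-fp16"  # arbitrary output directory

# low_cpu_mem_usage=True loads shards incrementally (requires the accelerate
# package), which helps on a 16 GB machine.
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, low_cpu_mem_usage=True
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

model.save_pretrained(out_dir)
tokenizer.save_pretrained(out_dir)
```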
Or should I use their GPTQ 8-bit quantized model for the conversion instead?