Open SquidCubic opened 1 year ago
Same issue here using the 7B Llama2 model, any ideas?
I think the reason for the error is that the PDF document is not in English, so tokenizing it produces more tokens than expected. I'll try switching to a multilingual model. Please check this video for details: https://www.youtube.com/watch?v=ThKWQcyQXF8
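For what it's worth, there is a quick way to see why non-English text can inflate token counts: byte-level BPE tokenizers often fall back toward roughly one token per UTF-8 byte for scripts that are underrepresented in their training data, so byte counts give a crude upper bound. A minimal sketch using only that byte-count proxy (no real tokenizer; the sample sentences are arbitrary):

```python
# Rough proxy: for scripts a BPE tokenizer has seen little of, token
# usage can approach one token per UTF-8 byte, so bytes-per-character
# hints at how much faster non-English text eats the context window.
english = "The quick brown fox jumps over the lazy dog."
chinese = "敏捷的棕色狐狸跳过了懒惰的狗。"

def utf8_bytes(text: str) -> int:
    # Number of UTF-8 bytes needed to encode the text.
    return len(text.encode("utf-8"))

print(utf8_bytes(english) / len(english))  # 1.0 byte per character
print(utf8_bytes(chinese) / len(chinese))  # 3.0 bytes per character
```

So the same sentence can cost roughly three times the budget in a CJK script, which is consistent with a 2048-token window overflowing on documents that would fit in English.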
@SquidCubic llama2 is actually multilingual. Did you perhaps find a better fit?
I'm using a GPU with the model below: `model_id = "TheBloke/Llama-2-13B-chat-GPTQ"`, `model_basename = "gptq_model-4bit-128g.safetensors"`.
I use 10 PDF files of my own (100k-200k each) and the model starts correctly.
However, when I enter my query, I always get the error below, no matter what question I ask:

Input length of input_ids is 3823, but `max_length` is set to 2048. This can lead to unexpected behavior. You should consider increasing `max_new_tokens`.

Does anyone know why this is? Thanks!
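The warning means the retrieved context plus the question already totals 3823 tokens, more than the model's 2048-token window, so there is no room left for generation. One common fix is to trim the retrieved chunks to a token budget before building the prompt. A hedged sketch, where `count_tokens` is a crude whitespace proxy standing in for a real tokenizer call, and the budget constants mirror the numbers in this thread:

```python
# Sketch: keep only as many retrieved chunks as fit in the model's
# context window, reserving room for the generated answer.
MAX_CONTEXT = 2048     # model window from the error above
MAX_NEW_TOKENS = 512   # reserve for the answer (assumption)

def count_tokens(text: str) -> int:
    # Crude proxy: whitespace split. Swap in a real tokenizer
    # (e.g. tokenizer.encode(...) from transformers) in practice.
    return len(text.split())

def fit_chunks(question: str, chunks: list[str]) -> list[str]:
    # Budget left for context after the question and answer room.
    budget = MAX_CONTEXT - MAX_NEW_TOKENS - count_tokens(question)
    kept = []
    for chunk in chunks:
        cost = count_tokens(chunk)
        if cost > budget:
            break  # this chunk (and the rest) would overflow the window
        kept.append(chunk)
        budget -= cost
    return kept
```

With something like this in the retrieval step, the prompt stays under `max_length` regardless of how many chunks the vector store returns; the trade-off is that lower-ranked chunks get dropped rather than truncated mid-chunk.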