varungupta31 opened 1 month ago
Facing the same issue. For the time being, changing:
generated_ids = model.generate(**inputs, max_new_tokens=128)
To:
generated_ids = model.generate(**inputs, max_new_tokens=128, do_sample=False)
i.e., greedy decoding,
worked around the issue, but I am not sure why this is happening, and not everyone will want greedy decoding.
This 'fix' was fairly obvious, since the error stack trace already points to sampling as the step where something goes wrong.
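A minimal sketch of the two decoding paths, assuming the `model` and `inputs` from the snippet further down; `GenerationConfig` is the standard transformers way to make the choice explicit:

```python
# Sketch: sampling vs. greedy decoding in transformers.
from transformers import GenerationConfig

# If the checkpoint's generation_config enables sampling by default,
# a plain generate() call samples -- and that is where the crash occurs.
sampled_ids = model.generate(**inputs, max_new_tokens=128, do_sample=True)

# Workaround: force greedy decoding, either via the flag...
greedy_ids = model.generate(**inputs, max_new_tokens=128, do_sample=False)

# ...or via an explicit GenerationConfig, which is equivalent.
greedy_cfg = GenerationConfig(do_sample=False, max_new_tokens=128)
greedy_ids = model.generate(**inputs, generation_config=greedy_cfg)
```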
I have been trying to fix this error for a while now, and the ongoing threads are of NO help.
I have checked these (and ALL issues on the HF community page for this model):
Minimal Working Example
Taken directly from the model card on Hugging Face:
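(The pasted snippet did not survive formatting, so below is only a hedged reconstruction of its shape. The checkpoint id is a placeholder, and `AutoTokenizer`/`AutoModelForCausalLM` are my assumptions, not taken from the card.)

```python
# Hedged reconstruction of the model-card example; "org/model-name" is a placeholder.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "org/model-name"  # placeholder, not the actual checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
generated_ids = model.generate(**inputs, max_new_tokens=128)  # the failing call
print(tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0])
```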
Error
I have tried:
- model.eval()
- loading the model in float16 and also bfloat16
- flash_attention_2 (installed via pip install flash-attn --no-build-isolation); see the sketch below

So far I have not been able to get this model working.
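For reference, the dtype/attention variants above were along these lines (a sketch; `attn_implementation` is the standard from_pretrained kwarg in recent transformers, and the checkpoint id is again a placeholder):

```python
import torch
from transformers import AutoModelForCausalLM

# Tried float16 and bfloat16, with and without FlashAttention-2.
model = AutoModelForCausalLM.from_pretrained(
    "org/model-name",                          # placeholder checkpoint id
    torch_dtype=torch.bfloat16,                # also tried torch.float16
    attn_implementation="flash_attention_2",   # requires: pip install flash-attn --no-build-isolation
    device_map="auto",
)
model.eval()  # inference mode, as noted above
```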
Relevant Package Versions
GPUs
Trying this on an RTX A6000.
Kindly help,
Thanks.