Closed maxdebayser closed 4 months ago
I've tried the 1.3b and 6.7b deepseek models but they run without problems :thinking:
The problem happened with revision 6f09197224af9638c32c01a9060e78b0cf5a4479
of the model. With revision 61dc97b922b13995e7f83b7c8397701dbf9cfd4c
it doesn't happen. So it's not a tgis issue.
Describe the bug
When I run some of the lm-eval benchmarks with deepseek on tgis, it fails in two different ways, whereas on vLLM the same models run to completion.
On tgis, with the tgis_native engine and flash attention, the generation fails on the server side and prints this error:
With the hf_transformers engine, the generation doesn't fail on the server side, but the input tokens returned to the client are wrong. It seems that the first token is replaced by a begin_of_sentence token:
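To make the symptom concrete, here is a minimal, hypothetical check of what "first token replaced by a begin_of_sentence token" means. The token ids, the `bos_id` value, and the helper name are all made up for illustration; they are not from tgis or the model's tokenizer.

```python
def first_token_replaced_by_bos(expected_ids, returned_ids, bos_id):
    """Return True if returned_ids match expected_ids except that the
    first token was swapped for the begin_of_sentence token."""
    return (
        len(expected_ids) == len(returned_ids)
        and returned_ids[0] == bos_id
        and expected_ids[0] != bos_id
        and expected_ids[1:] == returned_ids[1:]
    )

# Hypothetical example: the client tokenized the prompt to [4521, 338, 263],
# but the server echoed back [1, 338, 263], where 1 is the assumed BOS id.
expected = [4521, 338, 263]
returned = [1, 338, 263]
print(first_token_replaced_by_bos(expected, returned, bos_id=1))  # True
```

A check like this (run against the input tokens echoed by the server) would distinguish this failure mode from, say, a BOS token being *prepended*, which would instead change the sequence length.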