deepjavalibrary / djl-serving

A universal scalable machine learning model deployment solution
Apache License 2.0

Token accuracy not as expected for starcoderbase-15b model with rolling batch type vllm #1720

Open sreka opened 4 months ago

sreka commented 4 months ago

Description

With rolling batch type vllm, the starcoderbase model returns junk values and generates fewer tokens than expected. The accuracy of the generated text is also low.

Expected Behavior

The model should generate the expected number of tokens, and the accuracy of the generated text should be good.

How to Reproduce?


Steps to reproduce


Serving.properties

engine=Python
option.entryPoint=djl_python.huggingface
option.rolling_batch=vllm
option.max_rolling_batch_size=32
option.output_formatter=json
option.tensor_parallel_degree = 1

Dataset

Tested using the code-contests dataset.
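
For anyone trying to reproduce this, a minimal request sketch might look like the following. It is not from the original report. It assumes the container is running locally on port 8080 and serves the `/invocations` endpoint, and that `max_new_tokens` is accepted as a generation parameter. The helper names `build_payload` and `invoke`, and the prompt, are hypothetical.

```python
import json
import urllib.request

# Assumption: DJL Serving is running locally on its default port.
ENDPOINT = "http://localhost:8080/invocations"


def build_payload(prompt, max_new_tokens=256):
    """Build the JSON request body for a single text-generation call."""
    return {
        "inputs": prompt,
        # Assumption: the handler accepts a max_new_tokens generation parameter.
        "parameters": {"max_new_tokens": max_new_tokens},
    }


def invoke(payload, url=ENDPOINT, timeout=120):
    """POST the payload to the endpoint and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example usage (requires a running container, so it is left commented out):
# result = invoke(build_payload("def fibonacci(n):", max_new_tokens=128))
# print(result)
```

Sending the same prompt with the same `max_new_tokens` to a container with and without `option.rolling_batch=vllm` would make it easier to compare output length and quality side by side.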

sindhuvahinis commented 2 months ago

Hi sreka! Thanks for raising the issue. We test starcoder2-7b regularly in our nightly CI and it seems fine. Are you still facing this issue with our latest container as well?