[Closed] yapweiyih closed this issue 1 year ago
I found the reason: you need to set the max_tokens parameter.
djl_model = DJLModel(
    "EleutherAI/gpt-j-6b",
    "my_sagemaker_role",
    dtype="fp16",
    task="text-generation",
    number_of_partitions=4,
    max_tokens=2048,
)
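Since the failure only shows up once the prompt crosses the token budget, it can help to estimate prompt length client-side before invoking the endpoint. Below is a minimal sketch; `fits_token_budget` is a hypothetical helper (not part of the sagemaker SDK), and the 4-characters-per-token heuristic is a rough stand-in for the model's real tokenizer.

```python
# Hypothetical helper (not part of the sagemaker SDK): roughly estimate
# whether a prompt fits within the container's max_tokens budget before
# invoking the endpoint. Uses a ~4-characters-per-token heuristic rather
# than the model's actual tokenizer, so treat the result as approximate.
def fits_token_budget(prompt, max_tokens=2048, chars_per_token=4):
    estimated_tokens = len(prompt) // chars_per_token
    return estimated_tokens <= max_tokens

short_prompt = "Hello, world!"
long_prompt = "word " * 3000  # ~3750 estimated tokens, over a 2048 budget

print(fits_token_budget(short_prompt))  # True
print(fits_token_budget(long_prompt))   # False
```

For an exact count you would tokenize with the model's own tokenizer instead of the heuristic above.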
Describe the bug: Inference fails when the input prompt is longer than 1024 tokens, even though max_length=2048 is passed.
To reproduce:
Deploy the model first, then:
1) When the input is shorter than 1024 tokens (OK)
Output:
2) When the input is longer than 1024 tokens (ERROR)
Output:
Cloudwatch:
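The original code and log snippets were not captured in this copy. As a stand-in, here is a minimal simulation of the suspected failure mode, assuming the serving container falls back to a 1024-token limit when max_tokens is not configured; `serve` and `DEFAULT_MAX_TOKENS` are illustrative names, not the actual DJL implementation.

```python
# Illustrative simulation (not actual DJL container code): when max_tokens
# is not configured, assume the engine falls back to a 1024-token default,
# so prompts longer than that are rejected.
DEFAULT_MAX_TOKENS = 1024

def serve(prompt_tokens, max_tokens=None):
    limit = max_tokens if max_tokens is not None else DEFAULT_MAX_TOKENS
    if len(prompt_tokens) > limit:
        raise ValueError(
            f"input length {len(prompt_tokens)} exceeds max_tokens={limit}"
        )
    return "generated text"

tokens = ["tok"] * 1500
# serve(tokens)                  # raises ValueError under the 1024 default
serve(tokens, max_tokens=2048)   # succeeds once max_tokens is raised
```

This mirrors the behavior reported above: a 1500-token prompt fails with the default limit but succeeds once max_tokens=2048 is set at deployment.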
Expected behavior: The model should respect the parameter max_length=2048 instead of 1024.

Screenshots or logs: See the Cloudwatch output above.
System information:
Additional context: