NVIDIA / Megatron-LM

Ongoing research training transformer models at scale
https://docs.nvidia.com/megatron-core/developer-guide/latest/user-guide/index.html#quick-start

[QUESTION] Problems performing inference #854

Closed srivassid closed 3 weeks ago

srivassid commented 3 weeks ago

I have trained a model for 800 iterations, just for testing purposes, and I am trying to perform inference on it, but the server crashes.

I have hosted a server, but when I run `tools/text_generation_cli.py localhost:5000` it asks me for a prompt and then the server crashes with `AttributeError: 'InferenceParams' object has no attribute 'max_sequence_len'`.

Can anyone help me out?

Thanks
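An `AttributeError` like this is typically caused by an attribute being renamed between library versions, so code built against one version breaks against another. A minimal, self-contained sketch of the failure mode and a defensive lookup that tolerates both spellings (the class and attribute names here are illustrative stand-ins, not Megatron's or Transformer Engine's actual API):

```python
class InferenceParamsOld:
    """Stand-in for an older API exposing `max_sequence_len` (assumption)."""
    def __init__(self, n):
        self.max_sequence_len = n


class InferenceParamsNew:
    """Stand-in for a newer API exposing `max_sequence_length` (assumption)."""
    def __init__(self, n):
        self.max_sequence_length = n


def get_max_seq(params):
    # Try both spellings so calling code survives the rename.
    for name in ("max_sequence_len", "max_sequence_length"):
        if hasattr(params, name):
            return getattr(params, name)
    raise AttributeError("no max-sequence attribute found on params")


print(get_max_seq(InferenceParamsOld(2048)))  # 2048
print(get_max_seq(InferenceParamsNew(4096)))  # 4096
```

The cleaner fix, of course, is to align the library versions rather than shim around the rename, which is what resolved this issue.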

srivassid commented 3 weeks ago

Updating Transformer Engine inside the Docker image and building a new image with the change solved the issue. The pytorch-24.01 Docker image ships Transformer Engine 1.2.0, whereas the latest stable TE release is 1.6.0.
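A rebuild along these lines might look like the following. This is a hedged sketch: the base image tag and the pip install spec are assumptions, so check NVIDIA's Transformer Engine installation docs and the NGC container release notes for the exact tag and version pinning you need.

```shell
# Build a derived image with a newer Transformer Engine on top of the
# NGC PyTorch base image (tag and package spec are assumptions).
docker build -t megatron-te-updated - <<'EOF'
FROM nvcr.io/nvidia/pytorch:24.01-py3
# Upgrade TE; the extras/version syntax may differ per TE release.
RUN pip install --no-cache-dir "transformer_engine[pytorch]==1.6"
EOF
```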