aws-neuron / aws-neuron-sdk

Powering AWS purpose-built machine learning chips. Blazing fast and cost effective, natively integrated into PyTorch and TensorFlow and integrated with your favorite AWS services
https://aws.amazon.com/machine-learning/neuron/

Does compilation produce a static graph? #1032

Open clownchrys opened 2 days ago

clownchrys commented 2 days ago

Hi!

I am trying to use transformers-neuronx to compile a customized Hugging Face llama-3.1-8b model. I use the model with beam search, and I understand that this produces a dynamic graph during generation. If I compile the model with the Neuron SDK, does tracing produce a static graph?

Can I still use beam search, after neuron compilation?

My understanding may be wrong. If so, please correct me.

Thanks for your effort.

devesr-amzn commented 1 day ago

Thank you for the question. It should be possible to set this parameter in the configuration (GenerationConfig) to True. This ensures that the traced graph is static and supports sampling on device.

When not sampling on device, it should be possible to use beam search after compilation, if the model remains identical.
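
To illustrate why beam search can coexist with a static graph, here is a toy sketch in plain Python. It does not use the transformers-neuronx API: `static_step` is a hypothetical stand-in for a compiled forward pass that always takes the same input shape, and the scoring rule inside it is made up. The point is only that the beam bookkeeping can run as a host-side loop around a fixed-shape call.

```python
import math

VOCAB = 5       # toy vocabulary size (assumption for this sketch)
NUM_BEAMS = 3   # fixed beam width, so the step input shape never changes


def static_step(last_tokens):
    """Hypothetical stand-in for a compiled, fixed-shape forward pass.

    Takes exactly NUM_BEAMS token ids and returns NUM_BEAMS rows of
    VOCAB log-probabilities. The toy model prefers token (t + 1) % VOCAB.
    """
    assert len(last_tokens) == NUM_BEAMS  # the "static graph" constraint
    rows = []
    for t in last_tokens:
        peak = (t + 1) % VOCAB
        # Circular distance to the preferred token, used as a negative logit.
        logits = [-min(abs(v - peak), VOCAB - abs(v - peak)) for v in range(VOCAB)]
        log_z = math.log(sum(math.exp(x) for x in logits))
        rows.append([x - log_z for x in logits])
    return rows


def beam_search(start_token, steps):
    """Host-side beam search: all dynamic logic lives outside static_step."""
    # Pad with -inf beams so the step always sees NUM_BEAMS inputs.
    beams = [([start_token], 0.0)]
    beams += [([start_token], -math.inf)] * (NUM_BEAMS - 1)
    for _ in range(steps):
        logp = static_step([seq[-1] for seq, _ in beams])
        candidates = [
            (seq + [v], score + logp[b][v])
            for b, (seq, score) in enumerate(beams)
            for v in range(VOCAB)
        ]
        # Keep the NUM_BEAMS highest-scoring continuations.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:NUM_BEAMS]
    return beams


if __name__ == "__main__":
    for seq, score in beam_search(start_token=1, steps=4):
        print(seq, round(score, 3))
```

In this sketch the candidate expansion, sorting, and pruning all happen on the host, so the step function never sees a shape other than NUM_BEAMS tokens. Whether a real compiled model behaves this way depends on how it was traced and configured.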