Open clownchrys opened 2 days ago
Thank you for the question. It should be possible to set this parameter in the configuration (GenerationConfig) to True. This ensures that the traced graph is static and supports sampling on device.
When not sampling on device, it should be possible to use beam search after compilation, if the model remains identical.
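To illustrate why this can work, here is a minimal sketch in plain Python. It is not the transformers-neuronx API: `static_step` is a hypothetical toy stand-in for a compiled model that only accepts the shape it was traced with, and `NUM_BEAMS`, `MAX_LEN`, `VOCAB`, and `PAD` are made-up values. The idea it shows is that the beam bookkeeping lives in a host-side loop, so the graph itself can stay static as long as every call is padded to the traced shape.

```python
import math

NUM_BEAMS = 2   # fixed batch dimension of the traced graph (assumed value)
MAX_LEN = 6     # fixed sequence length of the traced graph (assumed value)
VOCAB = 4       # toy vocabulary size (assumed value)
PAD = 0         # padding token id (assumed value)


def static_step(batch, lengths):
    """Toy stand-in for a compiled model: rejects any shape but the traced one."""
    assert len(batch) == NUM_BEAMS
    assert all(len(row) == MAX_LEN for row in batch)
    out = []
    for row, n in zip(batch, lengths):
        last = row[n - 1]  # last real (non-padding) token of this beam
        # Deterministic toy logits that favour token (last + 1) % VOCAB.
        logits = [2.0 if t == (last + 1) % VOCAB else 0.0 for t in range(VOCAB)]
        norm = math.log(sum(math.exp(x) for x in logits))
        out.append([x - norm for x in logits])  # log-softmax
    return out


def beam_search(prompt, steps):
    """Host-side beam search that calls the fixed-shape step function."""
    # All beams start from the prompt; only the first is "alive" so that
    # identical copies are not selected in the first step.
    beams = [(0.0 if i == 0 else -math.inf, list(prompt)) for i in range(NUM_BEAMS)]
    for _ in range(steps):
        # Pad every beam to the traced sequence length before the call.
        batch = [seq + [PAD] * (MAX_LEN - len(seq)) for _, seq in beams]
        lengths = [len(seq) for _, seq in beams]
        logprobs = static_step(batch, lengths)
        # Expand each beam by every token, then keep the NUM_BEAMS best.
        candidates = [
            (score + lp, seq + [tok])
            for (score, seq), row in zip(beams, logprobs)
            for tok, lp in enumerate(row)
        ]
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:NUM_BEAMS]
    return beams
```

Calling `beam_search([1], steps=3)` returns the `NUM_BEAMS` best `(score, tokens)` pairs, best first. The number of beams and the maximum length are fixed up front here, which is why the step function never sees a new shape.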
Hi!
I am trying to use transformers-neuronx to compile a customized Hugging Face llama-3.1-8b model. I use the model with beam search, and as far as I know, beam search builds a dynamic graph during generation. If I compile the model with the Neuron SDK, does tracing produce a static graph?
Can I still use beam search after Neuron compilation?
I may be misunderstanding something. If so, please correct me.
Thanks for your effort.