aws-neuron / aws-neuron-sdk

Powering AWS purpose-built machine learning chips. Blazing fast and cost-effective, natively integrated into PyTorch and TensorFlow, and integrated with your favorite AWS services.
https://aws.amazon.com/machine-learning/neuron/

Missing example in the doc for speculative decoding beta support #921

Closed: JingyaHuang closed this issue 3 months ago

JingyaHuang commented 3 months ago

Hi team,

Optimum Neuron is looking into adding speculative decoding support for some seq2seq models. The developer guide section linked below refers to an example from the Annapurna team, but the link to it is missing. Could the team point us to the example? Thanks.

https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/neuronx_distributed_inference_developer_guide.html?highlight=speculative#speculative-decoding-beta

For a complete example, please refer to this [file].

(No link is attached to the word "file".)

JingyaHuang commented 3 months ago

Got it: https://github.com/aws-neuron/neuronx-distributed/blob/main/examples/inference/run_llama_speculative.py. Thanks, team.