aws-neuron / aws-neuron-samples

Example code for AWS Neuron SDK developers building inference and training applications

Llama3 8B 32K sample generates garbage #82

Open samir-souza opened 1 month ago

samir-souza commented 1 month ago

The model generates only garbage.

Sample: https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/transformers-neuronx/inference/llama-3-8b-32k-sampling.ipynb

Neuron SDK 2.19, PyTorch 1.13.1

aws-neuronx-runtime-discovery 2.9
libneuronxla 0.5.1795
neuronx-cc 2.14.213.0+013d129b
neuronx-distributed 0.8.0
torch-neuronx 1.13.1.1.15.0
torch-xla 1.13.1+torchneuronf
transformers-neuronx 0.11.351

ii aws-neuronx-collectives 2.21.46.0-69b77134b amd64 neuron_ccom built using CMake
ii aws-neuronx-gpsimd-customop-lib 0.11.4.0 amd64 custom_op_trn1_install built using CMake
ii aws-neuronx-gpsimd-tools 0.11.3.0-36dcb86d4 amd64 gpsimd_tools built using CMake
ii aws-neuronx-runtime-lib 2.21.41.0-fb1705f5f amd64 neuron_runtime built using CMake
ii aws-neuronx-tools 2.18.3.0 amd64 Neuron profile and debug tools

The example from the notebook generates:

num_input_tokens: 26828
generated sequence:
1. We propose a new gated linear recurrent unit (RG-LRU) that is efficient to compute on TPU-v3.
2. We propose Griffin, a hybrid model that mixes the RG-LRU with local attention.
3. Griffin and Hawk achieve comparable performance to Transformers on downstream tasks.
4. Griffin and Hawk extrapolate to longer sequences than Transformers.
5. Griffin and Hawk are more efficient than Transformers at inference.
6. Griffin and Hawk are efficient at copying and retrieval tasks.
7. Griffin and Hawk are efficient at training.
8. Griffin and Hawk are efficient at inference.
9. Griffin and Hawk are efficient at training.
10. Griffin and Hawk are efficient at inference.
11. Griffin and Hawk are efficient at training.
12. Griffin and Hawk are efficient at inference.
13. Griffin and Hawk are efficient at training.
14. Griffin and Hawk are efficient at inference.
15. Griffin and Hawk are efficient at training.
16. Griffin and Hawk are efficient at inference.
17. Griffin and Hawk are efficient at training.
18. Griffin and .....

...and it repeats the same thing for the rest of the 32K tokens.

Custom prompt:

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are a json format specialist<|eot_id|><|start_header_id|>user<|end_header_id|>

{"a": invalid text, "b": how are you?}

Can you fix the given json document for me, please?<|eot_id|><|start_header_id|>assistant<|end_header_id|>

Output generated sequence (`користувач` is Ukrainian for "user"): користувачassistantкористувачassistantкористувачassistantкористувачassistantкористувачassistantкористувачassistantкористувачassistantкористувачassistantкористувачassistantкористувачassistantкористувачassistantкористувачassistantкористувачassistantкористувачassistantкористувачassistantкористувачassistantкористув ...and repeats.
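For reference, the custom prompt above follows the Llama 3 instruct chat template. A minimal sketch of how the same string can be built with the Hugging Face tokenizer, assuming access to the `meta-llama/Meta-Llama-3-8B-Instruct` tokenizer (the hard-coded prompt above is equivalent):

```python
from transformers import AutoTokenizer

# Assumption: the Llama 3 instruct tokenizer is available locally or via the Hub.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

messages = [
    {"role": "system", "content": "You are a json format specialist"},
    {"role": "user", "content": '{"a": invalid text, "b": how are you?}\n\n'
                                "Can you fix the given json document for me, please?"},
]

# Produces the <|begin_of_text|>...<|start_header_id|>assistant<|end_header_id|>
# string shown above, ready to be tokenized and passed to the Neuron model.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
```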

shubhamchandak94 commented 1 month ago

The issue is due to using the f16 data type instead of bf16 (which is what the model weights are stored in). We will update the tutorial in the next release.
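For anyone hitting this before the tutorial is updated, a minimal sketch of the workaround, assuming the notebook loads the model through transformers-neuronx's `LlamaForSampling` with an `amp` argument; the checkpoint path, `tp_degree`, and `n_positions` below are placeholders, so match them to whatever the notebook actually uses:

```python
from transformers_neuronx.llama.model import LlamaForSampling

# Placeholder path to the locally saved Llama-3-8B checkpoint.
model_path = "./Meta-Llama-3-8B-Instruct-split"

# The fix described above: compile with amp='bf16' so the compute dtype
# matches the checkpoint's bfloat16 weights; amp='f16' is what produces
# the garbage output reported in this issue.
neuron_model = LlamaForSampling.from_pretrained(
    model_path,
    batch_size=1,
    tp_degree=32,        # assumption: adjust to your instance's NeuronCore count
    n_positions=32768,   # 32K context, per the sample
    amp='bf16',          # was 'f16' in the notebook
)
neuron_model.to_neuron()  # compile and load onto the NeuronCores
```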