-
## Description
Hi, there is a typo in the following commands in the blueprint for **AWS Trainium on EKS**:
- https://awslabs.github.io/data-on-eks/docs/blueprints/ai-ml/trainium
The wrong comm…
-
### What is the feature?
Support mmlab training on the AWS Trainium device
### Any other context?
- AWS [recently announced general availability of Trainium instances](https://aws.amazon.com/…
-
### Description
Based on testing, Ray works well with AWS Trainium, and Torch, Lightning, and Trainium integrate well with one another, but there is currently no way to integrate all three pieces together. What would be nee…
-
### System Info
```shell
I found that I couldn't train on more than one Trainium instance with Optimum Neuron. However, if I comment out the code related to the neuron cache, then it seems to work.
…
```
-
### Description
I would like AWS Trainium instances, which require the "xla" torch backend, to be supported by Ray.
[https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/t…
-
When running the Trainium unit tests with the command `pytest -m "is_trainium_test"`, I hit an issue where the neuron compiler flags set via os.environ were not picked up by the compiler ([link to example test](ht…
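A common cause of "env var not picked up" behavior is that the flags are read once, at import or object-creation time, so values exported afterwards are silently ignored. A minimal pure-Python sketch of that pitfall (the `CompilerConfig` class is illustrative, not the actual optimum-neuron code; `NEURON_CC_FLAGS` is assumed to be the relevant variable):

```python
import os

# Illustrative sketch of a read-once configuration: the environment is
# snapshotted when the object is created, so later edits are never seen.
os.environ.pop("NEURON_CC_FLAGS", None)  # start from a clean environment

class CompilerConfig:
    def __init__(self):
        # Snapshot taken here; subsequent changes to os.environ are ignored.
        self.flags = os.environ.get("NEURON_CC_FLAGS", "")

config = CompilerConfig()                                     # flags read now
os.environ["NEURON_CC_FLAGS"] = "--retry_failed_compilation"  # exported too late
print(repr(config.flags))  # '' -- the flag never reaches the "compiler"
```

If this is what is happening, exporting the flags before the first import of the module that snapshots them should make them visible.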
-
There is a typo in the section "[Can I use CUDA libraries with AWS Trainium?](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/general/faq/training/neuron-training.html#id8)"
If you have appli…
-
Does Optimum Neuron have support for [TRL](https://huggingface.co/docs/trl/index) supervised fine-tuning, reward modelling, and PPO using Trainium? Is TRL the best path to support RLHF?
-
When you try to push NEFFs from a compiled model whose origin model is private, it will fail, since we have a safeguard in place. This might be suboptimal; e.g., we cannot cache Llama.
cc @michaelbenayou…
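The safeguard described above amounts to a visibility gate on the push path. A minimal sketch of that logic (function and parameter names are illustrative, not the actual library API):

```python
# Hypothetical sketch of the safeguard: compiled artifacts (NEFFs) are only
# pushed to the shared cache when the origin model is public, unless the
# caller explicitly opts in to pushing artifacts derived from private weights.
def can_push_neffs(origin_is_private: bool, allow_private_push: bool = False) -> bool:
    """Return True if it is safe to push compiled NEFFs to the shared cache."""
    if origin_is_private and not allow_private_push:
        return False  # safeguard: block artifacts derived from private weights
    return True

print(can_push_neffs(origin_is_private=True))                           # blocked
print(can_push_neffs(origin_is_private=True, allow_private_push=True))  # explicit opt-in
```

An explicit opt-in flag like this would be one way to keep the safeguard while still allowing gated models such as Llama to be cached deliberately.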
-
As mentioned [here](https://github.com/NVIDIA/Megatron-LM/tree/main/megatron/core/transformer/moe#ease-of-use), having a proper MOE/Mixtral checkpoint converter script will help us to fine-tune Mixtra…
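The core of such a converter is remapping checkpoint keys from one naming scheme to the other. A minimal sketch of that step (the layer/expert key names below are illustrative assumptions, not the real Megatron-LM or Mixtral checkpoint layouts):

```python
import re

def remap_moe_keys(state_dict):
    """Rename hypothetical Megatron-style expert keys to an HF-style layout."""
    remapped = {}
    for key, value in state_dict.items():
        # e.g. "layers.0.mlp.experts.3.w1" -> "layers.0.block_sparse_moe.experts.3.w1"
        new_key = re.sub(r"\.mlp\.experts\.", ".block_sparse_moe.experts.", key)
        remapped[new_key] = value
    return remapped

# Toy stand-in for a state dict; real converters would also reshape/merge tensors.
ckpt = {"layers.0.mlp.experts.3.w1": "W1", "layers.0.self_attn.q_proj": "Q"}
print(remap_moe_keys(ckpt))
```

A full converter would additionally handle tensor sharding and any transposes between the two layouts, but the key-remapping table is usually the first thing to pin down.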