-
## Description
Hi, there is a typo in the following commands in the blueprint for **AWS Trainium on EKS**:
- https://awslabs.github.io/data-on-eks/docs/blueprints/ai-ml/trainium
The wrong comm…
-
### What is the feature?
Support mmlab training on the AWS Trainium device
### Any other context?
- AWS [recently announced general availability of Trainium instances](https://aws.amazon.com/…
-
### Description
Based on testing, Ray works well with AWS Trainium, and Torch, Lightning, and Trainium integrate well with one another, but there is currently no way to integrate all three pieces together. What would be nee…
-
### System Info
```shell
I found that I couldn't train on more than one Trainium instance with Optimum Neuron. However, if I comment out the code related to the neuron cache, then it seems to work.
…
```
-
### Description
I would like AWS Trainium instances, which require the "xla" torch backend, to be supported by Ray.
[https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/t…
-
When running the Trainium unit tests with the command `pytest -m "is_trainium_test"`, I hit an issue where the neuron compiler flags set via os.environ were not picked up by the compiler ([link to example test](ht…
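A common cause of "env var not picked up" behavior is that the flags are read once, at import or object-creation time, so values exported afterwards are silently ignored. A minimal pure-Python sketch of that pitfall (the `CompilerConfig` class is illustrative, not the actual optimum-neuron code; `NEURON_CC_FLAGS` is assumed to be the relevant variable):

```python
import os

# Illustrative sketch of a read-once configuration: the environment is
# snapshotted when the object is created, so later edits are never seen.
os.environ.pop("NEURON_CC_FLAGS", None)  # start from a clean environment

class CompilerConfig:
    def __init__(self):
        # Snapshot taken here; subsequent changes to os.environ are ignored.
        self.flags = os.environ.get("NEURON_CC_FLAGS", "")

config = CompilerConfig()                                     # flags read now
os.environ["NEURON_CC_FLAGS"] = "--retry_failed_compilation"  # exported too late
print(repr(config.flags))  # '' -- the flag never reaches the "compiler"
```

If this is what is happening, exporting the flags before the first import of the module that snapshots them should make them visible.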
-
There is a typo in the section "[Can I use CUDA libraries with AWS Trainium?](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/general/faq/training/neuron-training.html#id8)"
If you have appli…
-
Does Optimum Neuron have support for [TRL](https://huggingface.co/docs/trl/index) supervised fine-tuning, reward modelling, and PPO using Trainium? Is TRL the best path to support RLHF?
-
When you try to push NEFFs from a compiled model whose origin model is private, it will fail, since we have a safeguard in place. This might be suboptimal; e.g., we cannot cache Llama.
cc @michaelbenayou…
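The safeguard described above amounts to a visibility gate on the push path. A minimal sketch of that logic (function and parameter names are illustrative, not the actual library API):

```python
# Hypothetical sketch of the safeguard: compiled artifacts (NEFFs) are only
# pushed to the shared cache when the origin model is public, unless the
# caller explicitly opts in to pushing artifacts derived from private weights.
def can_push_neffs(origin_is_private: bool, allow_private_push: bool = False) -> bool:
    """Return True if it is safe to push compiled NEFFs to the shared cache."""
    if origin_is_private and not allow_private_push:
        return False  # safeguard: block artifacts derived from private weights
    return True

print(can_push_neffs(origin_is_private=True))                           # blocked
print(can_push_neffs(origin_is_private=True, allow_private_push=True))  # explicit opt-in
```

An explicit opt-in flag like this would be one way to keep the safeguard while still allowing gated models such as Llama to be cached deliberately.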
-
As mentioned [here](https://github.com/NVIDIA/Megatron-LM/tree/main/megatron/core/transformer/moe#ease-of-use), having a proper MOE/Mixtral checkpoint converter script will help us to fine-tune Mixtra…
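The core of such a converter is remapping checkpoint keys from one naming scheme to the other. A minimal sketch of that step (the layer/expert key names below are illustrative assumptions, not the real Megatron-LM or Mixtral checkpoint layouts):

```python
import re

def remap_moe_keys(state_dict):
    """Rename hypothetical Megatron-style expert keys to an HF-style layout."""
    remapped = {}
    for key, value in state_dict.items():
        # e.g. "layers.0.mlp.experts.3.w1" -> "layers.0.block_sparse_moe.experts.3.w1"
        new_key = re.sub(r"\.mlp\.experts\.", ".block_sparse_moe.experts.", key)
        remapped[new_key] = value
    return remapped

# Toy stand-in for a state dict; real converters would also reshape/merge tensors.
ckpt = {"layers.0.mlp.experts.3.w1": "W1", "layers.0.self_attn.q_proj": "Q"}
print(remap_moe_keys(ckpt))
```

A full converter would additionally handle tensor sharding and any transposes between the two layouts, but the key-remapping table is usually the first thing to pin down.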