aws-neuron / aws-neuron-sdk

Powering AWS purpose-built machine learning chips. Blazing fast and cost-effective, natively integrated into PyTorch and TensorFlow, and integrated with your favorite AWS services.
https://aws.amazon.com/machine-learning/neuron/

neuron-cc never finishes when using simple torch-xla example #1003

Open greg-8090 opened 3 days ago

greg-8090 commented 3 days ago

I have an environment with both torch-neuronx and torch-xla installed. Simple utility calls confirm that the XLA device is detected and correctly identified as Neuron. However, when I run a simple inference example using torch-xla, the neuron-cc compilation step never completes, even when left running overnight. I have attached the sample code in question.

simple_inference.py.txt

jeffhataws commented 3 days ago

Hi @greg-8090 ,

If you add the "--model-type=transformer" compiler flag as shown below, you will be able to get further:

import os
os.environ["NEURON_CC_FLAGS"] = "--model-type=transformer"

However, I see that this example is encountering the compiler error below:

2024-10-07T21:55:32Z [XTP003] Number of instructions (10046893) is over the threshold (5000000). Operators are too large. - Make operators smaller i.e. smaller batch size or sequence length, or use tensor parellelism.

So you can either switch to a smaller model such as BERT-large, or try an NxD (NeuronX Distributed) inference tutorial.
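A note on the flag-setting step above: since NEURON_CC_FLAGS is a space-separated string, overwriting it wholesale can clobber flags already set in the environment. A minimal sketch of appending the flag safely, before any torch-neuronx/torch-xla code triggers compilation (the helper name `add_neuron_cc_flag` is my own, not part of the SDK):

```python
import os

def add_neuron_cc_flag(flag: str) -> None:
    """Append a compiler flag to NEURON_CC_FLAGS, preserving any existing flags.

    Skips the append if the exact flag is already present, so calling this
    twice does not duplicate the flag.
    """
    existing = os.environ.get("NEURON_CC_FLAGS", "")
    if flag in existing.split():
        return  # flag already set; nothing to do
    os.environ["NEURON_CC_FLAGS"] = (existing + " " + flag).strip()

# Must run before the first compilation (i.e. before the model is traced/run
# on the XLA device), because neuron-cc reads the env var at compile time.
add_neuron_cc_flag("--model-type=transformer")
```

Equivalently, the flag can be supplied on the command line without touching the script, e.g. `NEURON_CC_FLAGS="--model-type=transformer" python simple_inference.py`.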