Open greg-8090 opened 3 days ago
I have an environment that supports both torch-neuronx and torch-xla. Simple utility calls confirm that the XLA device is detected and correctly identified as Neuron. However, when I run a simple inference example using torch-xla, the neuron-cc step never completes, even when left running overnight. I have attached the sample code in question.

simple_inference.py.txt

---

Hi @greg-8090,

If you add the `--model-type=transformer` compiler flag as below, you will be able to get further:

```python
import os
os.environ["NEURON_CC_FLAGS"] = "--model-type=transformer"
```

However, I see that this example then encounters the compiler error below:

```
2024-10-07T21:55:32Z [XTP003] Number of instructions (10046893) is over the threshold (5000000). Operators are too large. - Make operators smaller i.e. smaller batch size or sequence length, or use tensor parellelism.
```

So you can switch to a smaller example like BERT-large, or try an NxD inference tutorial.
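One caveat worth noting: if the environment might already define `NEURON_CC_FLAGS`, appending the flag is safer than overwriting the variable. A minimal sketch (the append-if-missing handling is my own assumption; only the `--model-type=transformer` flag itself comes from the reply above):

```python
import os

# Append the flag instead of overwriting, in case NEURON_CC_FLAGS
# already carries other options in this environment (assumption:
# the env var may be pre-populated, e.g. by a launcher script).
existing = os.environ.get("NEURON_CC_FLAGS", "")
if "--model-type=transformer" not in existing:
    os.environ["NEURON_CC_FLAGS"] = (existing + " --model-type=transformer").strip()

print(os.environ["NEURON_CC_FLAGS"])
```

Set this before triggering any compilation, since the compiler reads the variable at compile time.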