Closed: xanderdunn closed this issue 1 year ago.
Thanks for reaching out - we're taking a look!
@xanderdunn - this is a bug in neuronx-cc related to the iota operator. We’re working on a fix and will keep this ticket updated as we make progress.
@xanderdunn - we believe your reported issue is fixed in the latest Neuron SDK (2.11). Please give it a try and update/close this ticket as appropriate.
Confirmed, after upgrading to Neuron SDK 2.11 this compiled for me!
```shell
$ neuronx-cc compile /tmp/rust_hlo_tril.pb --framework XLA --target trn1 --model-type transformer --auto-cast none --output /tmp/tril.neff
2023-06-16T23:23:42Z WARNING 111032 [LayoutBottleneck]: Connected component _compare.6 has no matmult/reduce/batchnorm. Guessing layout. Considering putting on CPU.
Selecting 4 allocations
0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************
Analyzing dependencies of Block1
0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************
Analyzing dependencies of Block1
0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************
Dependency reduction of sg0000
0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************
```
Huge thanks for the bug fix!
I am attempting to implement the lower-triangle (`tril`) function in XLA HLO. This is used, for example, to create the causal self-attention mask (see here).
Attached is the XLA HLO .pb generated by my code: rust_hlo_tril.pb.zip
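To illustrate the use case, here is a minimal sketch of applying a causal (lower-triangular) mask to attention scores in JAX. The function name `masked_attention_scores` and the use of `jnp.tril` are illustrative, not taken from the attached graph:

```python
import jax.numpy as jnp

def masked_attention_scores(scores):
    # Causal masking: positions above the diagonal are set to -inf
    # before softmax, so each token attends only to itself and to
    # earlier tokens.
    n = scores.shape[-1]
    mask = jnp.tril(jnp.ones((n, n)))
    return jnp.where(mask == 1, scores, -jnp.inf)
```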
Here is a debug_ir representation of it:
And here is what it looks like in my code:
When I load this .pb and run it on CPU via JAX, it succeeds:
Successful output:
However, when I compile and run that same .pb for Trainium, neuronx-cc gives me a compilation internal error:
I have many other .pb files running successfully on Trainium (softmax, gelu, rmsnorm, etc.); this is the first one I've encountered that causes an issue with the Neuron compiler. I find that the compiler crashes even when `tril` does not depend on an f32 input tensor, but is instead something like `tril(tensor.ones(x.shape))`; an example XLA graph of that is here: rust_hlo_tril.pb.zip. Please let me know if I am trying something unsupported here and how I ought to work around it. Thanks!
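For reference, the usual way to express `tril` in XLA is the iota-plus-compare pattern that the maintainers identified as the trigger (the `iota` operator bug in neuronx-cc). A minimal JAX sketch of that pattern, with `tril_mask` as an illustrative name:

```python
import jax.numpy as jnp
from jax import lax

def tril_mask(n):
    # Build row and column index grids with iota, then compare:
    # position (i, j) is kept iff i >= j, i.e. on or below the diagonal.
    rows = lax.broadcasted_iota(jnp.int32, (n, n), 0)
    cols = lax.broadcasted_iota(jnp.int32, (n, n), 1)
    return (rows >= cols).astype(jnp.float32)
```

If the compiler rejects this pattern, one workaround is to precompute the mask on the host (e.g. with numpy) and pass it to the device program as an ordinary input tensor instead of generating it with `iota` inside the graph.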