aws-neuron / aws-neuron-sdk

Powering AWS purpose-built machine learning chips. Blazing fast and cost-effective, natively integrated into PyTorch and TensorFlow, and integrated with your favorite AWS services.
https://aws.amazon.com/machine-learning/neuron/

Compiling YOLOX model error #976

Open mvinci12 opened 2 months ago

mvinci12 commented 2 months ago

Errors:

2024-08-28 23:10:02.000380: 248385 ERROR ||NEURON_CC_WRAPPER||: Failed compilation with ['neuronx-cc', 'compile', '--target=trn1', '--framework=XLA', '/tmp/ubuntu/neuroncc_compile_workdir/cb45b894-636b-4830-ace8-3a10faa6cd74/model.MODULE_8858416890750383945+ade7b014.hlo_module.pb', '--output', '/tmp/ubuntu/neuroncc_compile_workdir/cb45b894-636b-4830-ace8-3a10faa6cd74/model.MODULE_8858416890750383945+ade7b014.neff', '--model-type=cnn-training', '--verbose=35']: 2024-08-28T23:10:02Z [TEN404] Internal tensorizer error: TensorInitialization:Incorrect IR by <class 'neuronxcc.starfish.penguin.targets.transforms.TensorInitialization.TensorInitialization'> - Please open a support ticket at https://github.com/aws-neuron/aws-neuron-sdk/issues/new. You may also be able to obtain more information using the 'XLA_IR_DEBUG' and 'XLA_HLO_DEBUG' environment variables.
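The compiler message above points at the `XLA_IR_DEBUG` and `XLA_HLO_DEBUG` environment variables. A minimal sketch of turning them on from Python follows, assuming they need to be set before `torch_xla` is imported so the runtime picks them up. The helper name `enable_xla_debug` is hypothetical and not part of any library.

```python
import os


def enable_xla_debug(env=None):
    """Set the two debug variables named in the compiler error message.

    `env` defaults to os.environ; passing a plain dict is handy for testing.
    """
    env = os.environ if env is None else env
    env["XLA_IR_DEBUG"] = "1"   # capture Python source info when IR nodes are created
    env["XLA_HLO_DEBUG"] = "1"  # propagate that info into the HLO metadata
    return env


# Assumption: this runs at the very top of the training script,
# before `import torch_xla` is executed.
enable_xla_debug()
```

With both variables set, the HLO module handed to `neuronx-cc` should carry source locations, which may help narrow down which PyTorch operation leads to the `TensorInitialization` failure.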

RuntimeError: Bad StatusOr access: INTERNAL: RunNeuronCCImpl: error condition error != 0: <class 'subprocess.CalledProcessError'>: Command '['neuronx-cc', 'compile', '--target=trn1', '--framework=XLA', '/tmp/ubuntu/neuroncc_compile_workdir/cb45b894-636b-4830-ace8-3a10faa6cd74/model.MODULE_8858416890750383945+ade7b014.hlo_module.pb', '--output', '/tmp/ubuntu/neuroncc_compile_workdir/cb45b894-636b-4830-ace8-3a10faa6cd74/model.MODULE_8858416890750383945+ade7b014.neff', '--model-type=cnn-training', '--verbose=35']' returned non-zero exit status 70.

File "/home/ubuntu/aws-neuron-samples/torch-neuronx/training/neuron-adoption/yolox/core/trainer.py", line 97, in train_in_iter
    xm.mark_step()  # Ensure TPU operations are synchronized
    │  └ <function mark_step at 0x7fcf8a8be170>
    └ <module 'torch_xla.core.xla_model' from '/home/ubuntu/aws-neuron-samples/aws_neuron_venv_pytorch/lib/python3.10/site-packages...

File "/home/ubuntu/aws-neuron-samples/aws_neuron_venv_pytorch/lib/python3.10/site-packages/torch_xla/core/xla_model.py", line 969, in mark_step
    torch_xla._XLAC._xla_step_marker(
    │         │     └ <built-in method _xla_step_marker of PyCapsule object at 0x7fcf9a0b6d00>
    │         └ <module '_XLAC' from '/home/ubuntu/aws-neuron-samples/aws_neuron_venv_pytorch/lib/python3.10/site-packages/_XLAC.cpython-310-...
    └ <module 'torch_xla' from '/home/ubuntu/aws-neuron-samples/aws_neuron_venv_pytorch/lib/python3.10/site-packages/torch_xla/__in...
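The `neuronx-cc` invocation quoted in the errors above can also be rebuilt as a shell command, so the compiler can be run by hand and its full output captured outside the training script. This is only a sketch: the arguments are copied from the log, the variable names are illustrative, and it assumes the files under the temporary work directory still exist on the machine.

```python
import shlex

# Paths and flags copied verbatim from the error message.
workdir = "/tmp/ubuntu/neuroncc_compile_workdir/cb45b894-636b-4830-ace8-3a10faa6cd74"
module = "model.MODULE_8858416890750383945+ade7b014"

cmd = [
    "neuronx-cc", "compile", "--target=trn1", "--framework=XLA",
    f"{workdir}/{module}.hlo_module.pb",
    "--output", f"{workdir}/{module}.neff",
    "--model-type=cnn-training", "--verbose=35",
]

# Print a shell-safe one-liner that can be pasted into a terminal.
print(shlex.join(cmd))
```

Attaching the `.hlo_module.pb` file and the output of this command to the issue may make the failure easier to reproduce.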