aws-neuron / aws-neuron-sdk

Powering AWS purpose-built machine learning chips. Blazing fast and cost effective, natively integrated into PyTorch and TensorFlow and integrated with your favorite AWS services
https://aws.amazon.com/machine-learning/neuron/

Is there something wrong in torch_neuronx.trace? #907

Open mhokchuekchuek opened 5 months ago

mhokchuekchuek commented 5 months ago

I am compiling YOLOv10 on both inf1 and inf2.

Model compilation

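After hitting the error below, I commented out assert param.is_leaf in torch/nn/modules/module.py (line 840 in the traceback), and then I was able to compile my model for inf2.

Then I checked PyTorch v.1.1.3: it also checks that the parameters are leaf tensors in the corresponding code, yet everything works fine on inf1.

Can you explain what I did wrong when compiling for inf2?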
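Commenting out the assertion gets past the error because, in PyTorch 2.1's `Module._apply`, the line right after `assert param.is_leaf` wraps the converted tensor in a fresh `nn.Parameter`, which is a leaf. A minimal sketch of the same effect that does not require editing the installed PyTorch, assuming the model is only used for inference (so dropping the parameters' autograd history is harmless), is to re-wrap every non-leaf parameter as a detached leaf before calling `torch_neuronx.trace`. The helper `releaf_parameters` is hypothetical, and getting past the assertion does not by itself show that the traced model produces correct output.

```python
import torch
import torch.nn as nn


def releaf_parameters(model: nn.Module) -> list[str]:
    """Swap every non-leaf nn.Parameter in `model` for a detached leaf copy.

    Hypothetical helper (not part of PyTorch or torch-neuronx). It mirrors what
    Module._apply does right after `assert param.is_leaf` (wrap the values in a
    fresh Parameter) without patching site-packages.
    Returns the qualified names of the parameters it replaced.
    """
    replaced = []
    for module_name, module in model.named_modules():
        # list() because we re-register parameters while iterating.
        for param_name, param in list(module.named_parameters(recurse=False)):
            if param.is_leaf:
                continue
            # detach() drops the autograd history; the values are unchanged.
            setattr(module, param_name,
                    nn.Parameter(param.detach().clone(), requires_grad=param.requires_grad))
            replaced.append(f"{module_name}.{param_name}" if module_name else param_name)
    return replaced


# Usage: an in-place copy from a tensor that requires grad leaves the
# parameter non-leaf, which is exactly what `assert param.is_leaf` rejects.
lin = nn.Linear(3, 3).requires_grad_(False)
lin.weight.copy_(torch.randn(3, 3, requires_grad=True) * 2.0)
print(lin.weight.is_leaf)        # False
print(releaf_parameters(lin))    # ['weight']
print(lin.weight.is_leaf)        # True
```

Note that parameters shared between modules would each get their own copy with this sketch, so weight tying is not preserved.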

How to compile

  1. Follow these instructions to start the EC2 instance, then activate the environment:

    source /opt/aws_neuronx_venv_pytorch_2_1/bin/activate

  2. Clone this repo, then cd into the yolov10/compile directory.

  3. Download the YOLOv10 weights with this command:

    wget -P ./weights -q https://github.com/THU-MIG/yolov10/releases/download/v1.1/yolov10l.pt

  4. Install the YOLOv10 requirements:

    pip install -r requirements-inf.txt

  5. Run the model compilation command:

    python complier.py --checkpoint weights/yolov10l.pt --output_dir . --mode neuronx

    Compiler output:

    06/18/2024 04:26:16 - INFO - __main__ -   Tracing the model on CPU
    YOLOv10l summary (fused): 461 layers, 25839728 parameters, 25839712 gradients
    /opt/aws_neuronx_venv_pytorch_2_1/lib/python3.10/site-packages/torch/nn/modules/module.py:844: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at aten/src/ATen/core/TensorBody.h:489.)
      if param.grad is not None:
    06/18/2024 04:26:16 - INFO - torch_neuron -   PJRT_DEVICE not set, defaulting to NEURON
    /opt/aws_neuronx_venv_pytorch_2_1/lib/python3.10/site-packages/torch/nn/modules/module.py:844: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at aten/src/ATen/core/TensorBody.h:489.)
      if param.grad is not None:
    Traceback (most recent call last):
      File "/home/ubuntu/yolov10/compile/complier.py", line 84, in <module>
        traced_model = torch_neuronx.trace(yolo_model, preprocess_img)
      File "/opt/aws_neuronx_venv_pytorch_2_1/lib/python3.10/site-packages/torch_neuronx/xla_impl/trace.py", line 556, in trace
        neff_filename, metaneff, flattener, packer, weights = _trace(
      File "/opt/aws_neuronx_venv_pytorch_2_1/lib/python3.10/site-packages/torch_neuronx/xla_impl/trace.py", line 614, in _trace
        ) = generate_hlo(
      File "/opt/aws_neuronx_venv_pytorch_2_1/lib/python3.10/site-packages/torch_neuronx/xla_impl/trace.py", line 404, in generate_hlo
        ) = xla_trace(
      File "/opt/aws_neuronx_venv_pytorch_2_1/lib/python3.10/site-packages/torch_neuronx/xla_impl/hlo_conversion.py", line 114, in xla_trace
        placement.move(state, xla_device)
      File "/opt/aws_neuronx_venv_pytorch_2_1/lib/python3.10/site-packages/torch_neuronx/xla_impl/placement.py", line 51, in move
        func.to(device)
      File "/opt/aws_neuronx_venv_pytorch_2_1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1160, in to
        return self._apply(convert)
      File "/home/ubuntu/yolov10/compile/ultralytics/nn/tasks.py", line 270, in _apply
        self = super()._apply(fn)
      File "/opt/aws_neuronx_venv_pytorch_2_1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 810, in _apply
        module._apply(fn)
      File "/opt/aws_neuronx_venv_pytorch_2_1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 810, in _apply
        module._apply(fn)
      File "/opt/aws_neuronx_venv_pytorch_2_1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 810, in _apply
        module._apply(fn)
      File "/opt/aws_neuronx_venv_pytorch_2_1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 840, in _apply
        assert param.is_leaf
    AssertionError

Environment:

Package                       Version
----------------------------- -------------------
absl-py                       2.1.0
aiohttp                       3.9.5
aiosignal                     1.3.1
amqp                          5.2.0
annotated-types               0.7.0
ansicolors                    1.1.8
anyio                         4.4.0
argon2-cffi                   23.1.0
argon2-cffi-bindings          21.2.0
arrow                         1.3.0
astroid                       3.2.2
asttokens                     2.4.1
async-lru                     2.0.4
async-timeout                 4.0.3
attrs                         23.2.0
Automat                       22.10.0
aws-neuronx-runtime-discovery 2.9
awscli                        1.32.113
Babel                         2.15.0
beautifulsoup4                4.12.3
billiard                      4.2.0
bleach                        6.1.0
boto3                         1.34.113
botocore                      1.34.113
build                         1.2.1
cachetools                    5.3.3
celery                        5.4.0
certifi                       2024.2.2
cffi                          1.16.0
charset-normalizer            3.3.2
click                         8.1.7
click-didyoumean              0.3.1
click-plugins                 1.1.1
click-repl                    0.3.0
cloud-tpu-client              0.10
cloudpickle                   3.0.0
cmake                         3.29.3
colorama                      0.4.6
comm                          0.2.2
constantly                    23.10.4
contourpy                     1.2.1
cryptography                  42.0.7
cssselect                     1.2.0
cycler                        0.12.1
dask                          2024.5.1
debugpy                       1.8.1
decorator                     5.1.1
defusedxml                    0.7.1
dill                          0.3.8
distlib                       0.3.8
dnspython                     2.6.1
docutils                      0.16
dparse                        0.6.3
ec2-metadata                  2.10.0
email_validator               2.1.1
entrypoints                   0.4
environment-kernels           1.2.0
exceptiongroup                1.2.1
executing                     2.0.1
fastapi                       0.111.0
fastapi-cli                   0.0.4
fastjsonschema                2.19.1
filelock                      3.14.0
fonttools                     4.52.1
fqdn                          1.5.1
frozenlist                    1.4.1
fsspec                        2024.5.0
google-api-core               1.34.1
google-api-python-client      1.8.0
google-auth                   2.29.0
google-auth-httplib2          0.2.0
googleapis-common-protos      1.63.0
h11                           0.14.0
httpcore                      1.0.5
httpie                        3.2.2
httplib2                      0.22.0
httptools                     0.6.1
httpx                         0.27.0
huggingface-hub               0.23.3
hyperlink                     21.0.0
idna                          3.7
imageio                       2.34.1
importlib_metadata            7.1.0
incremental                   22.10.0
iniconfig                     2.0.0
ipykernel                     6.29.4
ipython                       8.24.0
ipywidgets                    8.1.2
islpy                         2023.1
isoduration                   20.11.0
isort                         5.13.2
itemadapter                   0.9.0
itemloaders                   1.2.0
jedi                          0.19.1
Jinja2                        3.1.4
jmespath                      1.0.1
joblib                        1.4.2
json5                         0.9.25
jsonpointer                   2.4
jsonschema                    4.22.0
jsonschema-specifications     2023.12.1
jupyter                       1.0.0
jupyter_client                8.6.2
jupyter-console               6.6.3
jupyter_core                  5.7.2
jupyter-events                0.10.0
jupyter-lsp                   2.2.5
jupyter_server                2.14.0
jupyter_server_terminals      0.5.3
jupyterlab                    4.2.1
jupyterlab_pygments           0.3.0
jupyterlab_server             2.27.2
jupyterlab_widgets            3.0.10
kiwisolver                    1.4.5
kombu                         5.3.7
libneuronxla                  2.0.965
llvmlite                      0.42.0
locket                        1.0.0
lockfile                      0.12.2
lxml                          5.2.2
markdown-it-py                3.0.0
MarkupSafe                    2.1.5
matplotlib                    3.9.0
matplotlib-inline             0.1.7
mccabe                        0.7.0
mdurl                         0.1.2
mistune                       3.0.2
mpmath                        1.3.0
multidict                     6.0.5
nbclient                      0.10.0
nbconvert                     7.16.4
nbformat                      5.10.4
nest-asyncio                  1.6.0
networkx                      2.6.3
neuronx-cc                    2.13.72.0+78a426937
neuronx-distributed           0.7.0
notebook                      7.2.0
notebook_shim                 0.2.4
numba                         0.59.1
numpy                         1.25.2
nvidia-cublas-cu12            12.1.3.1
nvidia-cuda-cupti-cu12        12.1.105
nvidia-cuda-nvrtc-cu12        12.1.105
nvidia-cuda-runtime-cu12      12.1.105
nvidia-cudnn-cu12             8.9.2.26
nvidia-cufft-cu12             11.0.2.54
nvidia-curand-cu12            10.3.2.106
nvidia-cusolver-cu12          11.4.5.107
nvidia-cusparse-cu12          12.1.0.106
nvidia-nccl-cu12              2.18.1
nvidia-nvjitlink-cu12         12.5.40
nvidia-nvtx-cu12              12.1.105
oauth2client                  4.1.3
opencv-python                 4.9.0.80
orjson                        3.10.3
overrides                     7.7.0
packaging                     21.3
pandas                        2.2.2
pandocfilters                 1.5.1
papermill                     2.6.0
parsel                        1.9.1
parso                         0.8.4
partd                         1.4.2
pexpect                       4.9.0
pgzip                         0.3.5
pillow                        10.3.0
pip                           24.0
pip-tools                     7.4.1
pipenv                        2023.12.1
platformdirs                  4.2.2
plotly                        5.22.0
pluggy                        1.5.0
prometheus_client             0.20.0
prompt-toolkit                3.0.43
Protego                       0.3.1
protobuf                      3.19.6
psutil                        5.9.8
ptyprocess                    0.7.0
pure-eval                     0.2.2
pyasn1                        0.6.0
pyasn1_modules                0.4.0
pycparser                     2.22
pydantic                      2.7.1
pydantic_core                 2.18.2
PyDispatcher                  2.0.7
Pygments                      2.18.0
pyinstrument                  4.6.2
pylint                        3.2.2
pyOpenSSL                     24.1.0
pyparsing                     3.1.2
pyproject_hooks               1.1.0
PySocks                       1.7.1
pytest                        8.2.1
python-daemon                 3.0.1
python-dateutil               2.9.0.post0
python-dotenv                 1.0.1
python-json-logger            2.0.7
python-multipart              0.0.9
pytz                          2024.1
PyYAML                        6.0.1
pyzmq                         26.0.3
qtconsole                     5.5.2
QtPy                          2.4.1
queuelib                      1.7.0
referencing                   0.35.1
requests                      2.32.2
requests-file                 2.1.0
requests-toolbelt             1.0.0
requests-unixsocket           0.3.0
rfc3339-validator             0.1.4
rfc3986-validator             0.1.1
rich                          13.7.1
rpds-py                       0.18.1
rsa                           4.7.2
ruamel.yaml                   0.18.6
ruamel.yaml.clib              0.2.8
s3transfer                    0.10.1
safety                        2.3.5
scikit-learn                  1.5.0
scipy                         1.11.2
Scrapy                        2.11.2
seaborn                       0.13.2
Send2Trash                    1.8.3
service-identity              24.1.0
setuptools                    70.0.0
shap                          0.45.1
shellingham                   1.5.4
six                           1.16.0
slicer                        0.0.8
sniffio                       1.3.1
soupsieve                     2.5
stack-data                    0.6.3
starlette                     0.37.2
sympy                         1.12
tenacity                      8.3.0
terminado                     0.18.1
threadpoolctl                 3.5.0
tinycss2                      1.3.0
tldextract                    5.1.2
tomli                         2.0.1
tomlkit                       0.12.5
toolz                         0.12.1
torch                         2.1.2
torch-neuronx                 2.1.2.2.1.0
torch-xla                     2.1.2
torchvision                   0.16.2
tornado                       6.4
tqdm                          4.66.4
traitlets                     5.14.3
triton                        2.1.0
Twisted                       24.3.0
typer                         0.12.3
types-python-dateutil         2.9.0.20240316
typing_extensions             4.12.0
tzdata                        2024.1
ujson                         5.10.0
uri-template                  1.3.0
uritemplate                   3.0.1
urllib3                       2.2.1
uvicorn                       0.29.0
uvloop                        0.19.0
vine                          5.1.0
virtualenv                    20.26.2
w3lib                         2.1.2
watchfiles                    0.22.0
wcwidth                       0.2.13
webcolors                     1.13
webencodings                  0.5.1
websocket-client              1.8.0
websockets                    12.0
wget                          3.2
wheel                         0.43.0
widgetsnbextension            4.0.10
yarl                          1.9.4
zipp                          3.19.0
zope.interface                6.4.post2

jluntamazon commented 5 months ago

Hi @mhokchuekchuek,

Would you be able to provide instructions on how to reproduce this error? Which version of the YoloV10 model code are you executing?

A minimal reproduction would allow us to debug on our end and let us diagnose which component is failing. The error that you see most likely occurs when moving parameters to the XLA device, but it is unclear from the context why this is happening.
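The failing check can be reproduced on CPU, without Neuron hardware. In PyTorch 2.1, `Module._apply` (which `.to()` calls) updates a parameter in place through `.data` when the old and new tensors have compatible types, as in a CPU dtype change; when they do not, as when moving to an XLA device, it builds a new `Parameter` and first asserts `param.is_leaf`. The sketch below assumes that behavior and forces the second path on CPU with `torch.__future__.set_overwrite_module_params_on_conversion(True)`. The in-place `copy_` is only a stand-in for whatever turned the YOLOv10 parameters into non-leaf tensors.

```python
import torch
import torch.nn as nn

# Stand-in for the model edits: an in-place copy (in grad mode) from a tensor
# that requires grad gives the Parameter a grad_fn, so it is no longer a leaf.
lin = nn.Linear(4, 4).requires_grad_(False)
lin.weight.copy_(torch.randn(4, 4, requires_grad=True) * 2.0)
print(lin.weight.is_leaf)  # False

# CPU dtype change: compatible tensor types, so _apply takes the
# `param.data = ...` path. No assertion, but reading param.grad on the
# non-leaf tensor emits the same UserWarning as in the compiler output.
lin.to(torch.float64)

# Force the "build a new Parameter" path that a move to an XLA device takes.
torch.__future__.set_overwrite_module_params_on_conversion(True)
try:
    lin.to(torch.float32)
except AssertionError:
    print("AssertionError raised by `assert param.is_leaf`")
finally:
    # Restore the default so later .to() calls behave normally.
    torch.__future__.set_overwrite_module_params_on_conversion(False)
```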

mhokchuekchuek commented 5 months ago

@jluntamazon,

I apologize for the unclear description earlier. I have added instructions for compiling YOLOv10 to the description.

aws-rishyraj commented 4 months ago

Hi @mhokchuekchuek,

Looking at the code, there is no need to set fuse=True here, since our compiler fuses operators optimally for our hardware. Furthermore, with fuse=True, the manipulations applied to the module leave non-leaf tensors among its parameters, so the model can no longer change its device. This is why torch_neuronx.trace failed in the first place.
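To illustrate how a fusion pass can leave non-leaf parameters behind, here is a simplified Conv2d + BatchNorm2d fold. `fold_bn_into_conv` is a hypothetical helper written for this example, not the Ultralytics fuse implementation, and it assumes the fold writes into the existing conv parameters in place with `copy_`. In grad mode the copy records autograd history from the BatchNorm affine parameters, so the conv parameters stop being leaves; the same fold under `torch.no_grad()` keeps them leaves. `find_non_leaf_params` (also hypothetical) is a cheap check to run on a model before handing it to `torch_neuronx.trace`.

```python
import torch
import torch.nn as nn


def fold_bn_into_conv(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> None:
    """Hypothetical in-place Conv2d + BatchNorm2d fold (not the Ultralytics code).

    Uses BN running statistics: W' = W * s, b' = (b - mean) * s + beta,
    with s = gamma / sqrt(running_var + eps) per output channel.
    Assumes `conv` was created with bias=True.
    """
    scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)
    new_w = conv.weight.detach() * scale.view(-1, 1, 1, 1)
    new_b = (conv.bias.detach() - bn.running_mean) * scale + bn.bias
    # In grad mode, copy_ records history from bn.weight / bn.bias (which
    # require grad), so the conv parameters stop being leaves.
    conv.weight.requires_grad_(False).copy_(new_w)
    conv.bias.requires_grad_(False).copy_(new_b)


def find_non_leaf_params(model: nn.Module) -> list[str]:
    """Names of parameters that fail the `param.is_leaf` check in Module._apply."""
    return [name for name, p in model.named_parameters() if not p.is_leaf]


def make_pair(seed: int = 0):
    torch.manual_seed(seed)
    return nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8)


conv, bn = make_pair()
fold_bn_into_conv(conv, bn)          # grad mode: history is recorded
print(find_non_leaf_params(conv))    # ['weight', 'bias']

conv, bn = make_pair()
with torch.no_grad():
    fold_bn_into_conv(conv, bn)      # no history: parameters stay leaves
print(find_non_leaf_params(conv))    # []
```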

With fuse=False, the model compiles and we get 8-10 ms latency on Neuron vs. 140 ms on CPU. However, we've found that the resulting model produces incorrect output. We are working on fixing the correctness issue and will respond as soon as we have an update.
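One way to track the incorrect output is to feed the same input to the CPU model and the compiled model and compare each output tensor within a tolerance. `compare_outputs` is a hypothetical helper; it assumes the model returns a tensor or a nested tuple, list, or dict of tensors, and its default tolerances are placeholders rather than values used by the Neuron team.

```python
import torch


def compare_outputs(ref, test, rtol=1e-2, atol=1e-3, path="out"):
    """Return [(path, max_abs_diff, within_tolerance)] for every tensor in two nested outputs."""
    if isinstance(ref, torch.Tensor):
        if ref.shape != test.shape:
            return [(path, float("inf"), False)]
        a, b = ref.float(), test.float()
        diff = (a - b).abs().max().item() if a.numel() else 0.0
        return [(path, diff, torch.allclose(a, b, rtol=rtol, atol=atol))]
    if isinstance(ref, dict):
        return [r for k in ref
                for r in compare_outputs(ref[k], test[k], rtol, atol, f"{path}[{k!r}]")]
    if isinstance(ref, (list, tuple)):
        if len(ref) != len(test):
            return [(path, float("inf"), False)]
        return [r for i, (x, y) in enumerate(zip(ref, test))
                for r in compare_outputs(x, y, rtol, atol, f"{path}[{i}]")]
    raise TypeError(f"unsupported output type at {path}: {type(ref).__name__}")


# Intended use (assumption): cpu_model is the eager model in eval mode and
# neuron_model = torch_neuronx.trace(cpu_model, example).
#   with torch.no_grad():
#       report = compare_outputs(cpu_model(example), neuron_model(example))

# Self-contained demo with fake outputs:
ref = (torch.ones(2, 3), [torch.zeros(4)])
test = (torch.ones(2, 3) + 0.5, [torch.zeros(4)])
for path, diff, ok in compare_outputs(ref, test):
    print(path, round(diff, 3), ok)
# out[0] 0.5 False
# out[1][0] 0.0 True
```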