aws-neuron / transformers-neuronx

Neuron model NEFFs are dependent on the python path #91

Closed dacorvo closed 2 months ago

dacorvo commented 4 months ago

With AWS Neuron SDK 2.19, when exporting a model and saving the compiled artifacts, it is impossible to reload them afterwards if the Python path is different.

This makes shared serialization and caching impossible in practice, because you cannot control the deployment environment: EC2 with DLAMI, SageMaker and ad-hoc end-user endpoints will all have different environments.

Steps to reproduce

  1. download test_tnx_llama_export.py

  2. export the model in a venv

$ python3 -m venv foo_venv
$ source foo_venv/bin/activate
$ export PIP_EXTRA_INDEX_URL=https://pip.repos.neuron.amazonaws.com
$ python -m pip install -U neuronx-cc torch_neuronx==2.* transformers-neuronx
$ python test_tnx_llama_export.py export NousResearch/Hermes-2-Theta-Llama-3-8B --save_dir ./tnx-hermes-foo

  3. check the generated artifacts and verify the neuron model can be reloaded (no compilation should happen)

$ ls ./tnx-hermes-foo
2ae6fb8fd3c66e17e30f.neff  7da21af3749343c5c27f.neff
$ python test_tnx_llama_export.py run NousResearch/Hermes-2-Theta-Llama-3-8B --save_dir ./tnx-hermes-foo

  4. deactivate the venv and try to reload the model in another venv

$ deactivate
$ python3 -m venv bar_venv
$ source bar_venv/bin/activate
$ export PIP_EXTRA_INDEX_URL=https://pip.repos.neuron.amazonaws.com
$ python -m pip install -U neuronx-cc torch_neuronx==2.* transformers-neuronx
$ python test_tnx_llama_export.py run NousResearch/Hermes-2-Theta-Llama-3-8B --save_dir ./tnx-hermes-foo

You should get the following exception:

FileNotFoundError: Could not find a matching NEFF for your HLO in this directory. Ensure that the model you are trying to load is the same type and has the same parameters as the one you saved or call "save" on this model to reserialize it.

  5. export the model from the new venv and compare the results

$ python test_tnx_llama_export.py export NousResearch/Hermes-2-Theta-Llama-3-8B --save_dir ./tnx-hermes-bar
$ ls ./tnx-hermes-bar
2ae6fb8fd3c66e17e30f.neff  fa8889a59c70ee5fca60.neff

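The first NEFF has the same name in both exports (2ae6fb8fd3c66e17e30f.neff), but the second NEFF is always different (7da21af3749343c5c27f.neff in the first venv, fa8889a59c70ee5fca60.neff in the second).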
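
To make the comparison in the last step repeatable, the file names in the two save directories can be diffed with a few lines of Python. This is only a minimal sketch: the helper names `neff_names` and `compare_neffs` are hypothetical and not part of transformers-neuronx, and it assumes the `.neff` files sit directly in each save directory, as in the `ls` output above.

```python
from pathlib import Path


def neff_names(save_dir):
    """Return the set of .neff file names found directly in save_dir."""
    return {p.name for p in Path(save_dir).glob("*.neff")}


def compare_neffs(dir_a, dir_b):
    """Group NEFF file names into shared, only-in-a and only-in-b lists."""
    a, b = neff_names(dir_a), neff_names(dir_b)
    return {
        "shared": sorted(a & b),   # same name produced in both environments
        "only_a": sorted(a - b),   # name produced only by the first export
        "only_b": sorted(b - a),   # name produced only by the second export
    }
```

Calling `compare_neffs("./tnx-hermes-foo", "./tnx-hermes-bar")` on the directories from the steps above would list the NEFF common to both exports under `shared` and the two mismatching ones under `only_a` and `only_b`. It only compares file names; it does not inspect the NEFF contents.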

dacorvo commented 4 months ago

cc @5cp @pinak-p

jeffhataws commented 4 months ago

Hi @dacorvo, we are working on a fix for this. Thanks.

jeffhataws commented 2 months ago

This was resolved in release 2.19.1.

pagezyhf commented 1 month ago

@jeffhataws the issue is back in 2.20: https://github.com/aws-neuron/transformers-neuronx/issues/99