Closed liechtym closed 9 months ago
Hi @liechtym
Thanks for reporting the problem. We've reproduced it and have a fix in an upcoming release. We'll respond here and close this issue once the release is out.
@mrnikwaws Thank you very much! I appreciate the quick response and look forward to the release.
2.16 is now released and should address your issue. Please respond on this ticket if the issue is not resolved. If we don't hear back we'll close the issue.
Thank you very much!
@mrnikwaws I just tried with the following demo code and I'm still getting the same error.
I verified my installation from the latest commit in the repo with pip freeze:
transformers-neuronx @ git+https://github.com/aws-neuron/transformers-neuronx.git@426629648481095dfbb4f6bd993f25b88a87b505
I only changed a couple of things from the demo. Instead of using 'llama-2-13b', I used 'meta-llama/Llama-2-7b-chat-hf' in LlamaForCausalLM.from_pretrained(). The only other change was tp_degree=2 in LlamaForSampling.from_pretrained().
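For readers following along, the two changes described above amount to roughly the following. This is a sketch under assumptions, not the notebook's exact code: `split_dir_for` and `build_neuron_model` are hypothetical helper names, `save_pretrained_split` is assumed to be the helper the sample notebook uses to write the split checkpoint, and actually calling `build_neuron_model` needs an instance with the Neuron packages installed.

```python
def split_dir_for(model_id: str) -> str:
    """Hypothetical helper: local directory name used for the split checkpoint."""
    return "./" + model_id.rstrip("/").split("/")[-1] + "-split"


def build_neuron_model(model_id: str = "meta-llama/Llama-2-7b-chat-hf", tp_degree: int = 2):
    """Save a split checkpoint, then load it for sampling on Neuron devices."""
    # Imports are local so split_dir_for() works without the Neuron packages.
    from transformers import LlamaForCausalLM
    from transformers_neuronx.llama.model import LlamaForSampling
    from transformers_neuronx.module import save_pretrained_split  # assumed helper

    split_dir = split_dir_for(model_id)

    # Load the Hugging Face checkpoint on CPU and write it out in split form.
    model = LlamaForCausalLM.from_pretrained(model_id)
    save_pretrained_split(model, split_dir)

    # Same arguments as the call shown in the traceback.
    neuron_model = LlamaForSampling.from_pretrained(
        split_dir, batch_size=1, tp_degree=tp_degree, amp="f16"
    )
    neuron_model.to_neuron()
    return neuron_model
```

With the model ID from this comment, `split_dir_for` yields the same `./Llama-2-7b-chat-hf-split` path that appears in the traceback.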
Traceback:
Traceback (most recent call last):
  File "run.py", line 11, in <module>
    neuron_model = LlamaForSampling.from_pretrained('./Llama-2-7b-chat-hf-split', batch_size=1, tp_degree=2, amp='f16')
  File "/opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/transformers_neuronx/module.py", line 148, in from_pretrained
    state_dict = torch.load(state_dict_path)
  File "/opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/torch/serialization.py", line 791, in load
    with _open_file_like(f, 'rb') as opened_file:
  File "/opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/torch/serialization.py", line 271, in _open_file_like
    return _open_file(name_or_buffer, mode)
  File "/opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/torch/serialization.py", line 252, in __init__
    super().__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: './Llama-2-7b-chat-hf-split/pytorch_model.bin'
Again, I'm on the same instance, AMI, and setup as before.
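When debugging a FileNotFoundError like the one above, it can help to look at what the split directory actually contains before loading. The snippet below is a minimal sketch using only the standard library; `describe_checkpoint_dir` is a hypothetical helper name, and the only file name it checks for is the `pytorch_model.bin` path taken from the traceback.

```python
from pathlib import Path


def describe_checkpoint_dir(path):
    """Report whether a checkpoint directory exists and what it contains."""
    root = Path(path)
    if not root.is_dir():
        return {"exists": False, "has_pytorch_model_bin": False, "entries": []}

    # List files and subdirectories; directories get a trailing slash.
    entries = sorted(p.name + "/" if p.is_dir() else p.name for p in root.iterdir())
    return {
        "exists": True,
        # The loader in the traceback tried to open this exact path.
        "has_pytorch_model_bin": (root / "pytorch_model.bin").exists(),
        "entries": entries,
    }


if __name__ == "__main__":
    print(describe_checkpoint_dir("./Llama-2-7b-chat-hf-split"))
```

If `has_pytorch_model_bin` is false while other weight files are listed, the checkpoint was probably saved in a layout the installed loader did not expect.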
@liechtym Sorry for the inconvenience. We have a fix for this in the transformers-neuronx GitHub repo, which was updated today. Can you please check with the latest version?
@shebbur-aws Yes I'll check with the latest and update you soon.
@shebbur-aws This issue seems to be resolved after reinstalling from the GitHub repo.
However, I am now getting the following error while running meta-llama-2-13b-sampling.ipynb with the modifications I described in the previous comment. Let me know if you'd like me to create a new issue for this.
2024-01-04 14:33:59.000295: 4197 INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-01-04 14:33:59.000383: 4198 INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-01-04 14:33:59.000471: 4199 INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-01-04 14:33:59.000492: 4197 INFO ||NEURON_CC_WRAPPER||: Using a cached neff at /var/tmp/neuron-compile-cache/neuronxcc-2.12.54.0+f631c2365/MODULE_9e281341e7845ee2287f+2c2d707e/model.neff. Exiting with a successfully compiled graph.
2024-01-04 14:33:59.000563: 4200 INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-01-04 14:33:59.000601: 4198 INFO ||NEURON_CC_WRAPPER||: Using a cached neff at /var/tmp/neuron-compile-cache/neuronxcc-2.12.54.0+f631c2365/MODULE_a4faa198082ac5b8d787+2c2d707e/model.neff. Exiting with a successfully compiled graph.
2024-01-04 14:33:59.000623: 4201 INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-01-04 14:33:59.000703: 4202 INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-01-04 14:33:59.000754: 4203 INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-01-04 14:33:59.000755: 4199 INFO ||NEURON_CC_WRAPPER||: Using a cached neff at /var/tmp/neuron-compile-cache/neuronxcc-2.12.54.0+f631c2365/MODULE_d5006487226e226573ea+2c2d707e/model.neff. Exiting with a successfully compiled graph.
2024-01-04 14:33:59.000756: 4204 INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-01-04 14:33:59.000790: 4205 INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-01-04 14:33:59.000862: 4206 INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-01-04 14:34:00.000087: 4200 INFO ||NEURON_CC_WRAPPER||: Using a cached neff at /var/tmp/neuron-compile-cache/neuronxcc-2.12.54.0+f631c2365/MODULE_1bf56f238691e0fd88c8+2c2d707e/model.neff. Exiting with a successfully compiled graph.
2024-01-04 14:34:00.000440: 4202 INFO ||NEURON_CC_WRAPPER||: Using a cached neff at /var/tmp/neuron-compile-cache/neuronxcc-2.12.54.0+f631c2365/MODULE_70d1a1ce4d52a869b9e6+2c2d707e/model.neff. Exiting with a successfully compiled graph.
2024-01-04 14:34:00.000440: 4201 INFO ||NEURON_CC_WRAPPER||: Using a cached neff at /var/tmp/neuron-compile-cache/neuronxcc-2.12.54.0+f631c2365/MODULE_c46e110ea38cea049c6d+2c2d707e/model.neff. Exiting with a successfully compiled graph.
2024-01-04 14:34:00.000464: 4203 INFO ||NEURON_CC_WRAPPER||: Using a cached neff at /var/tmp/neuron-compile-cache/neuronxcc-2.12.54.0+f631c2365/MODULE_b9a15c837cee1bf59e24+2c2d707e/model.neff. Exiting with a successfully compiled graph.
2024-01-04 14:34:00.000464: 4204 INFO ||NEURON_CC_WRAPPER||: Using a cached neff at /var/tmp/neuron-compile-cache/neuronxcc-2.12.54.0+f631c2365/MODULE_1f6eaa498df4dc58af20+2c2d707e/model.neff. Exiting with a successfully compiled graph.
2024-01-04 14:34:00.000465: 4205 INFO ||NEURON_CC_WRAPPER||: Using a cached neff at /var/tmp/neuron-compile-cache/neuronxcc-2.12.54.0+f631c2365/MODULE_d750f56f8d6a41f0372e+2c2d707e/model.neff. Exiting with a successfully compiled graph.
2024-01-04 14:34:00.000465: 4206 INFO ||NEURON_CC_WRAPPER||: Using a cached neff at /var/tmp/neuron-compile-cache/neuronxcc-2.12.54.0+f631c2365/MODULE_e22db4da23e4fde86dd1+2c2d707e/model.neff. Exiting with a successfully compiled graph.
2024-Jan-04 14:34:00.727597 4120:4181 ERROR NEFF:neff_parse NEFF version: 2.0, features: 0x100 are not supported. Currently supporting: 0x80000000000000ff
2024-Jan-04 14:34:00.727647 4120:4181 ERROR NMGR:kmgr_load_nn_post_metrics Failed to load NN: /tmp/neuroncc_compile_workdir/63403e3c-2309-43cd-8e3d-89f3abb77371/model.MODULE_9e281341e7845ee2287f+2c2d707e.neff, err: 10
2024-Jan-04 14:34:00.727686 4120:4182 ERROR NEFF:neff_parse NEFF version: 2.0, features: 0x100 are not supported. Currently supporting: 0x80000000000000ff
2024-Jan-04 14:34:00.727716 4120:4182 ERROR NMGR:kmgr_load_nn_post_metrics Failed to load NN: /tmp/neuroncc_compile_workdir/63403e3c-2309-43cd-8e3d-89f3abb77371/model.MODULE_9e281341e7845ee2287f+2c2d707e.neff, err: 10
Traceback (most recent call last):
  File "run.py", line 12, in <module>
    neuron_model.to_neuron()
  File "/opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/transformers_neuronx/base.py", line 72, in to_neuron
    self.setup()
  File "/opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/transformers_neuronx/base.py", line 63, in setup
    nbs.setup()
  File "/opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/transformers_neuronx/decoder.py", line 335, in setup
    self.program.setup(self.layers, self.pre_layer_parameters, self.ln_lm_head_params)
  File "/opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/transformers_neuronx/decoder.py", line 1449, in setup
    super().setup(layers, pre_layer_params, ln_lm_head_params)
  File "/opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/transformers_neuronx/decoder.py", line 1325, in setup
    kernel.load(io_ring_cache_size)
  File "/opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/transformers_neuronx/compiler.py", line 454, in load
    self.model.load()
RuntimeError: nrt_load_collectives status=10
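The log above already carries the details needed to compare versions: the compile cache path names the compiler that produced the cached NEFFs, and the NEFF error prints the requested and supported feature values. The sketch below pulls both out with regular expressions. The helper names are hypothetical, and treating the two hex values as bitmasks is an assumption about what the runtime message means, not something the log states.

```python
import re

# Matches e.g. ".../neuron-compile-cache/neuronxcc-2.12.54.0+f631c2365/..."
CACHE_RE = re.compile(r"neuron-compile-cache/neuronxcc-([0-9][0-9.]*)\+")
# Matches e.g. "features: 0x100 are not supported. Currently supporting: 0x80000000000000ff"
FEATURE_RE = re.compile(
    r"features: (0x[0-9a-fA-F]+) are not supported\. "
    r"Currently supporting: (0x[0-9a-fA-F]+)"
)


def cached_compiler_versions(log_text):
    """Return the distinct compiler versions named in cache paths in the log."""
    return sorted(set(CACHE_RE.findall(log_text)))


def unsupported_feature_bits(log_text):
    """Return requested feature bits missing from the supported mask, or None."""
    match = FEATURE_RE.search(log_text)
    if match is None:
        return None
    requested = int(match.group(1), 16)
    supported = int(match.group(2), 16)
    # Assumption: both values are bitmasks, so keep the bits not in `supported`.
    return requested & ~supported
```

Applied to the log in this comment, `cached_compiler_versions` reports only 2.12.54.0, which can then be compared against the installed runtime release.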
@liechtym It looks like there is a mismatch between the compiler version and the runtime/tools version you are using. Can you please upgrade your runtime packages to 2.16 as well? That should fix the issue you are seeing.
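One quick way to see the Python side of such a mismatch is to print the installed versions of the Neuron pip packages. This is a minimal sketch using the standard library's `importlib.metadata`; `installed_versions` is a hypothetical helper, and the package list is an assumption based on the packages named in this thread. The runtime itself is installed as OS packages, which this snippet cannot see, so those still need to be checked with the system package manager.

```python
from importlib import metadata

# Pip packages mentioned in this thread; adjust as needed.
PIP_PACKAGES = ("neuronx-cc", "torch-neuronx", "transformers-neuronx")


def installed_versions(names=PIP_PACKAGES):
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```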
Thanks @shebbur-aws. I will try this out and report back soon.
It's working great! Thanks! If I have any additional issues I'll file a different issue. Thanks again.
I've been following the https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/transformers-neuronx/inference/meta-llama-2-13b-sampling.ipynb example. However, I ran into an issue when using a modified version of LLaMA made for MiniGPT-4.
I'm running on an inf2.8xlarge with the AMI "Deep Learning AMI Neuron PyTorch 1.13 (Ubuntu 20.04) 20231205".
I updated to the latest Neuron version via python -m pip install --upgrade neuronx-cc==2. --pre torch-neuronx==2.0. torchvision
Here's my code to compile. This finishes properly.
I then attempt to run it with the following code:
And I get this error:
These are the files in ./MiniGPT-4-LLaMA-7b-split:
Any help or direction would be stellar! Thanks.