aws-neuron / aws-neuron-samples

Example code for AWS Neuron SDK developers building inference and training applications

File not found error #77

Closed: oemd001 closed this issue 4 weeks ago

oemd001 commented 1 month ago

Hello!

I attempted to run the Jupyter notebook on an inf2.48xlarge instance, and the error shown in the attached screenshot occurred.

I'm not sure what caused the error, but the attached screenshot shows what was generated in neuron_artifacts.

Installed Packages

absl-py==2.1.0 accelerate==0.23.0 aiofiles==23.2.1 aiohttp==3.9.5 aiosignal==1.3.1 annotated-types==0.7.0 anyio==4.4.0 asttokens @ file:///home/conda/feedstock_root/build_artifacts/asttokens_1698341106958/work async-timeout==4.0.3 attrs==23.2.0 aws-neuronx-runtime-discovery==2.9 beautifulsoup4==4.12.3 blinker==1.8.2 boto3==1.34.115 botocore==1.34.115 cachetools==5.3.3 certifi==2024.2.2 charset-normalizer==3.3.2 click==8.1.7 cloud-tpu-client==0.10 coloredlogs==15.0.1 comm @ file:///home/conda/feedstock_root/build_artifacts/comm_1710320294760/work dataclasses-json==0.6.6 datasets==2.19.1 debugpy @ file:///home/conda/feedstock_root/build_artifacts/debugpy_1707444420542/work decorator @ file:///home/conda/feedstock_root/build_artifacts/decorator_1641555617451/work Deprecated==1.2.14 dill==0.3.8 dirtyjson==1.0.8 distro==1.9.0 docutils==0.21.2 duckduckgo_search==6.1.2 ec2-metadata==2.10.0 exceptiongroup @ file:///home/conda/feedstock_root/build_artifacts/exceptiongroup_1704921103267/work executing @ file:///home/conda/feedstock_root/build_artifacts/executing_1698579936712/work filelock==3.14.0 Flask==3.0.3 frozenlist==1.4.1 fsspec==2024.3.1 google-api-core==1.34.1 google-api-python-client==1.8.0 google-auth==2.29.0 google-auth-httplib2==0.2.0 googleapis-common-protos==1.63.0 greenlet==3.0.3 h11==0.14.0 h2==4.1.0 hpack==4.0.0 httpcore==1.0.5 httplib2==0.22.0 httpx==0.27.0 huggingface-hub==0.23.2 humanfriendly==10.0 Hypercorn==0.17.3 hyperframe==6.0.1 idna==3.7 importlib_metadata @ file:///home/conda/feedstock_root/build_artifacts/importlib-metadata_1710971335535/work ipykernel @ file:///home/conda/feedstock_root/build_artifacts/ipykernel_1708996548741/work ipython @ file:///home/conda/feedstock_root/build_artifacts/ipython_1715263367085/work islpy==2023.1 itsdangerous==2.2.0 jedi @ file:///home/conda/feedstock_root/build_artifacts/jedi_1696326070614/work Jinja2==3.1.4 jmespath==1.0.1 joblib==1.4.2 jsonpatch==1.33 jsonpointer==2.4 jupyter_client @ file:///home/conda/feedstock_root/build_artifacts/jupyter_client_1716472197302/work jupyter_core @ file:///home/conda/feedstock_root/build_artifacts/jupyter_core_1710257277185/work langchain==0.2.1 langchain-community==0.2.1 langchain-core==0.2.2 langchain-text-splitters==0.2.0 langsmith==0.1.63 libneuronxla==2.0.965 llama-index==0.10.40 llama-index-agent-openai==0.2.5 llama-index-cli==0.1.12 llama-index-core==0.10.40 llama-index-embeddings-huggingface==0.2.1 llama-index-embeddings-openai==0.1.10 llama-index-indices-managed-llama-cloud==0.1.6 llama-index-legacy==0.9.48 llama-index-llms-openai==0.1.21 llama-index-multi-modal-llms-openai==0.1.6 llama-index-program-openai==0.1.6 llama-index-question-gen-openai==0.1.3 llama-index-readers-file==0.1.23 llama-index-readers-llama-parse==0.1.4 llama-parse==0.4.4 llamaindex-py-client==0.1.19 lockfile==0.12.2 MarkupSafe==2.1.5 marshmallow==3.21.2 matplotlib-inline @ file:///home/conda/feedstock_root/build_artifacts/matplotlib-inline_1713250518406/work minijinja==2.0.1 mpmath==1.3.0 multidict==6.0.5 multiprocess==0.70.16 mypy-extensions==1.0.0 nest_asyncio @ file:///home/conda/feedstock_root/build_artifacts/nest-asyncio_1705850609492/work networkx==2.6.3 neuronx-cc==2.13.66.0+6dfecc895 neuronx-distributed==0.7.0 nltk==3.8.1 numpy==1.25.2 nvidia-cublas-cu12==12.1.3.1 nvidia-cuda-cupti-cu12==12.1.105 nvidia-cuda-nvrtc-cu12==12.1.105 nvidia-cuda-runtime-cu12==12.1.105 nvidia-cudnn-cu12==8.9.2.26 nvidia-cufft-cu12==11.0.2.54 nvidia-curand-cu12==10.3.2.106 nvidia-cusolver-cu12==11.4.5.107 nvidia-cusparse-cu12==12.1.0.106 
nvidia-nccl-cu12==2.18.1 nvidia-nvjitlink-cu12==12.5.40 nvidia-nvtx-cu12==12.1.105 oauth2client==4.1.3 openai==1.30.5 optimum==1.18.1 optimum-neuron==0.0.22 orjson==3.10.3 outcome==1.3.0.post0 packaging==23.2 pandas==2.2.2 parso @ file:///home/conda/feedstock_root/build_artifacts/parso_1712320355065/work pexpect @ file:///home/conda/feedstock_root/build_artifacts/pexpect_1706113125309/work pgzip==0.3.5 pickleshare @ file:///home/conda/feedstock_root/build_artifacts/pickleshare_1602536217715/work pillow==10.3.0 platformdirs @ file:///home/conda/feedstock_root/build_artifacts/platformdirs_1715777629804/work priority==2.0.0 prompt-toolkit @ file:///home/conda/feedstock_root/build_artifacts/prompt-toolkit_1702399386289/work protobuf==3.19.6 psutil @ file:///home/conda/feedstock_root/build_artifacts/psutil_1705722392846/work ptyprocess @ file:///home/conda/feedstock_root/build_artifacts/ptyprocess_1609419310487/work/dist/ptyprocess-0.7.0-py2.py3-none-any.whl pure-eval @ file:///home/conda/feedstock_root/build_artifacts/pure_eval_1642875951954/work pyarrow==16.1.0 pyarrow-hotfix==0.6 pyasn1==0.6.0 pyasn1_modules==0.4.0 pydantic==2.7.2 pydantic_core==2.18.3 Pygments @ file:///home/conda/feedstock_root/build_artifacts/pygments_1714846767233/work pyparsing==3.1.2 pypdf==4.2.0 pyreqwest_impersonate==0.4.6 PySocks==1.7.1 python-daemon==3.0.1 python-dateutil @ file:///home/conda/feedstock_root/build_artifacts/python-dateutil_1709299778482/work python-dotenv==1.0.1 pytz==2024.1 PyYAML==6.0.1 pyzmq @ file:///home/conda/feedstock_root/build_artifacts/pyzmq_1715024398995/work Quart==0.19.6 regex==2024.5.15 requests==2.32.3 requests-unixsocket==0.3.0 rsa==4.9 s3transfer==0.10.1 safetensors==0.4.3 scikit-learn==1.5.0 scipy==1.11.2 selenium==4.21.0 sentence-transformers==2.7.0 sentencepiece==0.2.0 six @ file:///home/conda/feedstock_root/build_artifacts/six_1620240208055/work sniffio==1.3.1 sortedcontainers==2.4.0 soupsieve==2.5 SQLAlchemy==2.0.30 stack-data @ file:///home/conda/feedstock_root/build_artifacts/stack_data_1669632077133/work striprtf==0.0.26 sympy==1.12.1 taskgroup==0.0.0a4 tenacity==8.3.0 threadpoolctl==3.5.0 tiktoken==0.7.0 tokenizers==0.15.2 tomli==2.0.1 torch==2.1.2 torch-neuronx==2.1.2.2.1.0 torch-xla==2.1.2 torchvision==0.16.2 tornado @ file:///home/conda/feedstock_root/build_artifacts/tornado_1708363098266/work tqdm==4.66.4 traitlets @ file:///home/conda/feedstock_root/build_artifacts/traitlets_1713535121073/work transformers==4.36.2 transformers-neuronx==0.10.0.21 trio==0.25.1 trio-websocket==0.11.1 triton==2.1.0 typing-inspect==0.9.0 typing_extensions @ file:///home/conda/feedstock_root/build_artifacts/typing_extensions_1712329955671/work tzdata==2024.1 uritemplate==3.0.1 urllib3==2.2.1 wcwidth @ file:///home/conda/feedstock_root/build_artifacts/wcwidth_1704731205417/work Werkzeug==3.0.3 wrapt==1.16.0 wsproto==1.2.0 xxhash==3.4.1 yarl==1.9.4 zipp @ file:///home/conda/feedstock_root/build_artifacts/zipp_1695255097490/work

Steps to reproduce:

Run the code exactly as given in https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/transformers-neuronx/inference/meta-llama-3-70b-sampling.ipynb
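For context, the part of the notebook that compiles the model and writes the neuron_artifacts directory looks roughly like this (a paraphrased sketch; the exact model_id and NeuronConfig options are set in earlier notebook cells, not reproduced verbatim here):

# Sketch of the notebook's compile-and-save flow (paraphrased).
from transformers_neuronx.llama.model import LlamaForSampling
from transformers_neuronx.config import NeuronConfig

model_id = 'meta-llama/Meta-Llama-3-70B'  # assumed; the notebook sets this
neuron_config = NeuronConfig()            # the notebook configures its options here

neuron_model = LlamaForSampling.from_pretrained(
    model_id, neuron_config=neuron_config,
    batch_size=1, tp_degree=24, amp='f16', n_positions=2048)
neuron_model.to_neuron()               # compiles the model for Neuron
neuron_model.save('neuron_artifacts')  # serializes compiled artifacts to disk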

Would love to get some support on this!

aws-taylor commented 4 weeks ago

Hello @oemd001,

We've identified an inconsistency in our sample code that we believe is causing this problem. We're working on a fix now.

oemd001 commented 4 weeks ago

Hey @aws-taylor, I was able to figure it out.

# ... existing code above from the Jupyter notebook; LlamaForSampling,
# model_id, neuron_config, and input_ids are defined in earlier cells.
import time
import torch

del neuron_model  # free the first model before constructing the second

# This did NOT work (missing neuron_config and n_positions):
# neuron_model_2 = LlamaForSampling.from_pretrained(model_id, batch_size=1, tp_degree=24, amp='f16')

# But constructing it with the same arguments used for compilation did:
neuron_model_2 = LlamaForSampling.from_pretrained(
    model_id, neuron_config=neuron_config,
    batch_size=1, tp_degree=24, amp='f16', n_positions=2048)
neuron_model_2.load('neuron_artifacts')  # load the compiled Neuron artifacts
neuron_model_2.to_neuron()               # skips compilation since artifacts were loaded

with torch.inference_mode():
    start = time.time()
    generated_sequences = neuron_model_2.sample(input_ids, sequence_length=2048, top_k=50)
    elapsed = time.time() - start

print(f'generated sequences {generated_sequences} in {elapsed} seconds')

It seems that constructing the model with the same arguments the notebook used when compiling it (including neuron_config and n_positions) fixed it!
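One way to guard against this kind of mismatch is to keep the constructor arguments in a single place, so the compile/save path and the load path can never drift apart. A minimal sketch (make_model is a hypothetical helper, not part of the notebook):

# Sketch: one source of truth for the constructor arguments, since the
# serialized artifacts are only valid for the exact configuration they
# were compiled with.
MODEL_KWARGS = dict(batch_size=1, tp_degree=24, amp='f16', n_positions=2048)

def make_model(model_id, neuron_config):
    return LlamaForSampling.from_pretrained(
        model_id, neuron_config=neuron_config, **MODEL_KWARGS)

neuron_model = make_model(model_id, neuron_config)    # compile + save path
neuron_model_2 = make_model(model_id, neuron_config)  # reload path, same args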