Open sumaiyah opened 9 months ago
Thank you for reporting. We are trying to reproduce the issue on our end. Can you share the Neuron package versions?
This is everything installed in the environment:
Package Version
absl-py 1.4.0
accelerate 0.23.0
aiohttp 3.8.5
aiosignal 1.3.1
amqp 5.1.1
annotated-types 0.5.0
anyio 3.7.1
argon2-cffi 23.1.0
argon2-cffi-bindings 21.2.0
arrow 1.2.3
astroid 2.15.6
asttokens 2.4.0
async-lru 2.0.4
async-timeout 4.0.3
attrs 23.1.0
Automat 22.10.0
aws-neuronx-runtime-discovery 2.9
awscli 1.29.45
Babel 2.12.1
backcall 0.2.0
backports.zoneinfo 0.2.1
beautifulsoup4 4.12.2
billiard 4.1.0
bleach 6.0.0
boto3 1.28.45
botocore 1.31.45
build 1.0.3
cachetools 5.3.1
celery 5.3.4
certifi 2023.7.22
cffi 1.15.1
charset-normalizer 3.2.0
click 8.1.7
click-didyoumean 0.3.0
click-plugins 1.1.1
click-repl 0.3.0
cloud-tpu-client 0.10
cloudpickle 2.2.1
cmake 3.27.4.1
colorama 0.4.4
comm 0.1.4
constantly 15.1.0
contourpy 1.1.0
cryptography 41.0.3
cssselect 1.2.0
cycler 0.11.0
dask 2023.5.0
debugpy 1.7.0
decorator 5.1.1
defusedxml 0.7.1
dill 0.3.7
distlib 0.3.7
docutils 0.16
dparse 0.6.3
ec2-metadata 2.10.0
environment-kernels 1.2.0
exceptiongroup 1.1.3
executing 1.2.0
fastapi 0.103.1
fastjsonschema 2.18.0
filelock 3.12.3
fonttools 4.42.1
fqdn 1.5.1
frozenlist 1.4.0
fsspec 2023.9.0
google-api-core 1.34.0
google-api-python-client 1.8.0
google-auth 2.23.0
google-auth-httplib2 0.1.1
googleapis-common-protos 1.60.0
httpie 3.2.2
httplib2 0.22.0
huggingface-hub 0.17.1
hyperlink 21.0.0
idna 3.4
imageio 2.31.3
importlib-metadata 6.8.0
importlib-resources 6.0.1
incremental 22.10.0
iniconfig 2.0.0
ipykernel 6.25.2
ipython 8.12.2
ipython-genutils 0.2.0
ipywidgets 8.1.0
islpy 2023.1
isoduration 20.11.0
isort 5.12.0
itemadapter 0.8.0
itemloaders 1.1.0
jedi 0.19.0
Jinja2 3.1.2
jmespath 1.0.1
joblib 1.3.2
json5 0.9.14
jsonpointer 2.4
jsonschema 4.19.0
jsonschema-specifications 2023.7.1
jupyter 1.0.0
jupyter_client 8.3.1
jupyter-console 6.6.3
jupyter_core 5.3.1
jupyter-events 0.7.0
jupyter-lsp 2.2.0
jupyter_server 2.7.3
jupyter_server_terminals 0.4.4
jupyterlab 4.0.5
jupyterlab-pygments 0.2.2
jupyterlab_server 2.24.0
jupyterlab-widgets 3.0.8
kiwisolver 1.4.5
kombu 5.3.2
lazy-object-proxy 1.9.0
libneuronxla 0.5.476
llvmlite 0.40.1
locket 1.0.0
lockfile 0.12.2
lxml 4.9.3
markdown-it-py 3.0.0
MarkupSafe 2.1.3
matplotlib 3.7.3
matplotlib-inline 0.1.6
mccabe 0.7.0
mdurl 0.1.2
mistune 3.0.1
multidict 6.0.4
nbclient 0.8.0
nbconvert 7.8.0
nbformat 5.9.2
nest-asyncio 1.5.7
networkx 2.6.3
neuronx-cc 2.10.0.34+6c8792c6f
neuronx-hwm 2.10.0.5+7b1976adf
notebook 7.0.3
notebook_shim 0.2.3
numba 0.57.1
numpy 1.21.6
nvidia-cublas-cu11 11.10.3.66
nvidia-cuda-nvrtc-cu11 11.7.99
nvidia-cuda-runtime-cu11 11.7.99
nvidia-cudnn-cu11 8.5.0.96
oauth2client 4.1.3
opencv-python 4.8.0.76
overrides 7.4.0
packaging 21.3
pandas 2.0.3
pandocfilters 1.5.0
parsel 1.8.1
parso 0.8.3
partd 1.4.0
pexpect 4.8.0
pgzip 0.3.5
pickleshare 0.7.5
Pillow 10.0.0
pip 23.2.1
pip-tools 7.3.0
pipenv 2023.2.4
pkg_resources 0.0.0
pkgutil_resolve_name 1.3.10
platformdirs 3.10.0
plotly 5.16.1
pluggy 1.3.0
prometheus-client 0.17.1
prompt-toolkit 3.0.39
Protego 0.3.0
protobuf 3.20.3
psutil 5.9.5
ptyprocess 0.7.0
pure-eval 0.2.2
pyasn1 0.5.0
pyasn1-modules 0.3.0
pycparser 2.21
pydantic 2.3.0
pydantic_core 2.6.3
PyDispatcher 2.0.7
Pygments 2.16.1
pylint 2.17.5
pyOpenSSL 23.2.0
pyparsing 3.1.1
pyproject_hooks 1.0.0
PySocks 1.7.1
pytest 7.4.2
python-daemon 3.0.1
python-dateutil 2.8.2
python-json-logger 2.0.7
pytz 2023.3.post1
PyYAML 6.0.1
pyzmq 25.1.1
qtconsole 5.4.4
QtPy 2.4.0
queuelib 1.6.2
referencing 0.30.2
regex 2023.8.8
requests 2.31.0
requests-file 1.5.1
requests-toolbelt 1.0.0
requests-unixsocket 0.3.0
rfc3339-validator 0.1.4
rfc3986-validator 0.1.1
rich 13.5.2
rpds-py 0.10.2
rsa 4.7.2
ruamel.yaml 0.17.32
ruamel.yaml.clib 0.2.7
s3transfer 0.6.2
safetensors 0.3.3
scikit-learn 1.3.0
scipy 1.7.3
Scrapy 2.10.1
seaborn 0.12.2
Send2Trash 1.8.2
sentencepiece 0.1.99
service-identity 23.1.0
setuptools 68.2.1
shap 0.42.1
six 1.16.0
slicer 0.0.7
sniffio 1.3.0
soupsieve 2.5
stack-data 0.6.2
starlette 0.27.0
tenacity 8.2.3
terminado 0.17.1
threadpoolctl 3.2.0
tinycss2 1.2.1
tldextract 3.5.0
tokenizers 0.13.3
tomli 2.0.1
tomlkit 0.12.1
toolz 0.12.0
torch 1.13.1
torch-neuronx 1.13.1.1.11.0
torch-xla 1.13.1+torchneuronb
torchvision 0.14.1
tornado 6.3.3
tqdm 4.66.1
traitlets 5.9.0
transformers 4.33.1
transformers-neuronx 0.7.84
Twisted 22.10.0
typing_extensions 4.7.1
tzdata 2023.3
uri-template 1.3.0
uritemplate 3.0.1
urllib3 1.26.16
vine 5.0.0
virtualenv 20.24.5
virtualenv-clone 0.5.7
w3lib 2.1.2
wcwidth 0.2.6
webcolors 1.13
webencodings 0.5.1
websocket-client 1.6.3
wheel 0.41.2
widgetsnbextension 4.0.8
wrapt 1.15.0
yarl 1.9.2
zipp 3.16.2
zope.interface 6.0
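Since the request was only for the Neuron package versions, a shorter report might be easier to read than the full list above. The following is a minimal sketch, not an official Neuron tool: the helper names `select_packages` and `installed_packages` are hypothetical, and the marker substrings are an assumption chosen to match the package names that appear in the list.

```python
from importlib import metadata

# Assumption: these substrings pick out the packages relevant to this issue.
NEURON_MARKERS = ("neuron", "torch", "transformers")


def select_packages(installed, markers=NEURON_MARKERS):
    """Return sorted (name, version) pairs whose name contains any marker."""
    picked = [
        (name, version)
        for name, version in installed
        if any(marker in name.lower() for marker in markers)
    ]
    return sorted(picked, key=lambda pair: pair[0].lower())


def installed_packages():
    """Collect (name, version) for every distribution visible to this interpreter."""
    pairs = []
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip broken installs that report no name
            pairs.append((name, dist.version))
    return pairs


if __name__ == "__main__":
    for name, version in select_packages(installed_packages()):
        print(f"{name}=={version}")
```

Run inside the same virtual environment used for compilation, this should print entries such as `neuronx-cc`, `torch-neuronx`, and `transformers-neuronx` with their versions.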
Hello @sumaiyah, we tried to get the quantized checkpoint from the link you sent, but we were not successful. To debug an accuracy issue like this, we need the checkpoint. Could you share the checkpoint and the script at this email: aws-neuron-support@amazon.com? This would make debugging faster for us.
@aws-rhsoln sent
@sumaiyah how did you compile the model? Did you use any special arguments for AWQ?
Hi @sumaiyah - This model uses a quantization algorithm called AWQ, which is currently not supported in TnX (transformers-neuronx). Is it possible to use the standard LLaMa 2 7B weights for your use case?
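One way to catch this before compiling is to inspect the checkpoint's `config.json` for a quantization entry. The sketch below is only an illustration under an assumption: it expects the checkpoint to follow the Hugging Face transformers convention of recording a `quantization_config` section with a `quant_method` key. Some AWQ repositories may instead ship a separate quantization file, which this check would miss. The function names are hypothetical.

```python
import json
from pathlib import Path
from typing import Optional


def quant_method_from_config(config: dict) -> Optional[str]:
    """Return the lowercase quantization method named in a config dict, if any.

    Assumption: the method is stored under
    config["quantization_config"]["quant_method"].
    """
    section = config.get("quantization_config") or {}
    method = section.get("quant_method")
    return method.lower() if isinstance(method, str) else None


def check_checkpoint(checkpoint_dir: str) -> Optional[str]:
    """Read config.json from a local checkpoint directory and report its method."""
    config_path = Path(checkpoint_dir) / "config.json"
    with config_path.open() as handle:
        return quant_method_from_config(json.load(handle))
```

If `check_checkpoint` returns `"awq"` for a local copy of the model, the weights are AWQ-quantized and, per the comment above, would not load correctly in transformers-neuronx at the versions discussed here.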
I am following the steps (https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/transformers-neuronx/inference/meta-llama-2-13b-sampling.ipynb) to run a Llama2 quantized model (https://huggingface.co/TheBloke/Dolphin-Llama2-7B-AWQ) on an AWS inf2 instance (inf2.8xlarge).
I can run the code; however, when I try to generate a sequence, I get a nonsense output stream.