aws-neuron / aws-neuron-samples

Example code for AWS Neuron SDK developers building inference and training applications

Llama2 quantized model on Inf2 generating nonsense #41

Open sumaiyah opened 9 months ago

sumaiyah commented 9 months ago

I am following the steps in the meta-llama-2-13b-sampling notebook (https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/transformers-neuronx/inference/meta-llama-2-13b-sampling.ipynb) to run a quantized Llama 2 model (https://huggingface.co/TheBloke/Dolphin-Llama2-7B-AWQ) on an AWS Inf2 instance (inf2.8xlarge).

The code runs, but when I try to generate a sequence I get a stream of nonsense output:


>>> neuron_model.sample(tokenizer.encode("who is prime minister of uk", return_tensors="pt"), sequence_length=2048, top_k=50, streamer=TextStreamer(tokenizer))

isherтак discoveryLENGrektta mel damтакudoudoisherkl̂LENGifarola melLENG destouselikhudocherิuskkl Hauptumar discovery Malludoikh moduleджа moduleelifudouskswerLENGkl discovery discoveryaltraungsusrLENG КурcherLENGLENGToolsivelivelusrungs Haupt geldig modulesivel modulesусrola discoverydelegate Haupt discoveryugeniture moduleselif›ugen Кеede geldig discovery Schl Mallivel HöheLENG audelegatedelegateusr КеedeLENG› Кур КеdelegateLENGudo›usr Mallrellppen›delegateivel Schldelegate accessibleodgeugenumar destдоваусdelegateToolsklundesede Кур Кеkl Mallugenentityikzdelegate discoveryanzen destusrungsppenentitychioíkíkkldelegate КеLENGrellToolsommenсиingu destLENGaussedeugnougnoppenikzíkLENG Mall auLENGrellikzivelugenkldelegatedelegateftyungsichtsdelegate Кеajuси Höheewусundesaju Курусikzík ensuiteichtsewzna ensuiteAccess discoverydelegate Кеdelegateinguboldmath nucitenusr accessibleedeLENGppenikzdelegateichtsdelegateundiallotikz Ке bon Кур Ке Курrell Schldelegate Schlус Ке MallLENGodgeǧikzкурغ Кеanzenlotppenungsdelegateichtsivel moduledelegatedelegaterellundialLENGinguungsivelichtshtusrdelegatedelegatehirehtichtschiohtdelegateedeغajuingu КеungsenschaftLENGLENGajuкурdelegateсиichtsikzտ MallLENGLENGLENGLENG auichtsси КеaussغewкуркурivelLENG modulesichtsLENGungs主 Кеchkikz主ajuichtsewugenichts nucichtsкур Schldelegate bonкурlotajuusrundialdentкуркурrellغikzugenусrelllotugenLENGinguppenchiochkajuhireкурppenichtsдвиhtanzeníkGRichtsichts Schl bon Schlchkchk nucdelegateichts Schlitenitenдви moduleznaajudelegatelotchkanzenlotἱAccessdelegateLENG nucinguchkitenppenусусdelegateкурдвиусikzundialajuenschaftdelegateznaдвикурichtschio Кеewadalichtsreesichtsտchioкурichtsenschaftichtsrell bonikzlot desc Mallкуркурchioсиadalenschaftinguppenusrhireikzivel Кеikzinguppen descdelegateusrikzichtsznaichtsewchkewrellAccessewichtsichtsдвикурikzznaichtslot Schlew nucíkкур nucAccessкурichtschioдвиivel firing nuc ordchiochkhireус auskeichtsodgeadalкурungsichtsewedeikz 
bonусewadalchkichtsATA主enschaftewusr
jurкурусppenichtsundialajuichtsLENGenschaftedeewichtsдвиppenichts sl nucchkadalкуркурichtsdelegateikzinguLEFTLEFTдвиchkкурchk bonundialundialadalundial Schlodgechk firing bonedeichts Abbкур desc Ке Schl descundialкурznalot auichts Schlclean Кеclean Mallchkadal reciznaadalundialichts formulachio Mallchioкурclean nucусhireATAichtshire desc desc recidelegatechioichtsichtschklotichtsusrichtsungs主rell Кеchioclean sl nucкуркурichtsadalundiallotGRewсиznaewhire主курewichtsкурсиichtsristichtscleanristichts ordAccessichtschkichtsdelegateungshireundialGRristíkodgeGRungs nucкур descLEFTinguLEFTikz Schlhirerellikzungsundial nucichtsкур AbbусewchioAccessodgeATA

aws-rhsoln commented 9 months ago

Thank you for reporting this. We are trying to reproduce the issue on our end. Can you share the Neuron package versions?

sumaiyah commented 9 months ago

This is everything installed in the environment:

Package Version
absl-py 1.4.0
accelerate 0.23.0
aiohttp 3.8.5
aiosignal 1.3.1
amqp 5.1.1
annotated-types 0.5.0
anyio 3.7.1
argon2-cffi 23.1.0
argon2-cffi-bindings 21.2.0
arrow 1.2.3
astroid 2.15.6
asttokens 2.4.0
async-lru 2.0.4
async-timeout 4.0.3
attrs 23.1.0
Automat 22.10.0
aws-neuronx-runtime-discovery 2.9
awscli 1.29.45
Babel 2.12.1
backcall 0.2.0
backports.zoneinfo 0.2.1
beautifulsoup4 4.12.2
billiard 4.1.0
bleach 6.0.0
boto3 1.28.45
botocore 1.31.45
build 1.0.3
cachetools 5.3.1
celery 5.3.4
certifi 2023.7.22
cffi 1.15.1
charset-normalizer 3.2.0
click 8.1.7
click-didyoumean 0.3.0
click-plugins 1.1.1
click-repl 0.3.0
cloud-tpu-client 0.10
cloudpickle 2.2.1
cmake 3.27.4.1
colorama 0.4.4
comm 0.1.4
constantly 15.1.0
contourpy 1.1.0
cryptography 41.0.3
cssselect 1.2.0
cycler 0.11.0
dask 2023.5.0
debugpy 1.7.0
decorator 5.1.1
defusedxml 0.7.1
dill 0.3.7
distlib 0.3.7
docutils 0.16
dparse 0.6.3
ec2-metadata 2.10.0
environment-kernels 1.2.0
exceptiongroup 1.1.3
executing 1.2.0
fastapi 0.103.1
fastjsonschema 2.18.0
filelock 3.12.3
fonttools 4.42.1
fqdn 1.5.1
frozenlist 1.4.0
fsspec 2023.9.0
google-api-core 1.34.0
google-api-python-client 1.8.0
google-auth 2.23.0
google-auth-httplib2 0.1.1
googleapis-common-protos 1.60.0
httpie 3.2.2
httplib2 0.22.0
huggingface-hub 0.17.1
hyperlink 21.0.0
idna 3.4
imageio 2.31.3
importlib-metadata 6.8.0
importlib-resources 6.0.1
incremental 22.10.0
iniconfig 2.0.0
ipykernel 6.25.2
ipython 8.12.2
ipython-genutils 0.2.0
ipywidgets 8.1.0
islpy 2023.1
isoduration 20.11.0
isort 5.12.0
itemadapter 0.8.0
itemloaders 1.1.0
jedi 0.19.0
Jinja2 3.1.2
jmespath 1.0.1
joblib 1.3.2
json5 0.9.14
jsonpointer 2.4
jsonschema 4.19.0
jsonschema-specifications 2023.7.1
jupyter 1.0.0
jupyter_client 8.3.1
jupyter-console 6.6.3
jupyter_core 5.3.1
jupyter-events 0.7.0
jupyter-lsp 2.2.0
jupyter_server 2.7.3
jupyter_server_terminals 0.4.4
jupyterlab 4.0.5
jupyterlab-pygments 0.2.2
jupyterlab_server 2.24.0
jupyterlab-widgets 3.0.8
kiwisolver 1.4.5
kombu 5.3.2
lazy-object-proxy 1.9.0
libneuronxla 0.5.476
llvmlite 0.40.1
locket 1.0.0
lockfile 0.12.2
lxml 4.9.3
markdown-it-py 3.0.0
MarkupSafe 2.1.3
matplotlib 3.7.3
matplotlib-inline 0.1.6
mccabe 0.7.0
mdurl 0.1.2
mistune 3.0.1
multidict 6.0.4
nbclient 0.8.0
nbconvert 7.8.0
nbformat 5.9.2
nest-asyncio 1.5.7
networkx 2.6.3
neuronx-cc 2.10.0.34+6c8792c6f
neuronx-hwm 2.10.0.5+7b1976adf
notebook 7.0.3
notebook_shim 0.2.3
numba 0.57.1
numpy 1.21.6
nvidia-cublas-cu11 11.10.3.66
nvidia-cuda-nvrtc-cu11 11.7.99
nvidia-cuda-runtime-cu11 11.7.99
nvidia-cudnn-cu11 8.5.0.96
oauth2client 4.1.3
opencv-python 4.8.0.76
overrides 7.4.0
packaging 21.3
pandas 2.0.3
pandocfilters 1.5.0
parsel 1.8.1
parso 0.8.3
partd 1.4.0
pexpect 4.8.0
pgzip 0.3.5
pickleshare 0.7.5
Pillow 10.0.0
pip 23.2.1
pip-tools 7.3.0
pipenv 2023.2.4
pkg_resources 0.0.0
pkgutil_resolve_name 1.3.10
platformdirs 3.10.0
plotly 5.16.1
pluggy 1.3.0
prometheus-client 0.17.1
prompt-toolkit 3.0.39
Protego 0.3.0
protobuf 3.20.3
psutil 5.9.5
ptyprocess 0.7.0
pure-eval 0.2.2
pyasn1 0.5.0
pyasn1-modules 0.3.0
pycparser 2.21
pydantic 2.3.0
pydantic_core 2.6.3
PyDispatcher 2.0.7
Pygments 2.16.1
pylint 2.17.5
pyOpenSSL 23.2.0
pyparsing 3.1.1
pyproject_hooks 1.0.0
PySocks 1.7.1
pytest 7.4.2
python-daemon 3.0.1
python-dateutil 2.8.2
python-json-logger 2.0.7
pytz 2023.3.post1
PyYAML 6.0.1
pyzmq 25.1.1
qtconsole 5.4.4
QtPy 2.4.0
queuelib 1.6.2
referencing 0.30.2
regex 2023.8.8
requests 2.31.0
requests-file 1.5.1
requests-toolbelt 1.0.0
requests-unixsocket 0.3.0
rfc3339-validator 0.1.4
rfc3986-validator 0.1.1
rich 13.5.2
rpds-py 0.10.2
rsa 4.7.2
ruamel.yaml 0.17.32
ruamel.yaml.clib 0.2.7
s3transfer 0.6.2
safetensors 0.3.3
scikit-learn 1.3.0
scipy 1.7.3
Scrapy 2.10.1
seaborn 0.12.2
Send2Trash 1.8.2
sentencepiece 0.1.99
service-identity 23.1.0
setuptools 68.2.1
shap 0.42.1
six 1.16.0
slicer 0.0.7
sniffio 1.3.0
soupsieve 2.5
stack-data 0.6.2
starlette 0.27.0
tenacity 8.2.3
terminado 0.17.1
threadpoolctl 3.2.0
tinycss2 1.2.1
tldextract 3.5.0
tokenizers 0.13.3
tomli 2.0.1
tomlkit 0.12.1
toolz 0.12.0
torch 1.13.1
torch-neuronx 1.13.1.1.11.0
torch-xla 1.13.1+torchneuronb
torchvision 0.14.1
tornado 6.3.3
tqdm 4.66.1
traitlets 5.9.0
transformers 4.33.1
transformers-neuronx 0.7.84
Twisted 22.10.0
typing_extensions 4.7.1
tzdata 2023.3
uri-template 1.3.0
uritemplate 3.0.1
urllib3 1.26.16
vine 5.0.0
virtualenv 20.24.5
virtualenv-clone 0.5.7
w3lib 2.1.2
wcwidth 0.2.6
webcolors 1.13
webencodings 0.5.1
websocket-client 1.6.3
wheel 0.41.2
widgetsnbextension 4.0.8
wrapt 1.15.0
yarl 1.9.2
zipp 3.16.2
zope.interface 6.0
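
Since only the Neuron package versions were requested, a shorter report may be easier to read. The following is a minimal sketch, not part of the original report, that filters the installed distributions down to the ones likely to matter here. The keyword list and the helper names `neuron_related` and `installed` are assumptions made for illustration.

```python
from importlib import metadata

# Assumed, non-exhaustive substrings marking packages relevant to a Neuron report.
KEYWORDS = ("neuron", "torch", "transformers")


def neuron_related(dists):
    """Return sorted, de-duplicated (name, version) pairs whose name contains a keyword."""
    pairs = {
        (name, version)
        for name, version in dists
        if name and any(k in name.lower() for k in KEYWORDS)
    }
    return sorted(pairs, key=lambda p: p[0].lower())


def installed():
    """List (name, version) for every distribution visible to this interpreter."""
    return [(d.metadata["Name"], d.version) for d in metadata.distributions()]


if __name__ == "__main__":
    for name, version in neuron_related(installed()):
        print(f"{name} {version}")
```

Run inside the same virtual environment as the notebook, this prints only entries such as `neuronx-cc`, `torch-neuronx`, and `transformers-neuronx` from the list above.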

aws-rhsoln commented 9 months ago

Hello @sumaiyah, we tried to get the quantized checkpoint from the link you sent, but we were not successful. To debug an accuracy issue like this, we need the checkpoint. Could you share the checkpoint and your script by email at aws-neuron-support@amazon.com? That would speed up debugging on our side.

sumaiyah commented 9 months ago

@aws-rhsoln Sent.

enochlev commented 4 months ago

@sumaiyah How did you compile the model? Did you pass any special arguments for AWQ?

aws-donkrets commented 1 month ago

Hi @sumaiyah, this model is quantized with the AWQ algorithm, which is not currently supported in TnX (transformers-neuronx). Is it possible to use the standard Llama 2 7B weights for your use case?
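
For readers who hit the same symptom, a quick pre-flight check can flag a pre-quantized checkpoint before it is loaded. The following is a rough sketch under stated assumptions, not an official Neuron tool: it assumes the checkpoint declares its method under `quantization_config.quant_method` in `config.json` (as Hugging Face Transformers does for integrated AWQ checkpoints; some older AWQ uploads use a separate file instead), and that packed layers carry tensor names ending in `.qweight`, `.qzeros`, or `.scales`, a convention GPTQ checkpoints also use. All function names here are hypothetical.

```python
import json
from pathlib import Path

# Assumed suffixes of packed low-bit layers (shared by AWQ- and GPTQ-style checkpoints).
PACKED_SUFFIXES = (".qweight", ".qzeros", ".scales")


def quant_method_from_config(config: dict):
    """Return the declared quantization method, or None if the config declares none."""
    qcfg = config.get("quantization_config") or {}
    return qcfg.get("quant_method")


def looks_prequantized(tensor_names):
    """True if any tensor name ends with a suffix typical of packed quantized layers."""
    return any(name.endswith(PACKED_SUFFIXES) for name in tensor_names)


def quant_method_from_dir(checkpoint_dir):
    """Read config.json from a local checkpoint directory and report its quant method."""
    config = json.loads((Path(checkpoint_dir) / "config.json").read_text())
    return quant_method_from_config(config)
```

If either check fires for a checkpoint, the weights are not plain floating-point tensors, and loading them through a path that expects the standard Llama 2 layout could plausibly produce garbage tokens like the output reported above.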