PaddlePaddle / PaddleNLP

👑 Easy-to-use and powerful NLP and LLM library with 🤗 Awesome model zoo, supporting a wide range of NLP tasks from research to industrial applications, including 🗂 Text Classification, 🔍 Neural Search, ❓ Question Answering, ℹ️ Information Extraction, 📄 Document Intelligence, 💌 Sentiment Analysis, etc.
https://paddlenlp.readthedocs.io
Apache License 2.0

inference support llama3(wint8|4/a8w8) #8630

Closed: yuanlehome closed this pull request 2 days ago

yuanlehome commented 1 week ago

PR types

New features

PR changes

Others

Description

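Adds inference support for llama3 with weight-only INT8/INT4 quantization (wint8, wint4) and 8-bit activation / 8-bit weight quantization (a8w8).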
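For readers unfamiliar with the term, here is a minimal sketch of what weight-only INT8 means: weights are stored as 8-bit integers with one scale per output channel, while activations stay in floating point. This is an illustration only and not PaddleNLP's implementation. The function names `quantize_weight_int8` and `weight_only_linear` are hypothetical, and symmetric absmax scaling is an assumption, not something stated in this PR.

```python
def quantize_weight_int8(weight):
    """Symmetric per-output-channel absmax quantization (illustrative only).

    `weight` is a list of rows, one row per output channel.
    Returns the int8-range rows and one floating-point scale per row.
    """
    q_rows, scales = [], []
    for row in weight:
        max_abs = max(abs(v) for v in row)
        # Map the largest magnitude in the row to 127; avoid dividing by zero.
        scale = max_abs / 127.0 if max_abs > 0 else 1.0
        q_rows.append([max(-127, min(127, round(v / scale))) for v in row])
        scales.append(scale)
    return q_rows, scales


def weight_only_linear(x, q_rows, scales):
    """Compute y[j] = scale[j] * sum_i x[i] * q[j][i].

    Only the weights are quantized; the activation vector `x` stays float.
    """
    return [
        scale * sum(xi * qi for xi, qi in zip(x, row))
        for row, scale in zip(q_rows, scales)
    ]


# Tiny made-up example: 2 output channels, 3 input features.
weight = [[0.5, -1.0, 0.25], [2.0, 0.0, -0.5]]
x = [1.0, 2.0, 3.0]

q, s = quantize_weight_int8(weight)
y_quant = weight_only_linear(x, q, s)
y_ref = [sum(xi * wi for xi, wi in zip(x, row)) for row in weight]
```

In this sketch `y_quant` should stay close to the full-precision `y_ref`. An a8w8 scheme would additionally quantize `x` to 8 bits before the multiply.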

paddle-bot[bot] commented 1 week ago

Thanks for your contribution!

codecov[bot] commented 1 week ago

Codecov Report

Attention: Patch coverage is 0% with 2 lines in your changes missing coverage. Please review.

Project coverage is 55.80%. Comparing base (65e721e) to head (4821cd6). Report is 6 commits behind head on develop.

| Files | Patch % | Lines |
|---|---|---|
| ...dlenlp/experimental/transformers/llama/modeling.py | 0.00% | 2 Missing :warning: |
Additional details and impacted files

```diff
@@           Coverage Diff            @@
##           develop    #8630   +/-   ##
========================================
  Coverage    55.80%   55.80%
========================================
  Files          620      620
  Lines        96642    96642
========================================
  Hits         53928    53928
  Misses       42714    42714
```
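The project coverage figure can be checked from the hit and miss counts in the report. The short sketch below uses only numbers quoted above; the variable names are illustrative.

```python
# Counts taken from the Codecov report above.
hits, misses = 53928, 42714

lines = hits + misses              # total tracked lines
coverage_pct = 100 * hits / lines  # project coverage in percent

# Patch coverage: both changed lines are reported as missing.
patch_lines, patch_missing = 2, 2
patch_pct = 100 * (patch_lines - patch_missing) / patch_lines

print(f"{lines} lines, {coverage_pct:.2f}% project coverage, {patch_pct:.0f}% patch coverage")
# prints "96642 lines, 55.80% project coverage, 0% patch coverage"
```

Because the two uncovered lines leave the hit and miss totals unchanged, project coverage stays at 55.80% even though patch coverage is 0%.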


DesmonDay commented 2 days ago

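As discussed, the llama3 model currently runs inference correctly in dynamic-graph mode without fusion, but in the fused case inference hits a multi-process issue, which will be investigated later. In addition, src_length cannot be set for inference during dynamic-to-static conversion, and in high-performance inference mode generation does not handle eos correctly. @yuanlehome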