Update TensorRT-LLM - Githubissues

kaiyux commented 3 months ago

Model Support
- Support Phi-3-medium models, see examples/phi/README.md
Features
- Added support for quantized base model and FP16/BF16 LoRA.
API
- [BREAKING CHANGE] max_batch_size in trtllm-build command is 256 by default now.
- [BREAKING CHANGE] max_num_tokens in trtllm-build command is 8192 by default now.
- [BREAKING CHANGE] api in gptManagerBenchmark command is executor by default now.
- [BREAKING CHANGE] Added a bias argument to the LayerNorm module, and supports non-bias layer normalization.
- [BREAKING CHANGE] Refactored LLM.generate() API.
- Removed SamplingConfig
- Added SamplingParams with some sampling parameters, see tensorrt_llm/hlapi/utils.py
- Use SamplingParams instead of SamplingConfigin LLM.generate() API, see examples/high-level-api/README.md
- [BREAKING CHANGE]: Refactored GptManager API
- Move maxBeamWidth into TrtGptModelOptionalParams
- Move schedulerConfig into TrtGptModelOptionalParams
Bug fixes
- Fixed convert_hf_mpt_legacy call failure when the function is called in other than global scope, thanks to the contribution from @bloodeagle40234 in #1534.
- Fixed use_fp8_context_fmha broken outputs (#1539).
- Fixed pre-norm weight conversion for NMT models, thanks to the contribution from @Pzzzzz5142 in #1723.
- Fixed random seed initialization issue, thanks to the contribution from @pathorn in #1742.
- Fixed stop words and bad words in python bindings. (#1642)
Performance
- Low latency optimization
- Added a reduce-norm feature which aims to fuse the ResidualAdd and LayerNorm kernels after AllReduce into a single kernel, which is recommended to be enabled when the batch size is small and the generation phase time is dominant.
- Added FP8 support to the GEMM plugin, which benefits the cases when batch size is smaller than 4.
Documentation
- Added --ipc=host notes to installation guide to prevent bus error, see docs/source/installation/build-from-source-linux.md and docs/source/installation/linux.md (#1538)

pfk-beta commented 3 months ago

Hi, thanks for your hard work, btw. I have spotted huge removal in examples/run.py: https://github.com/NVIDIA/TensorRT-LLM/commit/db4edea1e1359bcfcac7bbb87c1b639b5611c721#diff-299cb0140ad8f9d286c86ecc32b793b048531e27570675b94e54b57b66b3d7d5. Is it intented?

pfk-beta commented 3 months ago

Sorry for false alarm, these arguments was moved to utils. I didn't spotted it

NVIDIA / TensorRT-LLM

Update TensorRT-LLM #1763