IBM / text-generation-inference

IBM development fork of https://github.com/huggingface/text-generation-inference
Apache License 2.0

Big upgrades #62

Closed · joerunde closed 6 months ago

joerunde commented 6 months ago

Motivation

We need to update a whole bunch of things that will cause output differences, so we want to bundle them together.

Modifications

Updates:

Result

Slight differences in outputs for some text generation prompts on many models, but our quality tests indicate no major drop in result quality.

joerunde commented 6 months ago

Python package list from this image:

$ pip3 list | grep -iE '(flash|torch|auto|cuda)'
DEPRECATION: Loading egg at /opt/tgis/lib/python3.11/site-packages/custom_kernels-0.0.0-py3.11-linux-x86_64.egg is deprecated. pip 24.3 will enforce this behaviour change. A possible replacement is to use pip for package installation.. Discussion can be found at https://github.com/pypa/pip/issues/12330
auto_gptq                 0.7.1
flash-attn                2.5.6
nvidia-cuda-cupti-cu12    12.1.105
nvidia-cuda-nvrtc-cu12    12.1.105
nvidia-cuda-runtime-cu12  12.1.105
torch                     2.2.1+cu121

Looks like the versions I expect. Running performance and integration tests to make sure nothing is totally borked.
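
Not part of the PR itself, but a minimal sketch of how one could assert these expected versions from inside the image (e.g. as part of an integration test). The package names and versions come from the pip output above; the `check_versions` helper and the `EXPECTED` table are hypothetical, not existing code in this repo.

```python
# Hypothetical sketch: verify key packages in the image match expected versions.
# Uses only the standard library (importlib.metadata, available on Python 3.11).
from importlib.metadata import version, PackageNotFoundError

# Expected versions, copied from the `pip3 list` output above.
EXPECTED = {
    "auto_gptq": "0.7.1",
    "flash-attn": "2.5.6",
    "torch": "2.2.1+cu121",
}

def check_versions(expected=EXPECTED):
    """Return a list of human-readable mismatches; empty means all good."""
    mismatches = []
    for pkg, want in expected.items():
        try:
            got = version(pkg)
        except PackageNotFoundError:
            mismatches.append(f"{pkg}: not installed (expected {want})")
            continue
        if got != want:
            mismatches.append(f"{pkg}: got {got}, expected {want}")
    return mismatches

if __name__ == "__main__":
    problems = check_versions()
    if problems:
        raise SystemExit("version mismatch:\n" + "\n".join(problems))
    print("all expected versions present")
```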