IBM / text-generation-inference
IBM development fork of https://github.com/huggingface/text-generation-inference
Apache License 2.0 · 52 stars · 30 forks
Issues
Update jinja2 dependency to fix vulnerability (#108) · vaibhavjainwiz · closed · 3 weeks ago · 0 comments
fix: update/pin dependencies to get ONNX runtime working again (#107) · tjohnson31415 · closed · 1 month ago · 0 comments
fix: fast tokenizer conversion should happen offline (#106) · tjohnson31415 · closed · 1 month ago · 0 comments
get_max_sequence_length() warning if user MAX_SEQUENCE_LENGTH > model MAX_SEQUENCE_LENGTH (#105) · fialhocoelho · closed · 2 months ago · 0 comments
Problem loading granite-3b in small MIG partitions (#104) · ccamacho · open · 2 months ago · 1 comment
Improve log messages around the max sequence length (#103) · maxdebayser · closed · 2 months ago · 0 comments
Official container image (#102) · josephrocca · closed · 3 months ago · 2 comments
fix: move parameter validation before fit_memory_scaling_model (#101) · tjohnson31415 · closed · 3 months ago · 0 comments
dockerfile: use ubi9:latest as base (#100) · dtrifiro · closed · 3 months ago · 0 comments
use fastsafetensors (#99) · takeshi-yoshimura · open · 3 months ago · 3 comments
Fix logic for determining the number of cache blocks (#98) · tdoublep · closed · 3 months ago · 0 comments
:recycle: move metrics into one file (#97) · joerunde · closed · 4 months ago · 2 comments
Free blocks in KVCacheManager upon error (#96) · tdoublep · closed · 4 months ago · 0 comments
Is there a way to see all the supported parameters for TGIS? (#95) · bdattoma · open · 4 months ago · 1 comment
Set TP argument correctly when instantiating PagedKVCacheManager (#94) · tdoublep · closed · 4 months ago · 0 comments
fix: check for tokenizer eos_token in ModelInfo response (#93) · tjohnson31415 · closed · 4 months ago · 0 comments
deepseek-coder-33b-instruct model on tgis fails with flash attention and generates wrong output without flash attention (#92) · maxdebayser · closed · 4 months ago · 2 comments
Generation does not terminate with EOS token for the vinai/PhoGPT-4B-Chat model (#91) · tjohnson31415 · closed · 4 months ago · 0 comments
TGIS gRPC adapter for lm-eval (#90) · maxdebayser · open · 4 months ago · 0 comments
feat: deprecate TRANSFORMERS_CACHE, use HF_HUB_CACHE everywhere (#89) · tjohnson31415 · closed · 4 months ago · 1 comment
Fix llama gqa attention bias (#88) · njhill · closed · 4 months ago · 0 comments
Log number of KVCacheManager blocks at init (#87) · tdoublep · closed · 4 months ago · 0 comments
:sparkles: allow single-shard paged attention (#86) · joerunde · open · 4 months ago · 2 comments
added mlp and attn bias option to flash and paged llama models (#85) · JRosenkranz · closed · 4 months ago · 0 comments
Added attn and mlp bias (#84) · JRosenkranz · closed · 4 months ago · 0 comments
added attn and mlp bias (#83) · JRosenkranz · closed · 4 months ago · 0 comments
Bump fms-extras version to avoid torch 2.3.0 issue (#82) · tdoublep · closed · 4 months ago · 0 comments
Update transformers library (#81) · njhill · closed · 5 months ago · 0 comments
Dependency updates (#80) · njhill · closed · 5 months ago · 0 comments
Speculative decoding for `llama` and `gpt_bigcode` (#79) · tdoublep · closed · 5 months ago · 0 comments
Speculative decoding for llama and gpt_bigcode (#78) · tdoublep · closed · 5 months ago · 1 comment
Tracing: extract context from incoming request (#77) · SilverSoldier · closed · 5 months ago · 1 comment
🐛 fix lm_head weight mapping (#76) · prashantgupta24 · closed · 5 months ago · 2 comments
Calculate the length penalty in the same way as the transformers library (#75) · maxdebayser · closed · 5 months ago · 0 comments
Speculative decoding (#74) · prashantgupta24 · closed · 5 months ago · 3 comments
Speculative decoding fms extras (#73) · prashantgupta24 · closed · 5 months ago · 2 comments
ci: add GIT_COMMIT_HASH build arg (#72) · tjohnson31415 · closed · 5 months ago · 0 comments
Update deps to address vulnerability (#71) · heyselbi · closed · 5 months ago · 2 comments
Fix the forwarding of the length penalty parameter (#70) · maxdebayser · closed · 5 months ago · 0 comments
fix: small improvement to convert-to-safetensors (#69) · tjohnson31415 · closed · 5 months ago · 0 comments
Update base image, rust, python deps, rust crates (#68) · njhill · closed · 5 months ago · 2 comments
Performance Optimizations for TP-Aware GPTQ (#67) · cyang49 · open · 6 months ago · 0 comments
Re: Incoporate Marlin for GPTQ checkpoints into tgis_native (#66) · cyang49 · closed · 5 months ago · 1 comment
fix: Zero division in inverse estimator functions (#65) · maxdebayser · closed · 6 months ago · 0 comments
Zero division in inverse estimator functions (#64) · maxdebayser · closed · 6 months ago · 0 comments
:bug: disallow downloads for fast tokenizer conversion (#63) · joerunde · closed · 6 months ago · 0 comments
Big upgrades (#62) · joerunde · closed · 6 months ago · 1 comment
feat: support linear scaled rope for tgis_native llama (#61) · joerunde · closed · 6 months ago · 0 comments
:fire: remove cuda-runtime entirely (#60) · joerunde · closed · 6 months ago · 0 comments
🔥 Remove our exllama code because we use auto-gptq vendored kernels (#59) · tjohnson31415 · closed · 6 months ago · 0 comments