IBM / text-generation-inference
IBM development fork of https://github.com/huggingface/text-generation-inference
Apache License 2.0 · 52 stars · 30 forks
Issues
Update jinja2 dependency to fix vulnerability (#108) · vaibhavjainwiz · closed · 3 weeks ago · 0 comments
fix: update/pin dependencies to get ONNX runtime working again (#107) · tjohnson31415 · closed · 1 month ago · 0 comments
fix: fast tokenizer conversion should happen offline (#106) · tjohnson31415 · closed · 1 month ago · 0 comments
get_max_sequence_length() warning if user MAX_SEQUENCE_LENGTH > model MAX_SEQUENCE_LENGTH (#105) · fialhocoelho · closed · 2 months ago · 0 comments
Problem loading granite-3b in small MIG partitions (#104) · ccamacho · open · 2 months ago · 1 comment
Improve log messages around the max sequence length (#103) · maxdebayser · closed · 2 months ago · 0 comments
Official container image (#102) · josephrocca · closed · 3 months ago · 2 comments
fix: move parameter validation before fit_memory_scaling_model (#101) · tjohnson31415 · closed · 3 months ago · 0 comments
dockerfile: use ubi9:latest as base (#100) · dtrifiro · closed · 3 months ago · 0 comments
use fastsafetensors (#99) · takeshi-yoshimura · open · 3 months ago · 3 comments
Fix logic for determining the number of cache blocks (#98) · tdoublep · closed · 3 months ago · 0 comments
:recycle: move metrics into one file (#97) · joerunde · closed · 4 months ago · 2 comments
Free blocks in KVCacheManager upon error (#96) · tdoublep · closed · 4 months ago · 0 comments
Is there a way to see all the supported parameters for TGIS? (#95) · bdattoma · open · 4 months ago · 1 comment
Set TP argument correctly when instantiating PagedKVCacheManager (#94) · tdoublep · closed · 4 months ago · 0 comments
fix: check for tokenizer eos_token in ModelInfo response (#93) · tjohnson31415 · closed · 4 months ago · 0 comments
deepseek-coder-33b-instruct model on tgis fails with flash attention and generates wrong output without flash attention (#92) · maxdebayser · closed · 4 months ago · 2 comments
Generation does not terminate with EOS token for the vinai/PhoGPT-4B-Chat model (#91) · tjohnson31415 · closed · 4 months ago · 0 comments
TGIS gRPC adapter for lm-eval (#90) · maxdebayser · open · 4 months ago · 0 comments
feat: deprecate TRANSFORMERS_CACHE, use HF_HUB_CACHE everywhere (#89) · tjohnson31415 · closed · 4 months ago · 1 comment
Fix llama gqa attention bias (#88) · njhill · closed · 4 months ago · 0 comments
Log number of KVCacheManager blocks at init (#87) · tdoublep · closed · 4 months ago · 0 comments
:sparkles: allow single-shard paged attention (#86) · joerunde · open · 4 months ago · 2 comments
added mlp and attn bias option to flash and paged llama models (#85) · JRosenkranz · closed · 4 months ago · 0 comments
Added attn and mlp bias (#84) · JRosenkranz · closed · 4 months ago · 0 comments
added attn and mlp bias (#83) · JRosenkranz · closed · 4 months ago · 0 comments
Bump fms-extras version to avoid torch 2.3.0 issue (#82) · tdoublep · closed · 4 months ago · 0 comments
Update transformers library (#81) · njhill · closed · 5 months ago · 0 comments
Dependency updates (#80) · njhill · closed · 5 months ago · 0 comments
Speculative decoding for `llama` and `gpt_bigcode` (#79) · tdoublep · closed · 5 months ago · 0 comments
Speculative decoding for llama and gpt_bigcode (#78) · tdoublep · closed · 5 months ago · 1 comment
Tracing: extract context from incoming request (#77) · SilverSoldier · closed · 5 months ago · 1 comment
🐛 fix lm_head weight mapping (#76) · prashantgupta24 · closed · 5 months ago · 2 comments
Calculate the length penalty in the same way as the transformers library (#75) · maxdebayser · closed · 5 months ago · 0 comments
Speculative decoding (#74) · prashantgupta24 · closed · 5 months ago · 3 comments
Speculative decoding fms extras (#73) · prashantgupta24 · closed · 5 months ago · 2 comments
ci: add GIT_COMMIT_HASH build arg (#72) · tjohnson31415 · closed · 5 months ago · 0 comments
Update deps to address vulnerability (#71) · heyselbi · closed · 5 months ago · 2 comments
Fix the forwarding of the length penalty parameter (#70) · maxdebayser · closed · 5 months ago · 0 comments
fix: small improvement to convert-to-safetensors (#69) · tjohnson31415 · closed · 5 months ago · 0 comments
Update base image, rust, python deps, rust crates (#68) · njhill · closed · 5 months ago · 2 comments
Performance Optimizations for TP-Aware GPTQ (#67) · cyang49 · open · 6 months ago · 0 comments
Re: Incoporate Marlin for GPTQ checkpoints into tgis_native (#66) · cyang49 · closed · 5 months ago · 1 comment
fix: Zero division in inverse estimator functions (#65) · maxdebayser · closed · 6 months ago · 0 comments
Zero division in inverse estimator functions (#64) · maxdebayser · closed · 6 months ago · 0 comments
:bug: disallow downloads for fast tokenizer conversion (#63) · joerunde · closed · 6 months ago · 0 comments
Big upgrades (#62) · joerunde · closed · 6 months ago · 1 comment
feat: support linear scaled rope for tgis_native llama (#61) · joerunde · closed · 6 months ago · 0 comments
:fire: remove cuda-runtime entirely (#60) · joerunde · closed · 6 months ago · 0 comments
🔥 Remove our exllama code because we use auto-gptq vendored kernels (#59) · tjohnson31415 · closed · 6 months ago · 0 comments