FMInference/FlexLLMGen
Running large language models on a single GPU for throughput-oriented scenarios.
Apache License 2.0
9.18k stars
548 forks
issues
Rename for compliance
#141
Ying1123
closed
4 days ago
0
LP optimization model and constants
#140
dimanzt
opened
3 weeks ago
0
Fixed 'significantly' typo
#139
sunchaesk
closed
1 month ago
0
How to split execution of prefill and decode for Flexgen?
#138
sunchaesk
opened
3 months ago
0
File "setup.py" not found.
#137
Learn2006
opened
6 months ago
3
Killed Issue with flexgen when running python script
#136
foreverpiano
opened
6 months ago
2
Add support for Llama and Qwen models
#135
marswen
opened
7 months ago
0
AttributeError: 'NoneType' object has no attribute 'stream_id'
#134
neomi-tenenbaum-huawei
opened
7 months ago
0
Error while split the model name
#133
neomi-tenenbaum-huawei
opened
7 months ago
0
Why the variable bls must be less than 20?
#132
LHQUer
opened
7 months ago
0
How do I match the results of profiling with the parameters of the cost model?
#131
xvanQ
opened
9 months ago
1
Implement RESTful API of FlexGen
#130
Fyphen1223
opened
9 months ago
0
"helm_run.py, line 303, in run_entry run_spec = run_specs[0] IndexError: list index out of range"
#129
hjk1231
opened
9 months ago
1
what is the helm version?
#128
oujieww
opened
9 months ago
1
How to use the model that has already been downloaded?
#127
AntonioZC666
opened
9 months ago
2
Please do not abandon this project!
#126
oobabooga
opened
11 months ago
3
[Feature] Intel dGPU/SYCL support
#125
abhilash1910
opened
1 year ago
0
Add support for symmetric quantization
#124
julian-q
closed
2 months ago
0
【PLS!】I want to know how to generate ray_bootstrap_config.yaml for my own cluster
#123
KylinC
opened
1 year ago
0
How can I calculate `*mm_flops*` on other GPU which is used in cost_model.py?
#122
minhopark-neubla
closed
1 year ago
1
Add cost model
#121
Ying1123
closed
1 year ago
0
【bug】? if we forget to add time mark code line in hf_ds folder
#120
oujieww
closed
1 year ago
2
question about quantization
#119
xinhaoc
opened
1 year ago
0
how to install from source
#118
SeekPoint
opened
1 year ago
4
Why is the CPU peak memory usage set to 0?
#117
KAIWEILIUCC
opened
1 year ago
0
AttributeError: 'OptLM' object has no attribute 'weight_home'
#116
pxc3113
opened
1 year ago
2
flexgen without GPU?
#115
AnatoliChe
opened
1 year ago
0
NotImplementedError on --percent 50 50 50 50 50 50
#114
SeekPoint
opened
1 year ago
0
Could flexgen be used for training?
#113
leiwen83
opened
1 year ago
0
fix torchrun inference
#112
fsx950223
opened
1 year ago
0
Allow FlexGen to use locally downloaded models
#111
Vinkle-hzt
opened
1 year ago
0
Update README with more instructions
#110
Ying1123
closed
1 year ago
0
Support for MoE models (see Switch Transformer, NLLB)
#109
fiqas
opened
1 year ago
0
Peak gpu memory use not scale linearly with the percentage of gpu usage of weight
#108
frankxyy
opened
1 year ago
0
When will the optimizer for determining offload strategy be released?
#107
frankxyy
opened
1 year ago
0
Benchmark for 1 node with 4 GPUs
#106
QiaolingChen00
opened
1 year ago
1
MultiGPU problem
#105
robinzixuan
closed
1 year ago
4
Support for LLaMA
#104
ustcwhy
closed
1 year ago
1
interesting you can crop 65b
#103
seoeaa
opened
1 year ago
0
Update docs/paper.md
#102
shotarok
closed
1 year ago
0
Is FlexGen+GPTQ 4bit possible?
#101
BarfingLemurs
opened
1 year ago
1
Support for ChatGLM
#100
AldarisX
opened
1 year ago
0
ValueError: Invalid model name: galactica-30b
#99
vmajor
opened
1 year ago
1
Question about the num-gpu-batches and gpu-batch-size
#98
young-chao
opened
1 year ago
0
Question about allocations among different memory hierarchies
#97
aakejiang
opened
1 year ago
0
Add SkyPilot example for running benchmarks
#96
Michaelvll
opened
1 year ago
0
Data wrangle benchmark
#95
BinhangYuan
closed
1 year ago
0
Update links in the README
#94
merrymercy
closed
1 year ago
0
Update HELM benchmark
#93
Ying1123
closed
1 year ago
0
Questions about the intermediate tensor buffers design
#92
Dazz993
opened
1 year ago
0