FMInference FlexLLMGen issues

FMInference / FlexLLMGen

Running large language models on a single GPU for throughput-oriented scenarios.

Apache License 2.0

9.21k stars 547 forks

issues

Rename for compliance

#141 Ying1123 closed 3 weeks ago
0
LP optimization model and constants

#140 dimanzt opened 1 month ago
0
Fixed 'significantly' typo

#139 sunchaesk closed 1 month ago
0
How to split execution of prefill and decode for Flexgen?

#138 sunchaesk opened 4 months ago
0
File "setup.py" not found.

#137 Learn2006 opened 6 months ago
3
Killed Issue with flexgen when running python script

#136 foreverpiano opened 7 months ago
2
Add support for Llama and Qwen models

#135 marswen opened 7 months ago
1
AttributeError: 'NoneType' object has no attribute 'stream_id'

#134 neomi-tenenbaum-huawei opened 8 months ago
0
Error while splitting the model name

#133 neomi-tenenbaum-huawei opened 8 months ago
0
Why must the variable bls be less than 20?

#132 LHQUer opened 8 months ago
0
How do I match the results of profiling with the parameters of the cost model?

#131 xvanQ opened 9 months ago
1
Implement RESTful API of FlexGen

#130 Fyphen1223 opened 10 months ago
0
"helm_run.py, line 303, in run_entry run_spec = run_specs[0] IndexError: list index out of range"

#129 hjk1231 opened 10 months ago
1
what is the helm version?

#128 oujieww opened 10 months ago
1
How to use the model that has already been downloaded?

#127 AntonioZC666 opened 10 months ago
2
Please do not abandon this project!

#126 oobabooga opened 12 months ago
3
[Feature] Intel dGPU/SYCL support

#125 abhilash1910 opened 1 year ago
0
Add support for symmetric quantization

#124 julian-q closed 2 months ago
0
【PLS!】I want to know how to generate ray_bootstrap_config.yaml for my own cluster

#123 KylinC opened 1 year ago
0
How can I calculate `*mm_flops*` on other GPU which is used in cost_model.py?

#122 minhopark-neubla closed 1 year ago
1
Add cost model

#121 Ying1123 closed 1 year ago
0
【bug】? if we forget to add time mark code line in hf_ds folder

#120 oujieww closed 1 year ago
2
question about quantization

#119 xinhaoc opened 1 year ago
0
how to install from source

#118 SeekPoint opened 1 year ago
4
Why is the CPU peak memory usage set to 0?

#117 KAIWEILIUCC opened 1 year ago
0
AttributeError: 'OptLM' object has no attribute 'weight_home'

#116 pxc3113 opened 1 year ago
2
flexgen without GPU?

#115 AnatoliChe opened 1 year ago
0
NotImplementedError on --percent 50 50 50 50 50 50

#114 SeekPoint opened 1 year ago
0
Could flexgen be used for training?

#113 leiwen83 opened 1 year ago
0
fix torchrun inference

#112 fsx950223 opened 1 year ago
0
Allow FlexGen to use locally downloaded models

#111 Vinkle-hzt opened 1 year ago
0
Update README with more instructions

#110 Ying1123 closed 1 year ago
0
Support for MoE models (see Switch Transformer, NLLB)

#109 fiqas opened 1 year ago
0
Peak GPU memory use does not scale linearly with the GPU weight percentage

#108 frankxyy opened 1 year ago
0
When will the optimizer for determining offload strategy be released?

#107 frankxyy opened 1 year ago
0
Benchmark for 1 node with 4 GPUs

#106 QiaolingChen00 opened 1 year ago
1
MultiGPU problem

#105 robinzixuan closed 1 year ago
4
Support for LLaMA

#104 ustcwhy closed 1 year ago
1
interesting you can crop 65b

#103 seoeaa opened 1 year ago
0
Update docs/paper.md

#102 shotarok closed 1 year ago
0
Is FlexGen+GPTQ 4bit possible?

#101 BarfingLemurs opened 1 year ago
1
Support for ChatGLM

#100 AldarisX opened 1 year ago
0
ValueError: Invalid model name: galactica-30b

#99 vmajor opened 1 year ago
1
Question about the num-gpu-batches and gpu-batch-size

#98 young-chao opened 1 year ago
0
Question about allocations among different memory hierarchies

#97 aakejiang opened 1 year ago
0
Add SkyPilot example for running benchmarks

#96 Michaelvll opened 1 year ago
0
Data wrangle benchmark

#95 BinhangYuan closed 1 year ago
0
Update links in the README

#94 merrymercy closed 1 year ago
0
Update HELM benchmark

#93 Ying1123 closed 1 year ago
0
Questions about the intermediate tensor buffers design

#92 Dazz993 opened 1 year ago
0