FMInference / FlexLLMGen

Running large language models on a single GPU for throughput-oriented scenarios.

LP optimization model and constants #140

Open dimanzt opened 1 month ago

dimanzt commented 1 month ago

Hi all,

I wonder how you obtained the following hardware constants for the LP optimization model. Is there a script we can run to find these numbers on our own system?

Hardware constants:

# default value aligned on google cloud T4
ctog_bdw: float = 12.89 * GB
gtoc_bdw_cache: float = 0.97 * GB
gtoc_bdw_hidden: float = 4.82 * GB

dtoc_bdw: float = 0.473 * GB
ctod_bdw_cache_p: float = 0.746 * GB
ctod_bdw_hidden_p: float = 2.015 * GB
ctod_bdw_g: float = 2.015 * GB

mm_flops_p: float = 21.24 * T
mm_flops_g: float = 4.3 * T
bmm_flops_p: float = 9.97 * T
bmm_flops_g: float = 0.079 * T
cpu_flops: float = 0.0123 * T

c1: float = 0.0168
c2: float = 0.0328
c3: float = 0.0621
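For context, here is a rough sketch of how I imagine constants like ctog_bdw and mm_flops_p could be estimated with a few timed PyTorch operations. The helper names, tensor sizes, and repeat counts are my own assumptions, not anything from the repository, and I would like to confirm whether this roughly matches how the published numbers were obtained.

# Rough sketch (my own assumption, not the authors' calibration script) of how
# constants like ctog_bdw and mm_flops_p might be estimated with PyTorch.
import time
import torch

GB = 1 << 30
T = 1e12

def measure_ctog_bdw(num_bytes=1 * GB, repeats=5):
    # Approximate pinned CPU -> GPU copy bandwidth in GB/s.
    x_cpu = torch.empty(num_bytes, dtype=torch.uint8, pin_memory=True)
    x_gpu = torch.empty(num_bytes, dtype=torch.uint8, device="cuda")
    x_gpu.copy_(x_cpu, non_blocking=True)  # warm-up
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(repeats):
        x_gpu.copy_(x_cpu, non_blocking=True)
    torch.cuda.synchronize()
    return num_bytes * repeats / (time.time() - start) / GB

def measure_mm_flops(n=4096, repeats=20, dtype=torch.float16):
    # Approximate GPU matmul throughput in TFLOPS (2 * n^3 FLOPs per matmul).
    a = torch.randn(n, n, dtype=dtype, device="cuda")
    b = torch.randn(n, n, dtype=dtype, device="cuda")
    a @ b  # warm-up
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(repeats):
        a @ b
    torch.cuda.synchronize()
    return 2 * n ** 3 * repeats / (time.time() - start) / T

if __name__ == "__main__":
    print(f"ctog_bdw ~ {measure_ctog_bdw():.2f} GB/s")
    print(f"mm_flops ~ {measure_mm_flops():.2f} TFLOPS")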

I think the code in "fit_cost_model.py" is supposed to find these numbers, but the comment says "it is an old script and there is no promise of reproduction." I couldn't run this script:

python3 fit_cost_model.py
Traceback (most recent call last):
  File "/FlexGen/experimental/fit_cost_model.py", line 18, in <module>
    from experiments.run_exp import ExpConfig, cases, get_filename
ModuleNotFoundError: No module named 'experiments'
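My guess is that this is only a path problem: the script imports experiments.run_exp, which Python cannot find when the script is run from the experimental/ directory. A hypothetical workaround, assuming the experiments package still exists in the repository root, would be to prepend that root to sys.path before the import, but I have not verified that this is the intended way to run the script:

# Hypothetical workaround (my assumption): make the 'experiments' package
# importable by adding the FlexGen repository root to sys.path.
import sys
from pathlib import Path

repo_root = Path(__file__).resolve().parent.parent  # experimental/ -> repo root
sys.path.insert(0, str(repo_root))

from experiments.run_exp import ExpConfig, cases, get_filename  # original import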

I would really appreciate it if you could let me know whether there is an updated script I could use to find these constants.