FMInference / FlexLLMGen

Running large language models on a single GPU for throughput-oriented scenarios.
Apache License 2.0

Does the cpu peak_mem monitoring work? #81

Open dlfrnaos19 opened 1 year ago

dlfrnaos19 commented 1 year ago

Thank you for the amazing project!

I was checking the OPT-30B model with the command provided in the README:

python3 -m flexgen.flex_opt --model facebook/opt-30b --percent 0 100 100 0 100 0

and the result is:

(screenshot: benchmark output of the run)

As I watched the progress, the reported CPU peak memory was 95/126GB, so I wonder: is this right, or is it a bug?

Any pointers would help me, thanks!
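For context, a back-of-the-envelope sketch (not from the thread) of why the CPU number is large with this policy. It assumes the six `--percent` values mean weight GPU/CPU, attention-cache GPU/CPU, activation GPU/CPU splits, as described in FlexGen's README, and fp16 weights; it is not FlexGen's exact accounting:

```python
# Rough estimate of CPU RAM consumed by weights under
# --percent 0 100 100 0 100 0 (weights 100% on CPU).
# Assumptions: ~30e9 parameters for OPT-30B, fp16 storage.
params = 30e9          # approximate OPT-30B parameter count
bytes_per_param = 2    # fp16
w_cpu_pct = 100        # second value of --percent: weight % on CPU

weight_gb_on_cpu = params * bytes_per_param * (w_cpu_pct / 100) / 1e9
print(f"weights held in CPU RAM: ~{weight_gb_on_cpu:.0f} GB")  # ~60 GB
```

With ~60 GB of weights pinned in CPU RAM before any cache or activations, a peak in the 95 GB range is not implausible on its own.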

henrywoo commented 1 year ago

This is not working the way you expect: the reported value is calculated from tensor sizes, not measured at the OS level. :-)

dlfrnaos19 commented 1 year ago

thanks for your help!