FMInference / FlexLLMGen

Running large language models on a single GPU for throughput-oriented scenarios.
Apache License 2.0

"Killed" issue with FlexGen when running Python script #136

Open foreverpiano opened 6 months ago

foreverpiano commented 6 months ago
python3 -m flexgen.flex_opt --model facebook/opt-30b --percent 0 100 100 0 100 0 --offload-dir /scratch/bcjw/ding3/flexgen_offload_dir  --path /scratch/bcjw/ding3/opt_weights
<run_flexgen>: args.model: facebook/opt-30b
model size: 55.803 GB, cache size: 2.789 GB, hidden size (prefill): 0.029 GB
init weight...
Killed
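(Context for readers: a bare "Killed" with no Python traceback on Linux usually means the kernel OOM killer terminated the process. If the six `--percent` values follow FlexGen's documented order of weight/cache/activation GPU-CPU splits, then `0 100 100 0 100 0` places 100% of the weights in host RAM, so opt-30b needs the logged ~55.8 GB of host memory plus initialization overhead. A minimal pre-flight sketch, assuming `psutil` is installed; the headroom factor is a rough guess, not a FlexGen requirement:)

```python
# Sketch: check available host RAM before launching FlexGen with all
# weights offloaded to CPU. Assumes psutil is installed.
import psutil

MODEL_SIZE_GB = 55.803  # taken from FlexGen's own log line above

avail_gb = psutil.virtual_memory().available / 1024**3

# With --percent 0 100 ... every weight tensor lives in host RAM, and
# extra working memory is used while weights are being initialized,
# so some headroom beyond the raw model size is needed (20% is a guess).
if avail_gb < MODEL_SIZE_GB * 1.2:
    print(f"Only {avail_gb:.1f} GB free; loading opt-30b on CPU "
          f"will likely trigger the OOM killer ('Killed').")
else:
    print(f"{avail_gb:.1f} GB free; host RAM should be sufficient.")
```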

I run this command; it prints "Killed" and produces no further output.

foreverpiano commented 6 months ago

Running on a 2xA100 node with PyTorch 2.2.1. @Ying1123 @BinhangYuan

foreverpiano commented 6 months ago

It seems to get stuck at self.init_weight(j) with j = 68 for opt-30b.
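(One way to confirm whether memory exhaustion, rather than a hang, is what stops it at that layer is to watch the process's resident memory until it disappears. A minimal sketch, assuming `psutil` is installed; the PID argument is whatever `pgrep -f flex_opt` reports:)

```python
# Sketch: poll the flexgen process's resident set size (RSS) so you can
# see whether memory climbs steadily until the kernel kills the process.
# Usage: python watch_rss.py <pid>
import sys
import time

import psutil

pid = int(sys.argv[1])
proc = psutil.Process(pid)

try:
    while True:
        rss_gb = proc.memory_info().rss / 1024**3
        print(f"RSS: {rss_gb:.2f} GB", flush=True)
        time.sleep(5)
except psutil.NoSuchProcess:
    # The process vanished between polls -- consistent with the OOM
    # killer terminating it mid-initialization.
    print("process exited (possibly killed by the OOM killer)")
```

(If RSS grows monotonically and the process dies near the machine's physical RAM limit around layer j = 68, that would point to host-memory exhaustion during init_weight rather than a deadlock; the kernel log, e.g. `dmesg`, should then show an oom-killer entry.)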