THUDM / CogVideo

Text-to-video generation. The repo for the ICLR 2023 paper "CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers"
Apache License 2.0

A segmentation fault was encountered during inference #42

Open boyu-chen-intern opened 5 months ago

boyu-chen-intern commented 5 months ago
(CogVideo) C:\Users\SAS\Desktop\CogVideo-main>sh scripts/inference_cogvideo_pipeline.sh
Please install apex to use fused_layer_norm, fall back to torch.nn.LayerNorm
WARNING: No training data specified
using world size: 1 and model-parallel size: 1
> initializing model parallel with size 1
DEBUG:filelock:Attempting to acquire lock 1949198065920 on C:/Users/SAS/anaconda3/Library/sharefs/cogview-new\cogvideo-stage1.zip.lock
DEBUG:filelock:Lock 1949198065920 acquired on C:/Users/SAS/anaconda3/Library/sharefs/cogview-new\cogvideo-stage1.zip.lock
DEBUG:filelock:Attempting to release lock 1949198065920 on C:/Users/SAS/anaconda3/Library/sharefs/cogview-new\cogvideo-stage1.zip.lock
DEBUG:filelock:Lock 1949198065920 released on C:/Users/SAS/anaconda3/Library/sharefs/cogview-new\cogvideo-stage1.zip.lock
building InferenceModel_Sequential model ...
scripts/inference_cogvideo_pipeline.sh: line 38:  1209 Segmentation fault      MASTER_PORT=${MASTER_PORT} SAT_HOME=/sharefs/cogview-new python cogvideo_pipeline.py --input-source interactive --output-path ./output --parallel-size 1 --both-stages --use-guidance-stage1 --guidance-alpha 3.0 --generate-frame-num 5 --tokenizer-type fake --mode inference --distributed-backend nccl --fp16 --model-parallel-size $MPSIZE --temperature $TEMP --coglm-temperature2 0.89 --top_k $TOPK --sandwich-ln --seed 1234 --num-workers 0 --batch-size 4 --max-inference-batch-size 8 $@

Hi! I'm hitting the segmentation fault shown above, and reinstalling icetk did not fix it. While the script runs, CPU memory usage keeps rising until it is completely full (15.9/15.9 GB), and disk usage also temporarily grows by more than a dozen GB. How much CPU memory and disk space are needed to run the model? Could the segfault above be caused by running out of either? Thank you!
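To narrow down whether memory or disk is the limiting factor, a small pre-flight check like the one below may help. It is only a sketch: the thread does not state the model's actual requirements, so it simply compares the size of a checkpoint file with the free disk space and, if the optional third-party `psutil` package is installed, with the available RAM. The helper names (`free_disk_gib`, `available_ram_gib`, `fits`) and the 1.2 safety margin are assumptions made for illustration and are not part of CogVideo.

```python
import os
import shutil
import sys

try:
    import psutil  # optional third-party package; only used for the RAM figure
except ImportError:
    psutil = None


def free_disk_gib(path="."):
    """Free space, in GiB, on the drive that holds `path`."""
    return shutil.disk_usage(path).free / 2**30


def available_ram_gib():
    """Available RAM in GiB, or None when psutil is not installed."""
    if psutil is None:
        return None
    return psutil.virtual_memory().available / 2**30


def fits(required_gib, available_gib, headroom=1.2):
    """True if `available_gib` covers `required_gib` plus a safety margin.

    The 1.2 margin is an arbitrary assumption, not a measured requirement.
    """
    return available_gib >= required_gib * headroom


if __name__ == "__main__" and len(sys.argv) > 1:
    checkpoint = sys.argv[1]  # path to a downloaded checkpoint file
    size = os.path.getsize(checkpoint) / 2**30
    disk = free_disk_gib(os.path.dirname(os.path.abspath(checkpoint)))
    ram = available_ram_gib()
    print(f"checkpoint size: {size:.1f} GiB")
    print(f"free disk:       {disk:.1f} GiB (enough: {fits(size, disk)})")
    if ram is None:
        print("available RAM:   unknown (psutil is not installed)")
    else:
        print(f"available RAM:   {ram:.1f} GiB (enough: {fits(size, ram)})")
```

Saved under a hypothetical name such as `preflight.py`, it could be run as `python preflight.py <path-to-checkpoint>` before launching the inference script. A checkpoint that is larger than the available RAM would be consistent with the memory filling up while the model is being built, though this check alone does not prove that is what causes the segfault.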