dvlab-research / LLaMA-VID

LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models (ECCV 2024)
Apache License 2.0

killing process when trying to train long video #35

Closed Deaddawn closed 11 months ago

Deaddawn commented 11 months ago

This is what I got when trying to fine-tune on long video using stage_3_full_v7b_224_longvid.sh with two V100s (32 GB each) and 125 GB of CPU RAM.

ERROR INFO:

You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Freezing all qformer weights...
Freezing all qformer weights...
Loading pretrained weights...
Loading pretrained weights...
Loading vlm_att_query weights...
Loading vlm_att_ln weights...
Loading vlm_att_query weights...
Loading vlm_att_ln weights...
Formatting inputs...Skip in lazy mode
Using /root/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
Creating extension directory /root/.cache/torch_extensions/py310_cu117/cpu_adam...
Using /root/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /root/.cache/torch_extensions/py310_cu117/cpu_adam/build.ninja...
Building extension module cpu_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/3] /usr/local/cuda-11.7/bin/nvcc -DTORCH_EXTENSION_NAME=cpu_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/root/miniconda/envs/llamavid/lib/python3.10/site-packages/deepspeed/ops/csrc/includes -I/usr/local/cuda-11.7/include -isystem /root/miniconda/envs/llamavid/lib/python3.10/site-packages/torch/include -isystem /root/miniconda/envs/llamavid/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /root/miniconda/envs/llamavid/lib/python3.10/site-packages/torch/include/TH -isystem /root/miniconda/envs/llamavid/lib/python3.10/site-packages/torch/include/THC -isystem /usr/local/cuda-11.7/include -isystem /root/miniconda/envs/llamavid/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_70,code=compute_70 -c /root/miniconda/envs/llamavid/lib/python3.10/site-packages/deepspeed/ops/csrc/common/custom_cuda_kernel.cu -o custom_cuda_kernel.cuda.o
[2/3] c++ -MMD -MF cpu_adam.o.d -DTORCH_EXTENSION_NAME=cpu_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/root/miniconda/envs/llamavid/lib/python3.10/site-packages/deepspeed/ops/csrc/includes -I/usr/local/cuda-11.7/include -isystem /root/miniconda/envs/llamavid/lib/python3.10/site-packages/torch/include -isystem /root/miniconda/envs/llamavid/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /root/miniconda/envs/llamavid/lib/python3.10/site-packages/torch/include/TH -isystem /root/miniconda/envs/llamavid/lib/python3.10/site-packages/torch/include/THC -isystem /usr/local/cuda-11.7/include -isystem /root/miniconda/envs/llamavid/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++17 -O3 -std=c++17 -g -Wno-reorder -L/usr/local/cuda-11.7/lib64 -lcudart -lcublas -g -march=native -fopenmp -DAVX512 -DENABLE_CUDA -c /root/miniconda/envs/llamavid/lib/python3.10/site-packages/deepspeed/ops/csrc/adam/cpu_adam.cpp -o cpu_adam.o
[3/3] c++ cpu_adam.o custom_cuda_kernel.cuda.o -shared -lcurand -L/root/miniconda/envs/llamavid/lib/python3.10/site-packages/torch/lib -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -ltorch_python -L/usr/local/cuda-11.7/lib64 -lcudart -o cpu_adam.so
Loading extension module cpu_adam...
Time to load cpu_adam op: 45.298378467559814 seconds
Loading extension module cpu_adam...
Time to load cpu_adam op: 45.36060833930969 seconds
Rank: 0 partition count [2, 2] and sizes[(3380480000, False), (4096, False)]
Rank: 1 partition count [2, 2] and sizes[(3380480000, False), (4096, False)]
[2023-12-29 02:26:03,598] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 202861
[2023-12-29 02:26:09,389] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 202862
[2023-12-29 02:26:09,389] [ERROR] [launch.py:321:sigkill_handler] ['/root/miniconda/envs/llamavid/bin/python3.1', '-u', 'llamavid/train/train_mem.py', '--local_rank=1', '--deepspeed', './scripts/zero2_offload.json', '--model_name_or_path', './work_dirs/llama-vid-7b-full-224-video-fps-1', '--version', 'imgsp_v1', '--data_path', '/remote-home/songzhende/LLaMA-VID-main/data/LLaMA-VID-Finetune/long_videoqa_part.json', '--image_folder', './data/LLaMA-VID-Finetune', '--video_folder', './data/LLaMA-VID-Finetune', '--vision_tower', './model_zoo/LAVIS/eva_vit_g.pth', '--image_processor', './llamavid/processor/clip-patch14-224', '--pretrain_mm_mlp_adapter', './work_dirs/llama-vid-7b-pretrain-224-video-fps-1/mm_projector.bin', '--mm_projector_type', 'mlp2x_gelu', '--mm_vision_select_layer', '-2', '--mm_use_im_start_end', 'False', '--mm_use_im_patch_token', 'False', '--image_aspect_ratio', 'pad', '--group_by_modality_length', 'False', '--video_fps', '1', '--video_token', '2', '--bert_type', 'qformer_pretrain_freeze_all', '--num_query', '32', '--compress_type', 'mean', '--fp16', 'True', '--bf16', 'False', '--output_dir', './work_dirs/llama-vid-7b-full-224-long-video', '--num_train_epochs', '1', '--per_device_train_batch_size', '1', '--per_device_eval_batch_size', '1', '--gradient_accumulation_steps', '1', '--evaluation_strategy', 'no', '--save_strategy', 'steps', '--save_steps', '5000', '--save_total_limit', '1', '--learning_rate', '2e-5', '--weight_decay', '0.', '--warmup_ratio', '0.03', '--lr_scheduler_type', 'cosine', '--logging_steps', '1', '--tf32', 'False', '--model_max_length', '65536', '--gradient_checkpointing', 'True', '--dataloader_num_workers', '1', '--lazy_preprocess', 'True', '--report_to', 'wandb'] exits with return code = -9
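
For reference, "exits with return code = -9" means the launcher's training subprocess was terminated with SIGKILL; on Linux this is most often the kernel OOM killer reacting to exhausted host RAM rather than a GPU out-of-memory error, which fits zero2_offload keeping the Adam optimizer states in CPU memory. A quick way to confirm this, assuming you can read the kernel log on the node, is:

    # look for OOM-killer entries around the time the subprocesses (202861/202862) were killed
    dmesg -T | grep -iE 'out of memory|oom-killer|killed process'

If host memory is the culprit, the killed PIDs should show up there along with their resident set sizes.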

Deaddawn commented 11 months ago

With the one-GPU script:

deepspeed --include localhost:0 llamavid/train/train_mem.py \
    --deepspeed ./scripts/zero2_offload.json \
    --model_name_or_path ./work_dirs/llama-vid-7b-full-224-video-fps-1 \
    --version imgsp_v1 \
    --data_path /remote-home/LLaMA-VID-main/data/LLaMA-VID-Finetune/long_videoqa_part.json \
    --image_folder ./data/LLaMA-VID-Finetune \
    --video_folder ./data/LLaMA-VID-Finetune \
    --vision_tower ./model_zoo/LAVIS/eva_vit_g.pth \
    --image_processor ./llamavid/processor/clip-patch14-224 \
    --pretrain_mm_mlp_adapter ./work_dirs/llama-vid-7b-pretrain-224-video-fps-1/mm_projector.bin \
    --mm_projector_type mlp2x_gelu \
    --mm_vision_select_layer -2 \
    --mm_use_im_start_end False \
    --mm_use_im_patch_token False \
    --image_aspect_ratio pad \
    --group_by_modality_length False \
    --video_fps 1 \
    --video_token 2 \
    --bert_type "qformer_pretrain_freeze_all" \
    --num_query 32 \
    --compress_type "mean" \
    --fp16 True \
    --bf16 False \
    --output_dir ./work_dirs/llama-vid-7b-full-224-long-video \
    --num_train_epochs 1 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 1 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 5000 \
    --save_total_limit 1 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --tf32 False \
    --model_max_length 65536 \
    --gradient_checkpointing True \
    --dataloader_num_workers 1 \
    --lazy_preprocess True \
    --report_to wandb

I get this error:

You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Freezing all qformer weights...
Loading pretrained weights...
Loading vlm_att_query weights...
Loading vlm_att_ln weights...
Formatting inputs...Skip in lazy mode
Using /root/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /root/.cache/torch_extensions/py310_cu117/cpu_adam/build.ninja...
Building extension module cpu_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module cpu_adam...
Time to load cpu_adam op: 2.651790142059326 seconds
Rank: 0 partition count [1, 1] and sizes[(6760960000, False), (8192, False)]
wandb: Tracking run with wandb version 0.16.1
wandb: W&B syncing is set to offline in this directory.
wandb: Run wandb online or set WANDB_MODE=online to enable cloud syncing.
0%| | 0/8920 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/remote-home/songzhende/LLaMA-VID-main/llamavid/train/train_mem.py", line 13, in <module>
    train()
  File "/remote-home/songzhende/LLaMA-VID-main/llamavid/train/train.py", line 1192, in train
    trainer.train()
  File "/root/miniconda/envs/llamavid/lib/python3.10/site-packages/transformers/trainer.py", line 1539, in train
    return inner_training_loop(
  File "/root/miniconda/envs/llamavid/lib/python3.10/site-packages/transformers/trainer.py", line 1787, in _inner_training_loop
    for step, inputs in enumerate(epoch_iterator):
  File "/root/miniconda/envs/llamavid/lib/python3.10/site-packages/accelerate/data_loader.py", line 381, in __iter__
    dataloader_iter = super().__iter__()
  File "/root/miniconda/envs/llamavid/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 441, in __iter__
    return self._get_iterator()
  File "/root/miniconda/envs/llamavid/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 388, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "/root/miniconda/envs/llamavid/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1042, in __init__
    w.start()
  File "/root/miniconda/envs/llamavid/lib/python3.10/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/root/miniconda/envs/llamavid/lib/python3.10/multiprocessing/context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/root/miniconda/envs/llamavid/lib/python3.10/multiprocessing/context.py", line 281, in _Popen
    return Popen(process_obj)
  File "/root/miniconda/envs/llamavid/lib/python3.10/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/root/miniconda/envs/llamavid/lib/python3.10/multiprocessing/popen_fork.py", line 66, in _launch
    self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory
wandb: You can sync this run to the cloud by running:
wandb: wandb sync /remote-home/songzhende/LLaMA-VID-main/wandb/offline-run-20231229_024232-sne6qscy
wandb: Find logs at: ./wandb/offline-run-20231229_024232-sne6qscy/logs
[2023-12-29 02:43:03,093] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 218712
[2023-12-29 02:43:03,094] [ERROR] [launch.py:321:sigkill_handler] ['/root/miniconda/envs/llamavid/bin/python3.1', '-u', 'llamavid/train/train_mem.py', '--local_rank=0', '--deepspeed', './scripts/zero2_offload.json', '--model_name_or_path', './work_dirs/llama-vid-7b-full-224-video-fps-1', '--version', 'imgsp_v1', '--data_path', '/remote-home/songzhende/LLaMA-VID-main/data/LLaMA-VID-Finetune/long_videoqa_part.json', '--image_folder', './data/LLaMA-VID-Finetune', '--video_folder', './data/LLaMA-VID-Finetune', '--vision_tower', './model_zoo/LAVIS/eva_vit_g.pth', '--image_processor', './llamavid/processor/clip-patch14-224', '--pretrain_mm_mlp_adapter', './work_dirs/llama-vid-7b-pretrain-224-video-fps-1/mm_projector.bin', '--mm_projector_type', 'mlp2x_gelu', '--mm_vision_select_layer', '-2', '--mm_use_im_start_end', 'False', '--mm_use_im_patch_token', 'False', '--image_aspect_ratio', 'pad', '--group_by_modality_length', 'False', '--video_fps', '1', '--video_token', '2', '--bert_type', 'qformer_pretrain_freeze_all', '--num_query', '32', '--compress_type', 'mean', '--fp16', 'True', '--bf16', 'False', '--output_dir', './work_dirs/llama-vid-7b-full-224-long-video', '--num_train_epochs', '1', '--per_device_train_batch_size', '1', '--per_device_eval_batch_size', '1', '--gradient_accumulation_steps', '1', '--evaluation_strategy', 'no', '--save_strategy', 'steps', '--save_steps', '5000', '--save_total_limit', '1', '--learning_rate', '2e-5', '--weight_decay', '0.', '--warmup_ratio', '0.03', '--lr_scheduler_type', 'cosine', '--logging_steps', '1', '--tf32', 'False', '--model_max_length', '65536', '--gradient_checkpointing', 'True', '--dataloader_num_workers', '1', '--lazy_preprocess', 'True', '--report_to', 'wandb'] exits with return code = 1
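
The single-GPU run fails differently: DeepSpeed finishes partitioning, but os.fork() raises Errno 12 when the DataLoader tries to spawn its worker process, i.e. there is too little free host memory left to fork once the offloaded optimizer states are resident. A quick, unofficial check (everything else in the command unchanged) is to load data in the main process instead of a worker:

    # hedged workaround sketch: avoid forking a DataLoader worker process
    # replace "--dataloader_num_workers 1" with
    --dataloader_num_workers 0 \

This only removes the extra fork; the underlying CPU-memory pressure from zero2_offload is still there.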

Deaddawn commented 11 months ago

It works just fine without using deepspeed.

yanwei-li commented 11 months ago

Hi, I guess the CPU memory (125 GB) may not be enough for DeepSpeed zero2_offload. Feel free to reopen this issue if you need further support.
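
A rough sanity check supports this: the logs show about 6.76B trainable parameters, and with ZeRO-2 CPU offload the Adam optimizer states (fp32 master weights, momentum, variance) plus offloaded gradients take roughly 12-16 bytes per parameter in host RAM, i.e. on the order of 6.76e9 x 12-16 bytes, about 80-110 GB, before pinned buffers, the dataloader, and the OS are counted, so 125 GB leaves very little headroom. One way to watch this happen, assuming you can observe the node while the job initializes, is:

    # watch available host memory drain while DeepSpeed builds and offloads the optimizer
    watch -n 2 free -h

If available memory approaches zero before the first training step, more host RAM (or a configuration that offloads less to CPU) is the practical fix.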

Deaddawn commented 11 months ago

Hi, I guess the CPU memory (125 GB) may not be enough for DeepSpeed zero2_offload. Feel free to reopen this issue if you need further support.

Yes, indeed. Could you share the hardware setup you used for your experiments, for reference?