Train the model on single GPU

panzeyu2013 commented 8 months ago

I trained the model on a single Nvidia RTX 4090 using the default config settings. However, the result on the test set is significantly worse than what the paper reports, e.g. CIDEr on the MSVD dataset drops from 113.0 to 101.5. I also set the gradient accumulation steps to 32 in the config to match the batch size of 64 used in the paper, but it did not seem to help.

acherstyx commented 8 months ago

Could you provide more information about your settings?

panzeyu2013 commented 8 months ago

I made no changes to the default settings. Below is the YAML file from the log folder.

CV_CONFIG:
  NUM_GOP: 8
  NUM_MV: 59
  NUM_RES: 59
  SAMPLE: rand
  USE_PRE_EXTRACT: false
  WITH_RESIDUAL: true
DATA:
  DATASET:
    MSRVTT:
      MAX_FRAMES: 8
      MAX_WORDS: 77
      METADATA: ./dataset/msrvtt/MSRVTT_data.json
      UNFOLD_SENTENCES: true
      VIDEO_READER: read_frames_compressed_domain
      VIDEO_ROOT: ./dataset/msrvtt/videos_h264_keyint_60
      VIDEO_SIZE:
      - 224
      - 224
    MSVD:
      MAX_FRAMES: 8
      MAX_WORDS: 77
      METADATA: ./dataset/msvd/MSVD_caption.json
      UNFOLD_SENTENCES: true
      VIDEO_READER: read_frames_compressed_domain
      VIDEO_ROOT: ./dataset/msvd/videos_240_h264_keyint_60
      VIDEO_SIZE:
      - 224
      - 224
    NAME: MSVDCaptioningDatasetForCLIP
    VATEX:
      MAX_FRAMES: 8
      MAX_WORDS: 77
      METADATA: ./dataset/vatex/VATEX_caption.json
      UNFOLD_SENTENCES: true
      VIDEO_READER: read_frames_compressed_domain
      VIDEO_ROOT: ./dataset/vatex/videos_240_h264_keyint_60
      VIDEO_SIZE:
      - 224
      - 224
  LOADER:
    BATCH_SIZE: 2
    COLLATE_FN: null
    MULTIPROCESSING_CONTEXT: fork
    NUM_WORKERS: 12
    PREFETCH_FACTOR: 2
    SHUFFLE: true
INFO:
  EXPERIMENT_NAME: msvd_captioning_h264
  PROJECT_NAME: compressed_video
LOG:
  DIR: log/compressed_video_msvd_captioning_h264
  LOGGER_CONSOLE_COLORFUL: true
  LOGGER_CONSOLE_LEVEL: info
  LOGGER_FILE: logger.log
LOSS:
  MultiObjectiveLoss:
    LOSSES: []
    WEIGHT: null
  NAME: LabelSmoothingLoss
METER:
  NAME: null
MODEL:
  COCAP:
    ACTION_ENCODER:
      N_HEADS: 8
      N_LAYERS: 1
    MOTION_DROPOUT_PROB: 0.2
    MOTION_ENCODER:
      N_HEADS: 8
      N_LAYERS: 2
      PATCH_SIZE: 8
    PRETRAINED_CLIP: ViT-B/16
    RESIDUAL_DROPOUT_PROB: 0.2
    RESIDUAL_ENCODER:
      N_HEADS: 8
      N_LAYERS: 2
      PATCH_SIZE: 64
    TASK_TYPE: captioning
  DDP:
    FIND_UNUSED_PARAMETERS: true
  NAME: CoCap
  PARALLELISM: ddp
OPTIMIZER:
  NAME: BertAdam
  PARAMETER:
    lr: 0.0001
    max_grad_norm: 1.0
    schedule: warmup_constant
    warmup: 0.1
    weight_decay: 0.01
SCHEDULER:
  NAME: null
SYS:
  DETERMINISTIC: true
  GPU_DEVICES: -0
  INIT_METHOD: tcp://localhost:2222
  MULTIPROCESS: true
  NUM_GPU: 1
  NUM_SHARDS: 1
  SEED: 222
  SHARD_ID: 0
TRAINER:
  CAPTION_TRAINER:
    CLIP_LR: 1.0e-06
    LR_DECAY_GAMMA: 0.95
    TASK_TYPE: captioning
  NAME: CoCapTrainer
  TRAINER_BASE:
    AMP: false
    AUTO_RESUME: false
    CLIP_NORM: null
    DEBUG: false
    EPOCH: 20
    GRADIENT_ACCUMULATION_STEPS: 2
    LOG_FREQ: 1
    RESUME: null
    SAVE_FREQ: 1
    TEST_ENABLE: true
    TRAIN_ENABLE: true
    WRITE_HISTOGRAM: false
    WRITE_PROFILER: false

My PyTorch version is 2.2.1, and all other packages are at their latest versions. Does random sampling make a difference when processing the video?
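For reference, the effective batch size under gradient accumulation is usually the per-GPU batch size times the accumulation steps times the number of GPUs. The sketch below only does that arithmetic with the values from the YAML above (BATCH_SIZE: 2, GRADIENT_ACCUMULATION_STEPS: 2, NUM_GPU: 1). The function name is hypothetical and not part of CoCap, and it assumes the trainer accumulates gradients in the usual way.

```python
def effective_batch_size(per_gpu_batch: int, accumulation_steps: int, num_gpus: int) -> int:
    """Number of samples that contribute to one optimizer step."""
    return per_gpu_batch * accumulation_steps * num_gpus


# Values from the logged config: 2 * 2 * 1
print(effective_batch_size(2, 2, 1))   # 4
# With accumulation raised to 32, as described in the first comment: 2 * 32 * 1
print(effective_batch_size(2, 32, 1))  # 64
```

By this arithmetic the logged default run takes an optimizer step every 4 samples, and the run with 32 accumulation steps every 64, provided the changed value was actually picked up by the trainer.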

acherstyx commented 8 months ago

That's strange. I will run a new experiment to verify this in the next few days.

panzeyu2013 commented 8 months ago

Thanks for your reply. I will also post an update if I find a small "error" in my code or environment settings. Have a nice day!

panzeyu2013 commented 8 months ago

I may have found the reason, but I haven't verified it yet. The videos downloaded from the official MSVD dataset webpage are .avi files, whereas the code appears to look for name + ".mp4". I changed this setting because it reported that the file could not be found. It is also odd that converting the original files to H.264 did not report any error even though the file extension does not match.
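To rule out the extension mismatch described above, one option is to look a video up by ID and try several extensions instead of hard-coding one. This is a minimal sketch, not the repository's loader: `resolve_video` and the default extension list are hypothetical.

```python
from pathlib import Path
from typing import Optional, Sequence


def resolve_video(root: Path, video_id: str,
                  exts: Sequence[str] = (".mp4", ".avi")) -> Optional[Path]:
    """Return the first existing file root/<video_id><ext>, or None if none exists."""
    for ext in exts:
        # Plain concatenation, so IDs that contain dots or dashes are left untouched.
        candidate = root / f"{video_id}{ext}"
        if candidate.is_file():
            return candidate
    return None
```

A `None` result for an ID listed in the metadata file would point to a missing or misnamed video rather than a training problem.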

alzmi commented 8 months ago

Hello, I have encountered an issue while attempting to execute code and would greatly appreciate your guidance. The issue is that upon converting a video to H.264, I only observed the creation of a "video_h264_keyint_60" folder, but the folder remains empty without any content. Could you kindly assist me in understanding why this might be happening? I am eagerly anticipating your response. Thank you in advance for your help.

panzeyu2013 commented 8 months ago

dataset/README.md shows how to convert the original data to H.264. You may also need to recheck the README on the main page and install the compressed video reader from https://github.com/AcherStyx/Compressed-Video-Reader.

alzmi commented 8 months ago

Thank you very much for your reply. I have installed Compressed Video Reader, but when I run "python3 tools/video_convert.py --code=libx264 --keyint=60 --resize=240 -i dataset/msrvtt/videos -o dataset/msrvtt/video_h264_keyint_60" I get this error: "FileNotFoundError: [Errno 2] No such file or directory: '/usr/bin/ffmpeg'". Do I still need to install ffmpeg?

panzeyu2013 commented 8 months ago

Yes. On Ubuntu you can try the following command: sudo apt-get install ffmpeg. If you are on another Linux distribution, please refer to its documentation for installation instructions.

alzmi commented 8 months ago

FFmpeg has been installed successfully, but I hit the same problem as before: the "videos240_h264_keyint_60" directory is created, but it is empty. Why is this? Have you ever encountered this problem?

panzeyu2013 commented 8 months ago

It seems the Python code runs correctly. You need to make sure that all the videos in the MSVD dataset are already placed under dataset/msvd/video. Otherwise there are no videos to convert. If the same problem occurs again, please provide more details, for example your directory structure.
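A quick way to share the directory structure asked for above is to count files by extension in the input and output folders. This is a minimal sketch. `count_by_extension` is a hypothetical helper, and the two paths are the ones mentioned in this thread, so adjust them to your layout.

```python
from collections import Counter
from pathlib import Path


def count_by_extension(directory: str) -> Counter:
    """Count regular files under `directory` (recursively), keyed by lowercase suffix."""
    root = Path(directory)
    if not root.is_dir():
        raise FileNotFoundError(f"not a directory: {directory}")
    return Counter(p.suffix.lower() for p in root.rglob("*") if p.is_file())


if __name__ == "__main__":
    # Paths taken from the thread; they are assumptions about your layout.
    for d in ("dataset/msvd/video", "dataset/msvd/videos_240_h264_keyint_60"):
        try:
            print(d, dict(count_by_extension(d)))
        except FileNotFoundError as err:
            print(err)
```

An empty count for the input folder means there is nothing to convert. A populated input and an empty output suggests the conversion command itself is failing.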

acherstyx commented 8 months ago

Hello @alzmi, you can add the --verbose option to see the error message. By default, all ffmpeg output is suppressed so that the progress bar is not flooded.

alzmi commented 8 months ago

I'm very sorry to take up your time. I only placed two videos in the msvd/videos directory to test whether the conversion works. After adding "--verbose" to the command, the printed message is:

ffmpeg version 5.1.4 Copyright (c) 2000-2023 the FFmpeg developers
built with gcc 9 (Ubuntu 9.4.0-1ubuntu1-20.04.1)
configuration:
libavutil 57. 28.100 / 57. 28.100
libavcodec 59. 37.100 / 59. 37.100
libavformat 59. 27.100 / 59. 27.100
libavdevice 59. 7.100 / 59. 7.100
libavfilter 8. 44.100 / 8. 44.100
libswscale 6. 7.100 / 6. 7.100
libswresample 4. 7.100 / 4. 7.100
Unrecognized option 'x264 params'
Error splitting the argument list: Option not found

and, with the other version:

ffmpeg version 6.1.1 Copyright (c) 2000-2023 the FFmpeg developers
built with gcc 9 (Ubuntu 9.4.0-1ubuntu1-20.04.1)
configuration:
libavutil 58. 29.100 / 58. 29.100
libavcodec 60. 31.102 / 60. 31.102
libavformat 60. 16.100 / 60. 16.100
libavdevice 60. 3.100 / 60. 3.100
libavfilter 9. 12.100 / 9. 12.100
libswscale 7. 5.100 / 7. 5.100
libswresample 4. 12.100 / 4. 12.100
Unrecognized option 'x264 params'
Error splitting the argument list: Option not found

I tried both ffmpeg 5.1.4 and 6.1.1, and both report Unrecognized option 'x264 params'. Does this mean that my ffmpeg version is not suitable? What is your ffmpeg version?

acherstyx commented 8 months ago

You should check which version is actually used by the Python script. You can specify the path with the --ffmpeg_exec option (by default it is /usr/bin/ffmpeg). FFmpeg 5.1 should work fine, as it is used by cv_reader.
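To check which binary the script would pick up, the sketch below inspects the ffmpeg executable at a given path (defaulting to /usr/bin/ffmpeg, the default mentioned above) and reports its version line and whether its encoder list includes libx264. The helper names are hypothetical. The idea that the unrecognized x264 option in the log comes from a build without libx264 is only a guess and is not confirmed anywhere in this thread.

```python
import shutil
import subprocess
from typing import Optional


def has_libx264(encoders_output: str) -> bool:
    """True if the text printed by `ffmpeg -encoders` lists the libx264 encoder."""
    return any(
        len(parts) >= 2 and parts[1] == "libx264"
        for parts in (line.split() for line in encoders_output.splitlines())
    )


def inspect_ffmpeg(path: str = "/usr/bin/ffmpeg") -> Optional[dict]:
    """Report the version line and libx264 support of the ffmpeg binary at `path`."""
    exe = shutil.which(path)  # None if the file is missing or not executable
    if exe is None:
        return None
    version = subprocess.run([exe, "-version"],
                             capture_output=True, text=True).stdout
    encoders = subprocess.run([exe, "-hide_banner", "-encoders"],
                              capture_output=True, text=True).stdout
    return {
        "path": exe,
        "version": version.splitlines()[0] if version else "",
        "libx264": has_libx264(encoders),
    }


if __name__ == "__main__":
    print(inspect_ffmpeg())
```

If the reported path or version differs from the one you installed, passing the intended binary through --ffmpeg_exec is the next thing to try.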

panzeyu2013 commented 8 months ago

That's weird. My version is 4.4.2, which is much older than yours. I can't help with the ffmpeg problem, so you may want to search for it online.

censhallwe commented 7 months ago

Hi, could you please tell me how you trained on a single GPU? I followed the README to prepare everything, but when I try to train on MSRVTT I run into a problem. The model loads fine, but the training progress bar never advances, like this:

(base) ubuntu@inst10:~/CoCap$ python3 mm_video/run_net.py --cfg configs/compressed_video/msrvtt_captioning.yaml
Project: compressed_video
2024-04-05T15:09:31 => Run trainer
INFO [04/05 15:09:57][Rank 0][build.py: 77]: Model total parameters: 171,010,240
2024-04-05 15:09:57.847973: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
WARNING [04/05 15:09:58][Rank 0][meter.py: 66]: Meter is not specified!
Train: 1/20: 0%| | 0/65130 [00:08<?, ?it/s]

So I interrupted it, and got the following traceback:

^CTraceback (most recent call last):
  File "mm_video/run_net.py", line 42, in <module>
    main()
  File "mm_video/run_net.py", line 24, in main
    mp.spawn(run_trainer, args=(cfg,), nprocs=cfg.SYS.NUM_GPU)
  File "/home/ubuntu/anaconda3/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 230, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/home/ubuntu/anaconda3/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 188, in start_processes
    while not context.join():
  File "/home/ubuntu/anaconda3/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 99, in join
    ready = multiprocessing.connection.wait(
  File "/home/ubuntu/anaconda3/lib/python3.8/multiprocessing/connection.py", line 931, in wait
    ready = selector.select(timeout)
  File "/home/ubuntu/anaconda3/lib/python3.8/selectors.py", line 415, in select
    fd_event_list = self._selector.poll(timeout)
KeyboardInterrupt

I don't know why that happened, so I hope you can provide some help, please. ☺ @acherstyx @panzeyu2013
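The traceback above only shows the parent process waiting inside mp.spawn, so it does not reveal where the worker is stuck. One way to get more information is Python's standard faulthandler module, which can dump the stacks of all threads after a timeout. This is a minimal sketch. The helper names are hypothetical, and calling `enable_hang_dump()` at the start of the worker function (run_trainer in the traceback) is an assumption about where it would fit.

```python
import faulthandler
import sys


def enable_hang_dump(timeout_s: float = 60.0, file=sys.stderr) -> None:
    """Every `timeout_s` seconds, dump the stack of all threads in this process to `file`."""
    faulthandler.dump_traceback_later(timeout_s, repeat=True, file=file)


def disable_hang_dump() -> None:
    """Cancel the periodic dump once training is clearly making progress."""
    faulthandler.cancel_dump_traceback_later()
```

If the dumped stack sits in data loading, a slow first batch (video decoding with 12 workers) is a likely candidate. If it sits in process-group initialization, the INIT_METHOD address or port would be worth checking. Both are guesses until a dump is available.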

Accept-AI commented 3 months ago

Hello, could I add you on WeChat? Thank you. I would like to ask about the settings for running on a single GPU.

acherstyx / CoCap

Train the model on single GPU #9