TXH-mercury / VALOR

Codes and Models for VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset
https://arxiv.org/abs/2304.08345
MIT License

TypeError: __init__() missing 2 required positional arguments: 'stdout' and 'stderr' #24

Closed: caseclose closed this issue 6 months ago

caseclose commented 9 months ago

Thank you very much for your nice work! However, I encountered the following error when running utils/extract_frame_and_wav_multiprocess.py to process MSRVTT. In addition, the progress bar does not update, although the extracted video frames (.jpg) and audio files (.wav) do appear in the testt folder. The program has been running for 15 hours, and only 2374 audio files and frame files have been generated so far. The error is as follows:

    (valor) xxx:/VALOR/utils$ python extract_frame_and_wav_multiprocess.py
    0%|                                       | 0/10005 [00:00<?, ?it/s]
    Exception in thread Thread-3:
    Traceback (most recent call last):
      File "/anaconda3/envs/valor/lib/python3.9/threading.py", line 973, in _bootstrap_inner
        self.run()
      File "/anaconda3/envs/valor/lib/python3.9/threading.py", line 910, in run
        self._target(*self._args, **self._kwargs)
      File "/anaconda3/envs/valor/lib/python3.9/multiprocessing/pool.py", line 576, in _handle_results
        task = get()
      File "/anaconda3/envs/valor/lib/python3.9/multiprocessing/connection.py", line 256, in recv
        return _ForkingPickler.loads(buf.getbuffer())
    TypeError: __init__() missing 2 required positional arguments: 'stdout' and 'stderr'

Strangely, this error did not occur when processing the DiDeMo dataset, only when handling MSRVTT. (Is this related to the fact that DiDeMo doesn't have audio?)
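
One plausible reading of this traceback (an assumption; nobody in the thread confirms it): `multiprocessing.Pool` pickles whatever a worker returns or raises, and the parent's result-handler thread unpickles it, which is where the `_handle_results` → `recv` → `_ForkingPickler.loads` frames come from. Unpickling an exception calls its class again with only the stored message, so a class whose `__init__` also requires `stdout` and `stderr` (as `ffmpeg.Error` in the ffmpeg-python package does) fails with exactly this `TypeError`. A dead handler thread would also be consistent with the progress bar never advancing. The minimal sketch below reproduces the failure with a hypothetical `ProbeError` class and no ffmpeg involved:

```python
import pickle


class ProbeError(Exception):
    """Hypothetical stand-in with the same __init__ signature as ffmpeg-python's Error."""

    def __init__(self, cmd, stdout, stderr):
        # Only this formatted message ends up in self.args, which is what pickle records.
        super().__init__(f"{cmd} error (see stderr output for detail)")
        self.stdout = stdout
        self.stderr = stderr


def roundtrip(exc):
    """Pickle and unpickle an exception, as multiprocessing.Pool does with worker results."""
    return pickle.loads(pickle.dumps(exc))


if __name__ == "__main__":
    err = ProbeError("ffprobe", b"", b"")
    try:
        roundtrip(err)
    except TypeError as exc:
        # Unpickling calls ProbeError(message), which is missing stdout and stderr.
        print("unpickling failed:", exc)
```

Pickling itself succeeds; only the unpickling step in the parent process fails, which matches the traceback ending inside `_ForkingPickler.loads`.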

caseclose commented 9 months ago

Due to version compatibility issues with dependencies such as Python, when ffmpeg.probe calls subprocess.Popen internally, it still throws an error about missing stdout and stderr, even though those parameters are passed in. Therefore, we skipped the step that uses ffmpeg.probe to record the video information. To do this, change the original code:

    probe = ffmpeg.probe(video_name)
    pipline(video_name, probe, output_path, fps=1, sr=22050, duration_target=10)

into the following code:

    # probe = ffmpeg.probe(video_name)
    pipline(video_name, None, output_path, fps=1, sr=22050, duration_target=10)
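
A narrower alternative to dropping the probe step entirely, sketched under the same (unconfirmed) pickling assumption: catch the exception inside the worker and return only plain, picklable values. `run_safely`, `AwkwardError` and `failing_probe` below are hypothetical names for illustration, not part of VALOR or ffmpeg-python:

```python
import pickle
import traceback


def run_safely(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) without letting its exception leave the worker.

    Returns (True, result) on success or (False, error_text) on failure.
    error_text is a plain str, so it always survives pickling.
    """
    try:
        return True, fn(*args, **kwargs)
    except Exception:
        return False, traceback.format_exc()


class AwkwardError(Exception):
    """Hypothetical exception whose __init__ needs more than the message."""

    def __init__(self, cmd, stdout, stderr):
        super().__init__(f"{cmd} failed")
        self.stdout, self.stderr = stdout, stderr


def failing_probe(path):
    """Hypothetical stand-in for a probe call that fails on one video."""
    raise AwkwardError("ffprobe", b"", b"")


if __name__ == "__main__":
    ok, payload = run_safely(failing_probe, "video0.mp4")
    # The (ok, payload) tuple round-trips through pickle, unlike AwkwardError itself.
    assert pickle.loads(pickle.dumps((ok, payload))) == (ok, payload)
    print(ok, payload.splitlines()[-1])
```

In the extraction worker this could look like `ok, probe = run_safely(ffmpeg.probe, video_name)` followed by `pipline(video_name, probe if ok else None, output_path, fps=1, sr=22050, duration_target=10)`, assuming `pipline` accepts `None` as in the workaround above, and logging `probe` (the error text) when `ok` is false. Compared with commenting the probe out, this keeps the probe information for videos ffprobe can read and reports which files fail.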

XuecWu commented 6 months ago

@cs-wangfeng Hi, thank you for your useful suggestions! May I ask whether you have completed fine-tuning on MSRVTT?

Thanks a lot.

TXH-mercury commented 6 months ago

This is abnormal: the code should not fail even if some videos contain no audio. In addition, both the MSRVTT and DiDeMo datasets contain audio. You can try setting thread=1.
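
The thread does not say how the `thread` setting is used inside the script. If, as an assumption, `thread=1` makes the work run in the main process instead of in a pool, that could help with debugging: nothing is pickled, so the original exception and its attributes (such as ffprobe's stderr) reach the console intact. A hypothetical sketch of such a switch; `map_videos`, `ProbeError` and `broken_probe` are illustrative names, not VALOR's code:

```python
from multiprocessing import Pool


def map_videos(fn, items, threads=1):
    """Apply fn to every item (hypothetical helper).

    threads <= 1 runs everything in the current process, so nothing is
    pickled and any exception from fn reaches the caller unchanged.
    threads > 1 uses a process pool; fn must then be a module-level
    function, and results and exceptions are pickled between processes.
    """
    if threads <= 1:
        return [fn(item) for item in items]
    with Pool(processes=threads) as pool:
        return list(pool.imap(fn, items))


class ProbeError(Exception):
    """Hypothetical exception with the awkward (cmd, stdout, stderr) signature."""

    def __init__(self, cmd, stdout, stderr):
        super().__init__(f"{cmd} error")
        self.stdout, self.stderr = stdout, stderr


def broken_probe(path):
    """Hypothetical probe call that always fails."""
    raise ProbeError("ffprobe", b"", b"details from ffprobe's stderr")


if __name__ == "__main__":
    try:
        map_videos(broken_probe, ["video0.mp4"], threads=1)
    except ProbeError as exc:
        # The original exception, including its stderr attribute, is intact.
        print(exc, exc.stderr)
```

With `threads > 1`, the same `broken_probe` would send its `ProbeError` back through the pool and could hit the unpickling failure reported in this issue.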