facebookresearch / muavic

MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation

Problem met when downloading German data #17

Closed yiwang454 closed 8 months ago

yiwang454 commented 9 months ago

Hi, I ran the following command to download the German dataset from MuAViC:

python get_data.py --root-path ./muavic_project --src-lang de

It failed with the error below during the segmenting stage, at 21% of the step "Segmenting de videos files (It takes a few hours to complete)".

  File "get_data.py", line 115, in <module>
    main(args)
  File "get_data.py", line 84, in main
    prepare_mtedx(args)
  File "get_data.py", line 26, in prepare_mtedx
    preprocess_mtedx_video(
  File "/mnt/ceph_rbd/muavic_project/muavic/mtedx_utils.py", line 236, in preprocess_mtedx_video
    process_map(
  File "/mnt/ceph_rbd/applications/anaconda3/envs/avhubert/lib/python3.8/site-packages/tqdm/contrib/concurrent.py", line 105, in process_map
    return _executor_map(ProcessPoolExecutor, fn, *iterables, **tqdm_kwargs)
  File "/mnt/ceph_rbd/applications/anaconda3/envs/avhubert/lib/python3.8/site-packages/tqdm/contrib/concurrent.py", line 51, in _executor_map
    return list(tqdm_class(ex.map(fn, *iterables, chunksize=chunksize), **kwargs))
  File "/mnt/ceph_rbd/applications/anaconda3/envs/avhubert/lib/python3.8/site-packages/tqdm/std.py", line 1170, in __iter__
    for obj in iterable:
  File "/mnt/ceph_rbd/applications/anaconda3/envs/avhubert/lib/python3.8/concurrent/futures/process.py", line 484, in _chain_from_iterable_of_lists
    for element in iterable:
  File "/mnt/ceph_rbd/applications/anaconda3/envs/avhubert/lib/python3.8/concurrent/futures/_base.py", line 619, in result_iterator
    yield fs.pop().result()
  File "/mnt/ceph_rbd/applications/anaconda3/envs/avhubert/lib/python3.8/concurrent/futures/_base.py", line 444, in result
    return self.__get_result()
  File "/mnt/ceph_rbd/applications/anaconda3/envs/avhubert/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
    raise self._exception
concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.

I'm not very familiar with process_map. Do you have any idea what might be causing this error, or any suggestions for solving it? Many thanks.

Anwarvic commented 9 months ago

Hi @yiwang454 ,

Sorry about this issue. Could you please post the whole error traceback? Here is an example of a complete error traceback.

yiwang454 commented 8 months ago

Thanks @Anwarvic. After seeing your comment, I checked the example issue and realised my problem had the same cause: a memory limit. I've solved it now. Many thanks.
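
For readers who hit the same BrokenProcessPool error: it is raised when a worker process dies abruptly, and running out of memory is one common cause (the one reported in this thread). The thread does not record the exact fix that was applied. The sketch below is only one possible mitigation, assuming the cause is memory pressure: cap the number of worker processes so that fewer videos are processed at once. The names pick_worker_count, segment_stub and run_with_limited_workers, and the per-worker memory figure, are hypothetical and are not part of MuAViC. It uses the standard-library ProcessPoolExecutor, which is what tqdm's process_map uses internally according to the traceback above.

```python
import os
from concurrent.futures import ProcessPoolExecutor
from concurrent.futures.process import BrokenProcessPool


def pick_worker_count(available_gb, per_worker_gb, cpu_count=None):
    """Return how many workers fit in the memory budget (always at least 1).

    per_worker_gb is an assumed estimate of peak memory per worker;
    it has to be measured for the actual workload.
    """
    cpus = cpu_count or os.cpu_count() or 1
    by_memory = int(available_gb // per_worker_gb)
    return max(1, min(cpus, by_memory))


def segment_stub(item):
    # Hypothetical stand-in for the real per-video segmenting work.
    return item * 2


def run_with_limited_workers(fn, items, max_workers):
    """Map fn over items in a process pool capped at max_workers."""
    try:
        with ProcessPoolExecutor(max_workers=max_workers) as pool:
            return list(pool.map(fn, items))
    except BrokenProcessPool as err:
        # A worker was terminated abruptly (for example by the OOM killer).
        raise RuntimeError(
            f"A worker died with max_workers={max_workers}; "
            "check system memory and try fewer workers."
        ) from err
```

A script would typically call run_with_limited_workers(segment_stub, items, pick_worker_count(8, 2)) from inside an if __name__ == "__main__": guard, so that worker processes can import the module safely. tqdm's process_map also accepts a max_workers keyword argument that it passes on to the executor; whether get_data.py exposes a way to set it is not stated in this thread.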