facebookresearch / mmf

A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
https://mmf.sh/

UnpicklingError while loading Hateful Memes dataset #303

Closed jinhyun95 closed 4 years ago

jinhyun95 commented 4 years ago

❓ Questions and Help

Hi, I'm trying to train the baseline models on the Hateful Memes dataset, but the following error pops up.

```
mmf_run config=projects/hateful_memes/configs/concat_bert/defaults.yaml model=concat_bert dataset=hateful_memes
```

error:

```
Namespace(config_override=None, local_rank=None, opts=['config=projects/hateful_memes/configs/concat_bert/defaults.yaml', 'model=concat_bert', 'dataset=hateful_memes'])
Overriding option config to projects/hateful_memes/configs/concat_bert/defaults.yaml
Overriding option model to concat_bert
Overriding option datasets to hateful_memes
Distributed Init (Rank 2): tcp://localhost:11465
Distributed Init (Rank 0): tcp://localhost:11465
Distributed Init (Rank 1): tcp://localhost:11465
Initialized Host imlab-ws8 as Rank 1
Initialized Host imlab-ws8 as Rank 2
Initialized Host imlab-ws8 as Rank 0
Using seed 47521709
Logging to: ./save/logs/train_2020-06-08T17:35:47.log
Downloading extras.tar.gz: 100%|██████████| 211k/211k [00:01<00:00, 165kB/s]
Traceback (most recent call last):
  File "/home/wookee3/anaconda3/envs/jinhyun/bin/mmf_run", line 11, in <module>
    load_entry_point('mmf', 'console_scripts', 'mmf_run')()
  File "/home/jinhyun95/mmf/mmf_cli/run.py", line 87, in run
    nprocs=config.distributed.world_size,
  File "/home/wookee3/anaconda3/envs/jinhyun/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 200, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/home/wookee3/anaconda3/envs/jinhyun/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 158, in start_processes
    while not context.join():
  File "/home/wookee3/anaconda3/envs/jinhyun/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 119, in join
    raise Exception(msg)
Exception:
```

```
-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/home/wookee3/anaconda3/envs/jinhyun/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 20, in _wrap
    fn(i, *args)
  File "/home/jinhyun95/mmf/mmf_cli/run.py", line 52, in distributed_main
    main(configuration, init_distributed=True, predict=predict)
  File "/home/jinhyun95/mmf/mmf_cli/run.py", line 42, in main
    trainer.train()
  File "/home/jinhyun95/mmf/mmf/trainers/base_trainer.py", line 245, in train
    for batch in self.train_loader:
  File "/home/jinhyun95/mmf/mmf/datasets/multi_dataset_loader.py", line 185, in __iter__
    return iter(self.loaders[0])
  File "/home/wookee3/anaconda3/envs/jinhyun/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 279, in __iter__
    return _MultiProcessingDataLoaderIter(self)
  File "/home/wookee3/anaconda3/envs/jinhyun/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 719, in __init__
    w.start()
  File "/home/wookee3/anaconda3/envs/jinhyun/lib/python3.6/multiprocessing/process.py", line 105, in start
    self._popen = self._Popen(self)
  File "/home/wookee3/anaconda3/envs/jinhyun/lib/python3.6/multiprocessing/context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/home/wookee3/anaconda3/envs/jinhyun/lib/python3.6/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "/home/wookee3/anaconda3/envs/jinhyun/lib/python3.6/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/home/wookee3/anaconda3/envs/jinhyun/lib/python3.6/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/home/wookee3/anaconda3/envs/jinhyun/lib/python3.6/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/home/wookee3/anaconda3/envs/jinhyun/lib/python3.6/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
TypeError: can't pickle _thread.RLock objects
```

```
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/wookee3/anaconda3/envs/jinhyun/lib/python3.6/multiprocessing/spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "/home/wookee3/anaconda3/envs/jinhyun/lib/python3.6/multiprocessing/spawn.py", line 115, in _main
    self = reduction.pickle.load(from_parent)
_pickle.UnpicklingError: pickle data was truncated
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/wookee3/anaconda3/envs/jinhyun/lib/python3.6/multiprocessing/spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "/home/wookee3/anaconda3/envs/jinhyun/lib/python3.6/multiprocessing/spawn.py", line 115, in _main
    self = reduction.pickle.load(from_parent)
_pickle.UnpicklingError: pickle data was truncated
```

Here are some details on my development environment:

- Python 3.6 (Anaconda)
- CUDA 10.1
- The latest commit of mmf (installed from source)
- Ubuntu 16.04.5 LTS

Thanks in advance.

jinhyun95 commented 4 years ago

P.S. The dataset was unzipped and converted via `mmf_convert_hm`, and the checksum was successful.

apsdehal commented 4 years ago

@jinhyun95 Can you use Python 3.7 instead?

aabzaliev commented 4 years ago

Using Python 3.7 instead of 3.6 solved the issue for me. AFAIK what pickle can serialize differs between Python versions, so that makes sense.
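One concrete difference that could explain this (my assumption, not verified against mmf's code): Python 3.7 made `logging.Logger` objects picklable by name (bpo-30520), so an object that keeps a logger reference survives the `spawn` pickling step under 3.7, while under 3.6 the logger's handlers and locks get pickled by value and fail.

```python
import logging
import pickle

# Under Python 3.7+, loggers pickle as a reference to their name,
# so the round trip returns the very same logger object.
# Under 3.6 this dumps() call raises a TypeError instead.
log = logging.getLogger("mmf_demo")
restored = pickle.loads(pickle.dumps(log))
assert restored is log
```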

jinhyun95 commented 4 years ago

Worked for me too! Thanks a lot.