baudm / parseq

Scene Text Recognition with Permuted Autoregressive Sequence Models (ECCV 2022)
https://huggingface.co/spaces/baudm/PARSeq-OCR
Apache License 2.0

cannot pickle "Environment" object #19

ronin2304 closed this issue 2 years ago

ronin2304 commented 2 years ago

Hi, I just tried to train with "python train.py +experiment=parseq-tiny". I am running Windows 10 and have a GPU with 16 GB of memory. After executing the command above, I get the errors "TypeError: cannot pickle 'Environment' object" and "EOFError: Ran out of input". The trace gives me this:

Using 16bit native Automatic Mixed Precision (AMP)
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
Loading train_dataloader to estimate number of stepping batches.
[2022-08-10 22:16:36,495][strhub.data.dataset][INFO] - dataset root: C:\Users\prome\parseq-main\parseq-main\data\train\real
[2022-08-10 22:16:36,621][strhub.data.dataset][INFO] - lmdb: ArT\train num samples: 29052
[2022-08-10 22:16:36,635][strhub.data.dataset][INFO] - lmdb: ArT\val num samples: 3228
[2022-08-10 22:16:36,856][strhub.data.dataset][INFO] - lmdb: COCOv2.0\train num samples: 59774
[2022-08-10 22:16:36,907][strhub.data.dataset][INFO] - lmdb: COCOv2.0\val num samples: 13406
[2022-08-10 22:17:40,145][strhub.data.dataset][INFO] - lmdb: de num samples: 15720957
[2022-08-10 22:18:16,909][strhub.data.dataset][INFO] - lmdb: LSVT\test num samples: 4193
[2022-08-10 22:18:17,047][strhub.data.dataset][INFO] - lmdb: LSVT\train num samples: 34160
[2022-08-10 22:18:17,065][strhub.data.dataset][INFO] - lmdb: LSVT\val num samples: 4273
[2022-08-10 22:18:17,088][strhub.data.dataset][INFO] - lmdb: MLT19\test num samples: 5676
[2022-08-10 22:18:17,274][strhub.data.dataset][INFO] - lmdb: MLT19\train num samples: 45423
[2022-08-10 22:18:17,295][strhub.data.dataset][INFO] - lmdb: MLT19\val num samples: 5677
[2022-08-10 22:18:18,860][strhub.data.dataset][INFO] - lmdb: OpenVINO\train_1 num samples: 443957
[2022-08-10 22:18:20,693][strhub.data.dataset][INFO] - lmdb: OpenVINO\train_2 num samples: 503062
[2022-08-10 22:18:22,521][strhub.data.dataset][INFO] - lmdb: OpenVINO\train_5 num samples: 496138
[2022-08-10 22:18:24,211][strhub.data.dataset][INFO] - lmdb: OpenVINO\train_f num samples: 470889
[2022-08-10 22:18:24,215][strhub.data.dataset][INFO] - lmdb: RCTW17\test num samples: 1044
[2022-08-10 22:18:24,247][strhub.data.dataset][INFO] - lmdb: RCTW17\train num samples: 8369
[2022-08-10 22:18:24,252][strhub.data.dataset][INFO] - lmdb: RCTW17\val num samples: 1046
[2022-08-10 22:18:24,262][strhub.data.dataset][INFO] - lmdb: ReCTS\test num samples: 2467
[2022-08-10 22:18:24,346][strhub.data.dataset][INFO] - lmdb: ReCTS\train num samples: 21594
[2022-08-10 22:18:24,356][strhub.data.dataset][INFO] - lmdb: ReCTS\val num samples: 2377
[2022-08-10 22:18:26,982][strhub.data.dataset][INFO] - lmdb: TextOCR\train num samples: 711345
[2022-08-10 22:18:27,380][strhub.data.dataset][INFO] - lmdb: TextOCR\val num samples: 107170
[2022-08-10 22:18:27,783][strhub.data.dataset][INFO] - lmdb: Uber\train num samples: 91977
[2022-08-10 22:18:27,916][strhub.data.dataset][INFO] - lmdb: Uber\val num samples: 36283
Sanity Checking: 0it [00:00, ?it/s]
[2022-08-10 22:18:29,444][strhub.data.dataset][INFO] - dataset root: C:\Users\prome\parseq-main\parseq-main\data\val
[2022-08-10 22:18:31,224][strhub.data.dataset][INFO] - lmdb: de num samples: 425438
[2022-08-10 22:18:31,228][strhub.data.dataset][INFO] - lmdb: IC13 num samples: 848
[2022-08-10 22:18:31,246][strhub.data.dataset][INFO] - lmdb: IC15 num samples: 4468
[2022-08-10 22:18:31,255][strhub.data.dataset][INFO] - lmdb: IIIT5k num samples: 2000
[2022-08-10 22:18:31,258][strhub.data.dataset][INFO] - lmdb: SVT num samples: 257
Error executing job with overrides: ['+experiment=parseq-tiny']
Traceback (most recent call last):
  File "train.py", line 70, in main
    trainer.fit(model, datamodule=datamodule)
  File "C:\Users\prome\anaconda3\envs\parseq\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 770, in fit
    self._call_and_handle_interrupt(
  File "C:\Users\prome\anaconda3\envs\parseq\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 723, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "C:\Users\prome\anaconda3\envs\parseq\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 811, in _fit_impl
    results = self._run(model, ckpt_path=self.ckpt_path)
  File "C:\Users\prome\anaconda3\envs\parseq\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1236, in _run
    results = self._run_stage()
  File "C:\Users\prome\anaconda3\envs\parseq\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1323, in _run_stage
    return self._run_train()
  File "C:\Users\prome\anaconda3\envs\parseq\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1345, in _run_train
    self._run_sanity_check()
  File "C:\Users\prome\anaconda3\envs\parseq\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1413, in _run_sanity_check
    val_loop.run()
  File "C:\Users\prome\anaconda3\envs\parseq\lib\site-packages\pytorch_lightning\loops\base.py", line 204, in run
    self.advance(*args, **kwargs)
  File "C:\Users\prome\anaconda3\envs\parseq\lib\site-packages\pytorch_lightning\loops\dataloader\evaluation_loop.py", line 155, in advance
    dl_outputs = self.epoch_loop.run(self._data_fetcher, dl_max_batches, kwargs)
  File "C:\Users\prome\anaconda3\envs\parseq\lib\site-packages\pytorch_lightning\loops\base.py", line 199, in run
    self.on_run_start(*args, **kwargs)
  File "C:\Users\prome\anaconda3\envs\parseq\lib\site-packages\pytorch_lightning\loops\epoch\evaluation_epoch_loop.py", line 88, in on_run_start
    self._data_fetcher = iter(data_fetcher)
  File "C:\Users\prome\anaconda3\envs\parseq\lib\site-packages\pytorch_lightning\utilities\fetching.py", line 178, in __iter__
    self.dataloader_iter = iter(self.dataloader)
  File "C:\Users\prome\anaconda3\envs\parseq\lib\site-packages\torch\utils\data\dataloader.py", line 368, in __iter__
    return self._get_iterator()
  File "C:\Users\prome\anaconda3\envs\parseq\lib\site-packages\torch\utils\data\dataloader.py", line 314, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "C:\Users\prome\anaconda3\envs\parseq\lib\site-packages\torch\utils\data\dataloader.py", line 927, in __init__
    w.start()
  File "C:\Users\prome\anaconda3\envs\parseq\lib\multiprocessing\process.py", line 121, in start
    self._popen = self._Popen(self)
  File "C:\Users\prome\anaconda3\envs\parseq\lib\multiprocessing\context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\Users\prome\anaconda3\envs\parseq\lib\multiprocessing\context.py", line 327, in _Popen
    return Popen(process_obj)
  File "C:\Users\prome\anaconda3\envs\parseq\lib\multiprocessing\popen_spawn_win32.py", line 93, in __init__
    reduction.dump(process_obj, to_child)
  File "C:\Users\prome\anaconda3\envs\parseq\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
TypeError: cannot pickle 'Environment' object

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.

(parseq) C:\Users\prome\parseq-main\parseq-main>
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\prome\anaconda3\envs\parseq\lib\multiprocessing\spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "C:\Users\prome\anaconda3\envs\parseq\lib\multiprocessing\spawn.py", line 126, in _main
    self = reduction.pickle.load(from_parent)
EOFError: Ran out of input
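A note on the second traceback: it comes from the spawned worker process, not from the training script itself. It is most likely a side effect of the first error. The parent failed while pickling the worker's state, so the child found nothing to read on its end of the pipe. The sketch below reproduces only that symptom, using an in-memory stream in place of the real pipe; `load_from_parent` and `describe_child_failure` are hypothetical helper names, not part of `multiprocessing`.

```python
import io
import pickle


def load_from_parent(stream):
    """Mimic the child side of a spawn: unpickle whatever the parent wrote."""
    return pickle.load(stream)


def describe_child_failure(payload: bytes) -> str:
    """Return 'ok' if the payload unpickles, else a description of the EOFError."""
    try:
        load_from_parent(io.BytesIO(payload))
        return "ok"
    except EOFError as exc:
        return f"EOFError: {exc}"


if __name__ == "__main__":
    # The parent managed to send a complete pickle: the child loads it.
    print(describe_child_failure(pickle.dumps({"worker_id": 0})))
    # The parent crashed before writing anything: the child sees an empty stream.
    print(describe_child_failure(b""))
```

In other words, the error to fix is the `TypeError` in the parent; the `EOFError` should disappear with it.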

I have changed main.yaml to this:

train is powered by Hydra.

== Configuration groups ==
Compose your configuration from those groups (group=option)

charset: 102_de_full, 36_lowercase, 62_mixed-case, 94_full
dataset: real, synth
experiment: abinet, abinet-sv, crnn, parseq, parseq-tiny, trba, trbc, tune_abinet-lm, vitstr
model: abinet, crnn, parseq, trba, vitstr

== Config ==
Override anything in the config (foo.bar=value)

model:
  _convert_: all
  img_size:

What am I doing wrong?

ronin2304 commented 2 years ago

Ah I see there was a similar issue. I will try that fix.

bmusq commented 2 years ago

> Ah I see there was a similar issue. I will try that fix.

I was the one who proposed the fix. The bug is due to Windows. I made a new version compatible with the latest commit. If you are still stuck with this, please do not hesitate to ask me.
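For context on why this is Windows-specific: the traceback in the original post passes through `popen_spawn_win32.py`, which means each DataLoader worker is started with the spawn method and the dataset object has to be pickled and sent to it. If the dataset holds an open LMDB `Environment`, that pickling step fails. A common way around this is to store only the path and open the handle lazily inside each process. The sketch below illustrates that pattern with a stand-in class; `FakeEnvironment`, `EagerDataset`, `LazyDataset` and `can_pickle` are hypothetical names, and this is not necessarily how the fix referenced in #6 is implemented.

```python
import pickle


class FakeEnvironment:
    """Hypothetical stand-in for an LMDB environment handle that refuses to be pickled."""

    def __init__(self, path):
        self.path = path

    def __reduce__(self):
        raise TypeError("cannot pickle 'Environment' object")


class EagerDataset:
    """Opens the handle in __init__, so the handle travels with the dataset."""

    def __init__(self, path):
        self.path = path
        self.env = FakeEnvironment(path)


class LazyDataset:
    """Stores only the path and opens the handle on first use in each process."""

    def __init__(self, path):
        self.path = path
        self._env = None

    @property
    def env(self):
        if self._env is None:
            self._env = FakeEnvironment(self.path)
        return self._env

    def __getstate__(self):
        # Drop any open handle before pickling; the receiving process reopens it.
        state = self.__dict__.copy()
        state["_env"] = None
        return state


def can_pickle(obj):
    """Report whether pickle.dumps succeeds, treating TypeError as failure."""
    try:
        pickle.dumps(obj)
        return True
    except TypeError:
        return False


if __name__ == "__main__":
    print(can_pickle(EagerDataset("data/train")))  # handle is part of the state
    lazy = LazyDataset("data/train")
    _ = lazy.env  # open the handle in this process
    print(can_pickle(lazy))  # handle is dropped by __getstate__
```

On platforms where workers are forked instead of spawned, the dataset is not pickled at this point, which is consistent with the error showing up only on Windows.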

baudm commented 2 years ago

Duplicate of #6

ronin2304 commented 2 years ago

@bmusq Thank you. I see in the other comment that you managed to run the training after changing the code. My issue now is that the sanity check completes but the training does not start: it is stuck at epoch 0 and the progress bar does not move. [image] Is that something you have also run into on Windows?

bmusq commented 2 years ago

@ronin2304 Sometimes, at least on Windows, the display stops updating. It's a little bit risky, but I simply press CTRL+C once and the progress bar starts updating again.

Honestly, this isn't a real solution. I have also been able to complete a training run without using this trick even once.