breezedeus / CnOCR

CnOCR: Awesome Chinese/English OCR Python toolkit based on PyTorch. It comes with 20+ well-trained models for different application scenarios and can be used directly after installation. [Chinese/English OCR Python package based on PyTorch/MXNet.]
https://www.breezedeus.com/article/cnocr
Apache License 2.0

GPU training fails #271

Open HaviZou opened 1 year ago

HaviZou commented 1 year ago

Training works fine on CPU. I have installed PyTorch 2.0.0+cu118 with CUDA, and the GPU shows as enabled:
[INFO 2023-08-02 14:31:54,016 _log_device_info:1798] GPU available: True, used: True
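For reference, a minimal check (not part of the original report) that the installed PyTorch build can actually see the CUDA device:

```python
# Quick sanity check that this PyTorch build sees the GPU (illustrative, not from the report).
import torch

print(torch.__version__)          # expected: 2.0.0+cu118
print(torch.cuda.is_available())  # True means a CUDA device is usable
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
```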

PS D:\CnOCR> cnocr train -m densenet_lite_136-fc --index-dir data/images/labels_connect10 --train-config-fp docs/examples/train_config_gpu_written_num.json

[WARNING 2023-08-02 14:31:53,623 _showwarnmsg:109] D:\CnOCR\env\lib\site-packages\pytorch_lightning\core\datamodule.py:95: LightningDeprecationWarning: DataModule property train_transforms was deprecated in v1.5 and will be removed in v1.7. rank_zero_deprecation(
[WARNING 2023-08-02 14:31:53,624 _showwarnmsg:109] D:\CnOCR\env\lib\site-packages\pytorch_lightning\core\datamodule.py:114: LightningDeprecationWarning: DataModule property val_transforms was deprecated in v1.5 and will be removed in v1.7. rank_zero_deprecation(

[INFO 2023-08-02 14:31:54,016 _check_and_init_precision:696] Using 16bit native Automatic Mixed Precision (AMP)
[INFO 2023-08-02 14:31:54,016 _log_device_info:1798] GPU available: True, used: True
[INFO 2023-08-02 14:31:54,028 _log_device_info:1803] TPU available: False, using: 0 TPU cores
[INFO 2023-08-02 14:31:54,028 _log_device_info:1806] IPU available: False, using: 0 IPUs
[INFO 2023-08-02 14:31:54,029 _log_device_info:1809] HPU available: False, using: 0 HPUs
[INFO 2023-08-02 14:31:54,029 _determine_batch_limits:2858] Trainer(limit_train_batches=1.0) was configured so 100% of the batches per epoch will be used..
[INFO 2023-08-02 14:31:54,029 _determine_batch_limits:2858] Trainer(limit_val_batches=1.0) was configured so 100% of the batches will be used..
[INFO 2023-08-02 14:31:54,047 train:160] OcrModel(
  (encoder): DenseNetLite(
    (features): Sequential(
      (conv0): Conv2d(1, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (norm0): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu0): ReLU(inplace=True)
      (pool0): AvgPool2d(kernel_size=2, stride=2, padding=0)
      (denseblock1): _DenseBlock(
        (denselayer1): _DenseLayer( (norm1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu1): ReLU(inplace=True) (conv1): Conv2d(64, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu2): ReLU(inplace=True) (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) )
      )
      (transition1): _Transition( (norm): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (conv): Conv2d(96, 48, kernel_size=(1, 1), stride=(1, 1), bias=False) (pool): AvgPool2d(kernel_size=2, stride=2, padding=0) )
      (denseblock2): _DenseBlock(
        (denselayer1): _DenseLayer( (norm1): BatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu1): ReLU(inplace=True) (conv1): Conv2d(48, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu2): ReLU(inplace=True) (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) )
        (denselayer2): _DenseLayer( (norm1): BatchNorm2d(80, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu1): ReLU(inplace=True) (conv1): Conv2d(80, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu2): ReLU(inplace=True) (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) )
        (denselayer3): _DenseLayer( (norm1): BatchNorm2d(112, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu1): ReLU(inplace=True) (conv1): Conv2d(112, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu2): ReLU(inplace=True) (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) )
      )
      (transition2): _Transition( (norm): BatchNorm2d(144, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (conv): Conv2d(144, 72, kernel_size=(1, 1), stride=(1, 1), bias=False) (pool): AvgPool2d(kernel_size=2, stride=2, padding=0) )
      (denseblock3): _DenseBlock(
        (denselayer1): _DenseLayer( (norm1): BatchNorm2d(72, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu1): ReLU(inplace=True) (conv1): Conv2d(72, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu2): ReLU(inplace=True) (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) )
        (denselayer2): _DenseLayer( (norm1): BatchNorm2d(104, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu1): ReLU(inplace=True) (conv1): Conv2d(104, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu2): ReLU(inplace=True) (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) )
        (denselayer3): _DenseLayer( (norm1): BatchNorm2d(136, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu1): ReLU(inplace=True) (conv1): Conv2d(136, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu2): ReLU(inplace=True) (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) )
        (denselayer4): _DenseLayer( (norm1): BatchNorm2d(168, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu1): ReLU(inplace=True) (conv1): Conv2d(168, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu2): ReLU(inplace=True) (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) )
        (denselayer5): _DenseLayer( (norm1): BatchNorm2d(200, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu1): ReLU(inplace=True) (conv1): Conv2d(200, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu2): ReLU(inplace=True) (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) )
        (denselayer6): _DenseLayer( (norm1): BatchNorm2d(232, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu1): ReLU(inplace=True) (conv1): Conv2d(232, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu2): ReLU(inplace=True) (conv2): Conv2d(128, 32, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), bias=False) )
      )
      (norm5): BatchNorm2d(264, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (pool5): AvgPool2d(kernel_size=(2, 1), stride=(2, 1), padding=0)
    )
  )
  (decoder): Sequential(
    (0): Dropout(p=0.1, inplace=False)
    (1): Linear(in_features=528, out_features=128, bias=True)
    (2): Dropout(p=0.1, inplace=False)
    (3): Tanh()
  )
  (linear): Linear(in_features=128, out_features=11, bias=True)
)
[INFO 2023-08-02 14:31:54,149 set_nvidia_flags:57] LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
[INFO 2023-08-02 14:31:54,152 summarize:73] | Name | Type | Params

0 | model | OcrModel | 680 K

680 K     Trainable params
0         Non-trainable params
680 K     Total params
1.361     Total estimated model params size (MB)
Sanity Checking: 0it [00:00, ?it/s]
Traceback (most recent call last):
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python39_64\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python39_64\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "D:\CnOCR\env\Scripts\cnocr.exe\__main__.py", line 7, in <module>
  File "D:\CnOCR\env\lib\site-packages\click\core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "D:\CnOCR\env\lib\site-packages\click\core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "D:\CnOCR\env\lib\site-packages\click\core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "D:\CnOCR\env\lib\site-packages\click\core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "D:\CnOCR\env\lib\site-packages\click\core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "D:\CnOCR\env\lib\site-packages\cnocr\cli.py", line 165, in train
    trainer.fit(
  File "D:\CnOCR\env\lib\site-packages\cnocr\trainer.py", line 324, in fit
    self.pl_trainer.fit(pl_module, train_dataloader, val_dataloaders, datamodule)
  File "D:\CnOCR\env\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 768, in fit
    self._call_and_handle_interrupt(
  File "D:\CnOCR\env\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 721, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "D:\CnOCR\env\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 809, in _fit_impl
    results = self._run(model, ckpt_path=self.ckpt_path)
  File "D:\CnOCR\env\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1234, in _run
    results = self._run_stage()
  File "D:\CnOCR\env\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1321, in _run_stage
    return self._run_train()
  File "D:\CnOCR\env\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1343, in _run_train
    self._run_sanity_check()
  File "D:\CnOCR\env\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1411, in _run_sanity_check
    val_loop.run()
  File "D:\CnOCR\env\lib\site-packages\pytorch_lightning\loops\base.py", line 204, in run
    self.advance(*args, **kwargs)
  File "D:\CnOCR\env\lib\site-packages\pytorch_lightning\loops\dataloader\evaluation_loop.py", line 154, in advance
    dl_outputs = self.epoch_loop.run(self._data_fetcher, dl_max_batches, kwargs)
  File "D:\CnOCR\env\lib\site-packages\pytorch_lightning\loops\base.py", line 199, in run
    self.on_run_start(*args, **kwargs)
  File "D:\CnOCR\env\lib\site-packages\pytorch_lightning\loops\epoch\evaluation_epoch_loop.py", line 87, in on_run_start
    self._data_fetcher = iter(data_fetcher)
  File "D:\CnOCR\env\lib\site-packages\pytorch_lightning\utilities\fetching.py", line 178, in __iter__
    self.dataloader_iter = iter(self.dataloader)
  File "D:\CnOCR\env\lib\site-packages\torch\utils\data\dataloader.py", line 442, in __iter__
    return self._get_iterator()
  File "D:\CnOCR\env\lib\site-packages\torch\utils\data\dataloader.py", line 388, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "D:\CnOCR\env\lib\site-packages\torch\utils\data\dataloader.py", line 1043, in __init__
    w.start()
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python39_64\lib\multiprocessing\process.py", line 121, in start
    self._popen = self._Popen(self)
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python39_64\lib\multiprocessing\context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python39_64\lib\multiprocessing\context.py", line 327, in _Popen
    return Popen(process_obj)
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python39_64\lib\multiprocessing\popen_spawn_win32.py", line 93, in __init__
    reduction.dump(process_obj, to_child)
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python39_64\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'OcrDataModule.val_dataloader..'

PS D:\CnOCR> Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python39_64\lib\multiprocessing\spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python39_64\lib\multiprocessing\spawn.py", line 126, in _main
    self = reduction.pickle.load(from_parent)
EOFError: Ran out of input
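The failure happens while the sanity-check validation DataLoader starts its worker processes. On Windows, workers are created with the "spawn" start method, so everything the DataLoader carries (dataset, collate_fn, transforms) must be picklable; a function or lambda defined locally inside `OcrDataModule.val_dataloader` is not, which produces the `Can't pickle local object` error in the parent and the `EOFError: Ran out of input` in the child. A minimal sketch of the same failure (illustrative names, not CnOCR's actual code):

```python
# Minimal sketch of the Windows "spawn" pickling failure (illustrative names,
# not CnOCR's actual code): locally defined functions cannot be pickled.
import pickle

def build_loader_parts():
    def local_collate(batch):  # defined inside another function -> a "local object"
        return batch
    return local_collate

try:
    pickle.dumps(build_loader_parts())
except AttributeError as err:
    # AttributeError: Can't pickle local object 'build_loader_parts.<locals>.local_collate'
    print(err)
```

The usual workarounds are to run the DataLoader with `num_workers=0` on Windows (no worker process is spawned, so nothing has to be pickled) or to move the local function/lambda to module level. If the training config exposes the dataloader worker count (the exact key in docs/examples/train_config_gpu_written_num.json may differ), setting it to 0 is the quickest way to check.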

IASNDHABHINADAD commented 2 months ago

Has this been resolved?