Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices)
Apache License 2.0
Error when training on a dataset with ch_PP-OCRv4_rec: Out of memory error on GPU 0. Cannot allocate 129.394531MB memory on GPU 0, 23.611938GB memory has been allocated and available memory is only 31.687500MB. #11989
Out of memory error on GPU 0. Cannot allocate 129.394531MB memory on GPU 0, 23.611938GB memory has been allocated and available memory is only 31.687500MB.
Please check whether there is any other process using GPU 0.
If yes, please stop them, or start PaddlePaddle on another GPU.
If no, please decrease the batch size of your model.
If the above ways do not solve the out of memory problem, you can try to use CUDA managed memory. The command is export FLAGS_use_cuda_managed_memory=false.
(at /paddle/paddle/fluid/memory/allocation/cuda_allocator.cc:95)
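The three sizes in the message above are easier to compare once they are in the same unit. The following is a minimal sketch, not part of PaddlePaddle: the regular expression and the helper name `parse_oom_message` are assumptions written against the exact wording quoted in this issue, and they may not match other Paddle versions.

```python
import re

# Matches the size fields in the OOM text quoted above (wording assumed from this issue).
OOM_PATTERN = re.compile(
    r"Cannot allocate (?P<requested>[\d.]+)(?P<requested_unit>[KMG]B) memory on GPU (?P<gpu>\d+), "
    r"(?P<allocated>[\d.]+)(?P<allocated_unit>[KMG]B) memory has been allocated "
    r"and available memory is only (?P<available>[\d.]+)(?P<available_unit>[KMG]B)"
)

UNIT_TO_MB = {"KB": 1 / 1024, "MB": 1.0, "GB": 1024.0}


def parse_oom_message(message: str) -> dict:
    """Return the GPU index and the three reported sizes, all converted to MB."""
    match = OOM_PATTERN.search(message)
    if match is None:
        raise ValueError("message does not match the OOM wording shown above")

    def to_mb(name: str) -> float:
        return float(match[name]) * UNIT_TO_MB[match[name + "_unit"]]

    return {
        "gpu": int(match["gpu"]),
        "requested_mb": to_mb("requested"),
        "allocated_mb": to_mb("allocated"),
        "available_mb": to_mb("available"),
    }


message = (
    "Out of memory error on GPU 0. Cannot allocate 129.394531MB memory on GPU 0, "
    "23.611938GB memory has been allocated and available memory is only 31.687500MB."
)
info = parse_oom_message(message)
# How much more memory the failed request needed than was free.
shortfall_mb = info["requested_mb"] - info["available_mb"]
print(info, shortfall_mb)
```

For the message in this issue, the allocated and available figures add up to roughly 23.64 GB, so essentially the whole 24 GB card was already in use when the 129 MB request failed.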
Complete Error Message:
Error Message Summary:
ResourceExhaustedError:
Out of memory error on GPU 0. Cannot allocate 129.394531MB memory on GPU 0, 23.611938GB memory has been allocated and available memory is only 31.687500MB.
Please check whether there is any other process using GPU 0.
export FLAGS_use_cuda_managed_memory=false. (at /paddle/paddle/fluid/memory/allocation/cuda_allocator.cc:95)
My ch_PP-OCRv4_rec.yml settings:
Global:
  debug: false
  use_gpu: true
  epoch_num: 20
  log_smooth_window: 20
  print_batch_step: 10
  save_model_dir: ./output/rec_ppocr_v4
  save_epoch_step: 3
  eval_batch_step: [0, 100]
  cal_metric_during_train: true
  pretrained_model: ./pretrained_models/ch_PP-OCRv4_rec_train/student
  checkpoints:
  save_inference_dir:
  use_visualdl: false
  infer_img: doc/imgs_words/ch/word_1.jpg
  character_dict_path: ppocr/utils/ppocr_keys_v1.txt
  max_text_length: &max_text_length 25
  infer_mode: false
  use_space_char: true
  distributed: true
  save_res_path: ./output/rec/predicts_ppocrv3.txt
Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  lr:
    name: Cosine
    learning_rate: 0.0001
    warmup_epoch: 2
  regularizer:
    name: L2
    factor: 3.0e-05
Architecture:
  model_type: rec
  algorithm: SVTR_LCNet
  Transform:
  Backbone:
    name: PPLCNetV3
    scale: 0.95
  Head:
    name: MultiHead
    head_list:
Loss:
  name: MultiLoss
  loss_config_list:
PostProcess:
  name: CTCLabelDecode
Metric:
  name: RecMetric
  main_indicator: acc
Train:
  dataset:
    name: MultiScaleDataSet
    ds_width: false
    data_dir: ./train_data/train
    ext_op_transform_idx: 1
    label_file_list:
    RecConAug:
      prob: 0.5
      ext_data_num: 2
      image_shape: [48, 320, 3]
    KeepKeys:
      keep_keys:
  drop_last: true
  num_workers: 8
Eval:
  dataset:
    name: SimpleDataSet
    data_dir: ./train_data/val
    label_file_list:
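The error text recommends decreasing the batch size, but the batch size setting is not visible in the config as pasted here. One generic way to find a size that fits is to halve it until a single training step succeeds. The sketch below is an illustration only: `largest_batch_that_fits`, `run_one_step`, and `fake_step` are hypothetical names, and a real run would need to pass in whatever exception type the framework raises on out-of-memory instead of `MemoryError`.

```python
from typing import Callable, Type


def largest_batch_that_fits(
    run_one_step: Callable[[int], None],
    start_batch_size: int,
    min_batch_size: int = 1,
    oom_error: Type[BaseException] = MemoryError,
) -> int:
    """Halve the batch size until run_one_step stops raising oom_error."""
    if min_batch_size < 1:
        raise ValueError("min_batch_size must be at least 1")
    batch_size = start_batch_size
    while batch_size >= min_batch_size:
        try:
            run_one_step(batch_size)  # hypothetical: one forward/backward pass
            return batch_size
        except oom_error:
            batch_size //= 2  # back off and retry with half as many samples
    raise RuntimeError("even the minimum batch size does not fit")


def fake_step(batch_size: int) -> None:
    """Stand-in for a training step: pretend anything above 40 samples runs out of memory."""
    if batch_size > 40:
        raise MemoryError(f"batch of {batch_size} does not fit")


best = largest_batch_that_fits(fake_step, start_batch_size=128)
print(best)
```

With the stand-in step above, the search tries 128 and 64, both of which fail, and then settles on 32. If the smallest size still fails, batch size is probably not the only cause, which matches the error text's first suggestion to check for other processes on the GPU.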
Why does my 24 GB of GPU memory fill up immediately? Training cannot run at all.