PaddlePaddle / PaddleOCR

Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices)
Apache License 2.0
38.99k stars 7.32k forks

Training on a dataset with ch_PP-OCRv4_rec fails with: Out of memory error on GPU 0. Cannot allocate 129.394531MB memory on GPU 0, 23.611938GB memory has been allocated and available memory is only 31.687500MB. #11989

Open lili-changjiang opened 3 weeks ago

lili-changjiang commented 3 weeks ago

Out of memory error on GPU 0. Cannot allocate 129.394531MB memory on GPU 0, 23.611938GB memory has been allocated and available memory is only 31.687500MB.

Please check whether there is any other process using GPU 0.

  1. If yes, please stop them, or start PaddlePaddle on another GPU.
  2. If no, please decrease the batch size of your model. If the above ways do not solve the out of memory problem, you can try to use CUDA managed memory. The command is export FLAGS_use_cuda_managed_memory=false. (at /paddle/paddle/fluid/memory/allocation/cuda_allocator.cc:95)
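The first suggestion in the error message (check whether another process is using GPU 0) can be verified before launching training. Below is a minimal sketch, assuming the `nvidia-smi` tool from the NVIDIA driver is on the PATH; the helper names `parse_gpu_memory` and `query_gpu_memory` are hypothetical and not part of PaddlePaddle or PaddleOCR.

```python
import subprocess


def parse_gpu_memory(csv_text):
    """Parse 'index, memory.used, memory.total' rows (values in MiB) into dicts."""
    rows = []
    for line in csv_text.strip().splitlines():
        index, used, total = (field.strip() for field in line.split(","))
        rows.append(
            {"index": int(index), "used_mib": int(used), "total_mib": int(total)}
        )
    return rows


def query_gpu_memory():
    """Ask nvidia-smi for per-GPU memory usage (hypothetical helper).

    Raises FileNotFoundError if nvidia-smi is not installed.
    """
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=index,memory.used,memory.total",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_gpu_memory(result.stdout)
```

Calling `query_gpu_memory()` right before training shows whether GPU 0 is already occupied. If `used_mib` is close to zero at that point, the memory is being consumed by the training job itself rather than by another process.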

My ch_PP-OCRv4_rec.yml configuration:

Global:
  debug: false
  use_gpu: true
  epoch_num: 20
  log_smooth_window: 20
  print_batch_step: 10
  save_model_dir: ./output/rec_ppocr_v4
  save_epoch_step: 3
  eval_batch_step: [0, 100]
  cal_metric_during_train: true
  pretrained_model: ./pretrained_models/ch_PP-OCRv4_rec_train/student
  checkpoints:
  save_inference_dir:
  use_visualdl: false
  infer_img: doc/imgs_words/ch/word_1.jpg
  character_dict_path: ppocr/utils/ppocr_keys_v1.txt
  max_text_length: &max_text_length 25
  infer_mode: false
  use_space_char: true
  distributed: true
  save_res_path: ./output/rec/predicts_ppocrv3.txt

Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  lr:
    name: Cosine
    learning_rate: 0.0001
    warmup_epoch: 2
  regularizer:
    name: L2
    factor: 3.0e-05

Architecture:
  model_type: rec
  algorithm: SVTR_LCNet
  Transform:
  Backbone:
    name: PPLCNetV3
    scale: 0.95
  Head:
    name: MultiHead
    head_list:

Loss:
  name: MultiLoss
  loss_config_list:

PostProcess:
  name: CTCLabelDecode

Metric:
  name: RecMetric
  main_indicator: acc

Train:
  dataset:
    name: MultiScaleDataSet
    ds_width: false
    data_dir: ./train_data/train
    ext_op_transform_idx: 1
    label_file_list:

Why does my 24 GB of GPU memory fill up immediately? Training cannot run at all.
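The error message's second suggestion is to decrease the batch size. PaddleOCR's `tools/train.py` accepts `-o key=value` overrides, so the batch size can be lowered without editing the YAML. The sketch below only builds such a command line. The sampler and loader sections are not shown in the config above, so the key names `Train.sampler.first_bs` and `Train.loader.batch_size_per_card`, the config path, and the value 32 are all assumptions to be checked against your own file; `build_train_command` is a hypothetical helper.

```python
import shlex


def build_train_command(config_path, overrides):
    """Return a tools/train.py command line with `-o key=value` overrides."""
    args = ["python3", "tools/train.py", "-c", config_path]
    if overrides:
        args.append("-o")
        args.extend(f"{key}={value}" for key, value in overrides.items())
    return shlex.join(args)


# Assumed key names and an illustrative batch size; verify against your YAML.
cmd = build_train_command(
    "configs/rec/PP-OCRv4/ch_PP-OCRv4_rec.yml",
    {
        "Train.sampler.first_bs": 32,
        "Train.loader.batch_size_per_card": 32,
    },
)
print(cmd)
```

If training starts with a small batch size, it can be raised step by step until memory usage approaches the card's limit.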

UserWangZz commented 3 weeks ago

Were any other tasks running on the GPU before you started?

lili-changjiang commented 3 weeks ago

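Were any other tasks running on the GPU before you started?

No other tasks. I have run it many times and it always ends the same way.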

UserWangZz commented 3 weeks ago

Try paddle version 2.5.2.
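Before switching versions, it helps to confirm which Paddle build is currently installed. This is a minimal sketch using only the standard library; it assumes Paddle was installed from pip under one of the distribution names `paddlepaddle-gpu` or `paddlepaddle`, and `installed_paddle_version` is a hypothetical helper.

```python
from importlib import metadata


def installed_paddle_version(candidates=("paddlepaddle-gpu", "paddlepaddle")):
    """Return (distribution_name, version) for the first installed candidate.

    Returns None if none of the candidate distributions is installed.
    """
    for name in candidates:
        try:
            return name, metadata.version(name)
        except metadata.PackageNotFoundError:
            continue
    return None


found = installed_paddle_version()
if found is None:
    print("No PaddlePaddle distribution found via pip metadata")
else:
    print(f"{found[0]} {found[1]}")
```

If the reported version differs from 2.5.2, reinstalling at that version in a fresh virtual environment keeps the comparison clean.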

zhengmeng commented 2 days ago

Hi, has this been resolved? I have run into the same problem, and I have two 24G cards.