Closed sunzhaoyang closed 1 year ago
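Does it run with the original data and config as-is? What changes did you make?
I only switched to a dataset I labeled myself and changed the dataset path in the config file; nothing else was changed.
It also fails with the XFUND dataset, so is it an environment problem?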
W1206 21:35:40.856792 1137 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 7.5, Driver API Version: 11.4, Runtime API Version: 10.2
W1206 21:35:40.974639 1137 gpu_resources.cc:91] device: 0, cuDNN Version: 7.6.
[2022/12/06 21:35:49] ppocr INFO: train dataloader has 75 iters
[2022/12/06 21:35:49] ppocr INFO: valid dataloader has 7 iters
[2022/12/06 21:35:49] ppocr INFO: During the training process, after the 0th iteration, an evaluation is run every 19 iterations
Aborted
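Right, if it also fails with XFUND, it is most likely an environment problem. Please refer to the Paddle official site, check whether the paddle, CUDA, and cuDNN versions match, and use check_install to verify that paddle is installed correctly.
@MissPenguin I have tried all sorts of things and really cannot find the problem... The versions of all the components look correct....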
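As a rough aid for the version check suggested above, the sketch below pulls the version numbers out of the two `gpu_resources.cc` warning lines that appear in the log. The function name `parse_gpu_log` and the `driver_supports_runtime` key are made up for illustration. The only rule applied is the general CUDA requirement that the driver API version be at least the runtime API version; it does not consult Paddle's own compatibility table, so a `True` result does not prove the environment is fine.

```python
import re

# Patterns for the two warning lines printed from gpu_resources.cc.
_GPU_RE = re.compile(
    r"GPU Compute Capability: (?P<compute_capability>\d+(?:\.\d+)*), "
    r"Driver API Version: (?P<driver>\d+(?:\.\d+)*), "
    r"Runtime API Version: (?P<runtime>\d+(?:\.\d+)*)"
)
_CUDNN_RE = re.compile(r"cuDNN Version: (?P<cudnn>\d+(?:\.\d+)*)")


def _as_tuple(version):
    """Turn '11.4' into (11, 4) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def parse_gpu_log(text):
    """Extract version strings from a training log (hypothetical helper)."""
    info = {}
    gpu_match = _GPU_RE.search(text)
    if gpu_match:
        info.update(gpu_match.groupdict())
    cudnn_match = _CUDNN_RE.search(text)
    if cudnn_match:
        info["cudnn"] = cudnn_match.group("cudnn")
    if "driver" in info and "runtime" in info:
        # General CUDA rule: the driver must be at least as new as the runtime.
        info["driver_supports_runtime"] = (
            _as_tuple(info["driver"]) >= _as_tuple(info["runtime"])
        )
    return info


# The two warning lines from the first run in this thread.
LOG = """\
W1206 21:35:40.856792 1137 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 7.5, Driver API Version: 11.4, Runtime API Version: 10.2
W1206 21:35:40.974639 1137 gpu_resources.cc:91] device: 0, cuDNN Version: 7.6.
"""

first_run = parse_gpu_log(LOG)
print(first_run)
```

The extracted values still have to be compared by hand against the versions the installed paddle wheel was built for.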
[2022/12/08 15:54:24] ppocr INFO: train with paddle 2.4.0 and device Place(gpu:0)
[2022/12/08 15:54:24] ppocr INFO: Initialize indexs of datasets:['train_data/XFUND/zh_train/train.json']
[2022-12-08 15:54:25,492] [ INFO] - Already cached /root/.paddlenlp/models/layoutxlm-base-uncased/sentencepiece.bpe.model
[2022-12-08 15:54:25,895] [ INFO] - tokenizer config file saved in /root/.paddlenlp/models/layoutxlm-base-uncased/tokenizer_config.json
[2022-12-08 15:54:25,895] [ INFO] - Special tokens file saved in /root/.paddlenlp/models/layoutxlm-base-uncased/special_tokens_map.json
[2022/12/08 15:54:25] ppocr INFO: Initialize indexs of datasets:['train_data/XFUND/zh_val/val.json']
[2022-12-08 15:54:25,896] [ INFO] - Already cached /root/.paddlenlp/models/layoutxlm-base-uncased/sentencepiece.bpe.model
[2022-12-08 15:54:26,284] [ INFO] - tokenizer config file saved in /root/.paddlenlp/models/layoutxlm-base-uncased/tokenizer_config.json
[2022-12-08 15:54:26,284] [ INFO] - Special tokens file saved in /root/.paddlenlp/models/layoutxlm-base-uncased/special_tokens_map.json
[2022-12-08 15:54:26,286] [ INFO] - Already cached /root/.paddlenlp/models/vi-layoutxlm-base-uncased/model_state.pdparams
W1208 15:54:26.287709 76091 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 7.5, Driver API Version: 11.2, Runtime API Version: 11.2
W1208 15:54:26.289454 76091 gpu_resources.cc:91] device: 0, cuDNN Version: 8.1.
Aborted
cuda: 11.2; driver: 460.80 (corresponds to CUDA 11.2); cudnn: cudnn-11.2-linux-x64-v8.1.0.77.tgz
Running verify PaddlePaddle program ...
W1208 15:58:59.182808 76181 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 7.5, Driver API Version: 11.2, Runtime API Version: 11.2
W1208 15:58:59.185256 76181 gpu_resources.cc:91] device: 0, cuDNN Version: 8.1.
PaddlePaddle works well on 1 GPU.
PaddlePaddle works well on 1 GPUs.
PaddlePaddle is installed successfully! Let's start deep learning with PaddlePaddle now.
root@2070:/opt# nvidia-smi
Thu Dec 8 15:58:51 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.80       Driver Version: 460.80       CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce RTX 2070    Off  | 00000000:01:00.0 Off |                  N/A |
| 28%   49C    P0     1W / 175W |      0MiB /  7981MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
@sunzhaoyang How to solve the issue?
I am also getting the same output when I attempt to run RE training on Nvidia T4 on Ubuntu 20.04. Is there a workaround for this issue?
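A bare `Aborted` gives little to go on. One way to get slightly more information, sketched below, is to launch the script with Python's built-in `faulthandler` enabled (`-X faulthandler`) and report which signal ended the process. This is a diagnostic aid, not a fix. The helper name `run_and_report` is hypothetical, the demo child that calls `os.abort()` is only a stand-in for the training script, and the signal decoding relies on POSIX behaviour (a negative return code means "killed by that signal").

```python
import signal
import subprocess
import sys


def run_and_report(cmd):
    """Run cmd and return (returncode, signal_name_or_None, stderr_text)."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    sig_name = None
    if proc.returncode < 0:
        # On POSIX, return code -N means the child was killed by signal N.
        sig_name = signal.Signals(-proc.returncode).name
    return proc.returncode, sig_name, proc.stderr


# Stand-in child process that aborts the way the training run appears to.
demo_cmd = [sys.executable, "-X", "faulthandler", "-c", "import os; os.abort()"]
code, sig, err = run_and_report(demo_cmd)

print("return code:", code)
print("killed by:", sig)
print(err)  # faulthandler writes the Python traceback to stderr
```

For the run in this thread, the command list would instead be `[sys.executable, "-X", "faulthandler", "tools/train.py", "-c", "configs/kie/vi_layoutxlm/test.yml"]`. `faulthandler` only shows the Python frames that were active at the abort; if the crash happens inside native code, a native debugger is likely still needed to see the C++ side.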
Please provide the following information to quickly locate the problem
python3 tools/train.py -c configs/kie/vi_layoutxlm/test.yml
test.yml
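Shortly after running it, the following appears:
If gpu is changed to false:
The dataset is one I labeled myself; the format should be correct, I think.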