PaddlePaddle / PaddleOCR

Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices)
https://paddlepaddle.github.io/PaddleOCR/
Apache License 2.0
44.36k stars 7.83k forks

Error when running KIE SER+RE prediction: ValueError: (InvalidArgument) x dim number should greater than 0, but received value is: 0 #13589

Closed freezehe closed 2 months ago

freezehe commented 3 months ago

Search before asking

Bug

I first trained the SER model and then the RE model. Running the following command raises an error:
! python3 ./tools/infer_kie_token_ser_re.py -c configs/kie/vi_layoutxlm/re_vi_layoutxlm_xfund_zh.yml -o Architecture.Backbone.checkpoints=./output/re_vi_layoutxlm_xfund_zh/best_accuracy/ Global.infer_img=./train_data/XCCIC_8020/zh_val/val.json Global.infer_mode=False -c_ser configs/kie/vi_layoutxlm/ser_vi_layoutxlm_xfund_zh.yml -o_ser Architecture.Backbone.checkpoints=./output/ser_vi_layoutxlm_xfund_zh/best_accuracy/
[image]

Environment

I trained on a custom dataset on Baidu PaddlePaddle AI Studio using a GPU. The environment is as follows:

aiofiles==23.2.1
aiohttp==3.9.5
aiosignal==1.3.1
aistudio-sdk @ file:///home/aistudio/aistudio_sdk-0.2.4-py3-none-any.whl#sha256=d93411cc8764e465860cbf2f97f787dddd1548595d4776c97ddf0ea787dedd81
albucore==0.0.13
albumentations==1.4.10
altair==4.2.2
annotated-types==0.7.0
anyio==4.4.0
astor==0.8.1
asttokens==2.4.1
async-timeout==4.0.3
attrdict==2.0.1
attrdict3==2.0.2
attrs==23.2.0
Babel==2.15.0
bce-python-sdk==0.9.17
beautifulsoup4==4.12.3
blinker==1.8.2
cachetools==5.3.3
certifi==2024.7.4
charset-normalizer==3.3.2
click==8.1.7
colorama==0.4.6
coloredlogs==15.0.1
colorlog==6.8.2
comm==0.2.2
contourpy==1.2.1
cssselect==1.2.0
cssutils==2.11.1
cycler==0.12.1
Cython==3.0.10
datasets==2.20.0
debugpy==1.8.2
decorator==5.1.1
dill==0.3.4
dnspython==2.6.1
easydict==1.13
email_validator==2.2.0
entrypoints==0.4
et-xmlfile==1.1.0
exceptiongroup==1.2.1
executing==2.0.1
fastapi==0.111.0
fastapi-cli==0.0.4
ffmpy==0.3.2
filelock==3.15.4
fire==0.6.0
Flask==3.0.3
Flask-Babel==2.0.0
flatbuffers==24.3.25
fonttools==4.53.0
frozenlist==1.4.1
fsspec==2024.5.0
future==1.0.0
gitdb==4.0.11
GitPython==3.1.43
gradio==3.40.0
gradio_client==1.0.2
gunicorn==22.0.0
h11==0.14.0
httpcore==1.0.5
httptools==0.6.1
httpx==0.27.0
huggingface-hub==0.23.4
humanfriendly==10.0
idna==3.7
imageio==2.34.2
imgaug==0.4.0
importlib_metadata==8.0.0
importlib_resources==6.4.0
ipykernel==6.29.5
ipython==8.26.0
itsdangerous==2.2.0
jedi==0.19.1
jieba==0.42.1
Jinja2==3.1.4
joblib==1.4.2
jsonschema==4.22.0
jsonschema-specifications==2023.12.1
jupyter_client==8.6.2
jupyter_core==5.7.2
kiwisolver==1.4.5
lanms_neo==1.0.2
lazy_loader==0.4
linkify-it-py==2.0.3
lmdb==1.5.1
lxml==5.2.2
markdown-it-py==2.2.0
MarkupSafe==2.1.5
matplotlib==3.9.1
matplotlib-inline==0.1.7
mdit-py-plugins==0.3.3
mdurl==0.1.2
more-itertools==10.3.0
mpmath==1.3.0
multidict==6.0.5
multiprocess==0.70.12.2
nest-asyncio==1.6.0
networkx==3.3
numpy==1.26.4
nvidia-cublas-cu11==11.11.3.6
nvidia-cuda-cupti-cu11==11.8.87
nvidia-cuda-nvrtc-cu11==11.8.89
nvidia-cuda-runtime-cu11==11.8.89
nvidia-cudnn-cu11==8.7.0.84
nvidia-cufft-cu11==10.9.0.58
nvidia-curand-cu11==10.3.0.86
nvidia-cusolver-cu11==11.4.1.48
nvidia-cusparse-cu11==11.7.5.86
nvidia-nccl-cu11==2.19.3
nvidia-nvtx-cu11==11.8.86
onnx==1.16.1
onnxruntime==1.18.1
opencv-contrib-python==4.10.0.84
opencv-python==4.10.0.84
opencv-python-headless==4.10.0.84
openpyxl==3.1.5
opt-einsum==3.3.0
orjson==3.10.6
packaging==24.1
paddle-bfloat==0.1.7
paddle2onnx==1.2.4
paddlefsl==1.1.0
paddlehub==2.4.0
paddlenlp==2.5.2
paddleocr==2.6.1.0
paddlepaddle-gpu==2.5.1
pandas==2.2.2
parso==0.8.4
pdf2docx==0.5.8
pexpect==4.9.0
pickleshare==0.7.5
pillow==10.4.0
platformdirs==4.2.2
Polygon3==3.0.9.1
premailer==3.10.0
prettytable==3.10.0
prompt_toolkit==3.0.47
protobuf==3.20.3
psutil==6.0.0
ptyprocess==0.7.0
pure-eval==0.2.2
pyarrow==16.1.0
pyarrow-hotfix==0.6
pybind11==2.13.1
pyclipper==1.3.0.post5
pycryptodome==3.20.0
pydantic==2.8.2
pydantic_core==2.20.1
pydeck==0.9.1
pydub==0.25.1
Pygments==2.18.0
Pympler==1.1
PyMuPDF==1.19.0
pypandoc==1.13
pyparsing==3.1.2
python-dateutil==2.9.0.post0
python-docx==1.1.2
python-dotenv==1.0.1
python-multipart==0.0.9
pytz==2024.1
PyYAML==6.0.1
pyzmq==26.0.3
rapidfuzz==3.9.5
rarfile==4.2
referencing==0.35.1
regex==2024.5.15
requests==2.32.3
rich==13.7.1
rpds-py==0.18.1
ruff==0.5.0
safetensors==0.4.3
scikit-image==0.24.0
scikit-learn==1.5.1
scipy==1.14.0
semantic-version==2.10.0
semver==3.0.2
sentencepiece==0.2.0
seqeval==1.2.2
shapely==2.0.5
shellingham==1.5.4
six==1.16.0
smmap==5.0.1
sniffio==1.3.1
soupsieve==2.5
stack-data==0.6.3
starlette==0.37.2
streamlit==1.13.0
streamlit-image-comparison==0.0.4
sympy==1.12.1
termcolor==2.4.0
threadpoolctl==3.5.0
tifffile==2024.7.24
toml==0.10.2
tomli==2.0.1
tomlkit==0.12.0
tool-helpers==0.1.1
toolz==0.12.1
tornado==6.4.1
tqdm==4.66.4
traitlets==5.14.3
typer==0.12.3
typing_extensions==4.12.2
tzdata==2024.1
tzlocal==5.2
uc-micro-py==1.0.3
ujson==5.10.0
urllib3==2.2.2
uvicorn==0.30.1
uvloop==0.19.0
validators==0.30.0
visualdl==2.4.2
watchdog==4.0.1
watchfiles==0.22.0
wcwidth==0.2.13
websockets==11.0.3
Werkzeug==3.0.3
xxhash==3.4.1
yacs==0.1.8
yarl==1.9.4
zipp==3.19.2

Minimal Reproducible Example

[2024-08-04 17:44:30,395] [   ERROR] check_version.py:39 - Error fetching version info
Traceback (most recent call last):
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/albumentations/check_version.py", line 29, in fetch_version_info
    with opener.open(url, timeout=2) as response:
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/urllib/request.py", line 519, in open
    response = self._open(req, data)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/urllib/request.py", line 536, in _open
    result = self._call_chain(self.handle_open, protocol, protocol +
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/urllib/request.py", line 496, in _call_chain
    result = func(*args)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/urllib/request.py", line 1391, in https_open
    return self.do_open(http.client.HTTPSConnection, req,
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/urllib/request.py", line 1352, in do_open
    r = h.getresponse()
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/http/client.py", line 1374, in getresponse
    response.begin()
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/http/client.py", line 337, in begin
    self.headers = self.msg = parse_headers(self.fp)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/http/client.py", line 234, in parse_headers
    headers = _read_headers(fp)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/http/client.py", line 214, in _read_headers
    line = fp.readline(_MAXLINE + 1)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/socket.py", line 705, in readinto
    return self._sock.recv_into(b)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/ssl.py", line 1274, in recv_into
    return self.read(nbytes, buffer)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/ssl.py", line 1130, in read
    return self._sslobj.read(len, buffer)
TimeoutError: The read operation timed out
[2024/08/04 17:44:30] ppocr INFO: ********** re config **********
[2024/08/04 17:44:30] ppocr INFO: Architecture : 
[2024/08/04 17:44:30] ppocr INFO:     Backbone : 
[2024/08/04 17:44:30] ppocr INFO:         checkpoints : ./output/re_vi_layoutxlm_xfund_zh/best_accuracy/
[2024/08/04 17:44:30] ppocr INFO:         mode : vi
[2024/08/04 17:44:30] ppocr INFO:         name : LayoutXLMForRe
[2024/08/04 17:44:30] ppocr INFO:         pretrained : True
[2024/08/04 17:44:30] ppocr INFO:     Transform : None
[2024/08/04 17:44:30] ppocr INFO:     algorithm : LayoutXLM
[2024/08/04 17:44:30] ppocr INFO:     model_type : kie
[2024/08/04 17:44:30] ppocr INFO: Eval : 
[2024/08/04 17:44:30] ppocr INFO:     dataset : 
[2024/08/04 17:44:30] ppocr INFO:         data_dir : train_data/XCCIC_8020/zh_val/image
[2024/08/04 17:44:30] ppocr INFO:         label_file_list : ['train_data/XCCIC_8020/zh_val/val.json']
[2024/08/04 17:44:30] ppocr INFO:         name : SimpleDataSet
[2024/08/04 17:44:30] ppocr INFO:         transforms : 
[2024/08/04 17:44:30] ppocr INFO:             DecodeImage : 
[2024/08/04 17:44:30] ppocr INFO:                 channel_first : False
[2024/08/04 17:44:30] ppocr INFO:                 img_mode : RGB
[2024/08/04 17:44:30] ppocr INFO:             VQATokenLabelEncode : 
[2024/08/04 17:44:30] ppocr INFO:                 algorithm : LayoutXLM
[2024/08/04 17:44:30] ppocr INFO:                 class_path : train_data/XCCIC_8020/class_list_xfun.txt
[2024/08/04 17:44:30] ppocr INFO:                 contains_re : True
[2024/08/04 17:44:30] ppocr INFO:                 order_method : tb-yx
[2024/08/04 17:44:30] ppocr INFO:                 use_textline_bbox_info : True
[2024/08/04 17:44:30] ppocr INFO:             VQATokenPad : 
[2024/08/04 17:44:30] ppocr INFO:                 max_seq_len : 512
[2024/08/04 17:44:30] ppocr INFO:                 return_attention_mask : True
[2024/08/04 17:44:30] ppocr INFO:             VQAReTokenRelation : None
[2024/08/04 17:44:30] ppocr INFO:             VQAReTokenChunk : 
[2024/08/04 17:44:30] ppocr INFO:                 max_seq_len : 512
[2024/08/04 17:44:30] ppocr INFO:             TensorizeEntitiesRelations : None
[2024/08/04 17:44:30] ppocr INFO:             Resize : 
[2024/08/04 17:44:30] ppocr INFO:                 size : [224, 224]
[2024/08/04 17:44:30] ppocr INFO:             NormalizeImage : 
[2024/08/04 17:44:30] ppocr INFO:                 mean : [123.675, 116.28, 103.53]
[2024/08/04 17:44:30] ppocr INFO:                 order : hwc
[2024/08/04 17:44:30] ppocr INFO:                 scale : 1
[2024/08/04 17:44:30] ppocr INFO:                 std : [58.395, 57.12, 57.375]
[2024/08/04 17:44:30] ppocr INFO:             ToCHWImage : None
[2024/08/04 17:44:30] ppocr INFO:             KeepKeys : 
[2024/08/04 17:44:30] ppocr INFO:                 keep_keys : ['input_ids', 'bbox', 'attention_mask', 'token_type_ids', 'entities', 'relations']
[2024/08/04 17:44:30] ppocr INFO:     loader : 
[2024/08/04 17:44:30] ppocr INFO:         batch_size_per_card : 8
[2024/08/04 17:44:30] ppocr INFO:         drop_last : False
[2024/08/04 17:44:30] ppocr INFO:         num_workers : 8
[2024/08/04 17:44:30] ppocr INFO:         shuffle : False
[2024/08/04 17:44:30] ppocr INFO: Global : 
[2024/08/04 17:44:30] ppocr INFO:     cal_metric_during_train : False
[2024/08/04 17:44:30] ppocr INFO:     epoch_num : 20
[2024/08/04 17:44:30] ppocr INFO:     eval_batch_step : [0, 19]
[2024/08/04 17:44:30] ppocr INFO:     infer_img : ./train_data/XCCIC_8020/zh_val/val.json
[2024/08/04 17:44:30] ppocr INFO:     infer_mode : False
[2024/08/04 17:44:30] ppocr INFO:     kie_det_model_dir : None
[2024/08/04 17:44:30] ppocr INFO:     kie_rec_model_dir : None
[2024/08/04 17:44:30] ppocr INFO:     log_smooth_window : 10
[2024/08/04 17:44:30] ppocr INFO:     print_batch_step : 10
[2024/08/04 17:44:30] ppocr INFO:     save_epoch_step : 2000
[2024/08/04 17:44:30] ppocr INFO:     save_inference_dir : None
[2024/08/04 17:44:30] ppocr INFO:     save_model_dir : ./output/re_vi_layoutxlm_xfund_zh
[2024/08/04 17:44:30] ppocr INFO:     save_res_path : ./output/ccic/re/xfund_zh/with_gt
[2024/08/04 17:44:30] ppocr INFO:     seed : 2022
[2024/08/04 17:44:30] ppocr INFO:     use_gpu : True
[2024/08/04 17:44:30] ppocr INFO:     use_visualdl : False
[2024/08/04 17:44:30] ppocr INFO: Loss : 
[2024/08/04 17:44:30] ppocr INFO:     key : loss
[2024/08/04 17:44:30] ppocr INFO:     name : LossFromOutput
[2024/08/04 17:44:30] ppocr INFO:     reduction : mean
[2024/08/04 17:44:30] ppocr INFO: Metric : 
[2024/08/04 17:44:30] ppocr INFO:     main_indicator : hmean
[2024/08/04 17:44:30] ppocr INFO:     name : VQAReTokenMetric
[2024/08/04 17:44:30] ppocr INFO: Optimizer : 
[2024/08/04 17:44:30] ppocr INFO:     beta1 : 0.9
[2024/08/04 17:44:30] ppocr INFO:     beta2 : 0.999
[2024/08/04 17:44:30] ppocr INFO:     clip_norm : 10
[2024/08/04 17:44:30] ppocr INFO:     lr : 
[2024/08/04 17:44:30] ppocr INFO:         learning_rate : 1e-05
[2024/08/04 17:44:30] ppocr INFO:         warmup_epoch : 10
[2024/08/04 17:44:30] ppocr INFO:     name : AdamW
[2024/08/04 17:44:30] ppocr INFO:     regularizer : 
[2024/08/04 17:44:30] ppocr INFO:         factor : 0.0
[2024/08/04 17:44:30] ppocr INFO:         name : L2
[2024/08/04 17:44:30] ppocr INFO: PostProcess : 
[2024/08/04 17:44:30] ppocr INFO:     name : VQAReTokenLayoutLMPostProcess
[2024/08/04 17:44:30] ppocr INFO: Train : 
[2024/08/04 17:44:30] ppocr INFO:     dataset : 
[2024/08/04 17:44:30] ppocr INFO:         data_dir : train_data/XCCIC_8020/zh_train/image
[2024/08/04 17:44:30] ppocr INFO:         label_file_list : ['train_data/XCCIC_8020/zh_train/train.json']
[2024/08/04 17:44:30] ppocr INFO:         name : SimpleDataSet
[2024/08/04 17:44:30] ppocr INFO:         ratio_list : [1.0]
[2024/08/04 17:44:30] ppocr INFO:         transforms : 
[2024/08/04 17:44:30] ppocr INFO:             DecodeImage : 
[2024/08/04 17:44:30] ppocr INFO:                 channel_first : False
[2024/08/04 17:44:30] ppocr INFO:                 img_mode : RGB
[2024/08/04 17:44:30] ppocr INFO:             VQATokenLabelEncode : 
[2024/08/04 17:44:30] ppocr INFO:                 algorithm : LayoutXLM
[2024/08/04 17:44:30] ppocr INFO:                 class_path : train_data/XCCIC_8020/class_list_xfun.txt
[2024/08/04 17:44:30] ppocr INFO:                 contains_re : True
[2024/08/04 17:44:30] ppocr INFO:                 order_method : tb-yx
[2024/08/04 17:44:30] ppocr INFO:                 use_textline_bbox_info : True
[2024/08/04 17:44:30] ppocr INFO:             VQATokenPad : 
[2024/08/04 17:44:30] ppocr INFO:                 max_seq_len : 512
[2024/08/04 17:44:30] ppocr INFO:                 return_attention_mask : True
[2024/08/04 17:44:30] ppocr INFO:             VQAReTokenRelation : None
[2024/08/04 17:44:30] ppocr INFO:             VQAReTokenChunk : 
[2024/08/04 17:44:30] ppocr INFO:                 max_seq_len : 512
[2024/08/04 17:44:30] ppocr INFO:             TensorizeEntitiesRelations : None
[2024/08/04 17:44:30] ppocr INFO:             Resize : 
[2024/08/04 17:44:30] ppocr INFO:                 size : [224, 224]
[2024/08/04 17:44:30] ppocr INFO:             NormalizeImage : 
[2024/08/04 17:44:30] ppocr INFO:                 mean : [123.675, 116.28, 103.53]
[2024/08/04 17:44:30] ppocr INFO:                 order : hwc
[2024/08/04 17:44:30] ppocr INFO:                 scale : 1
[2024/08/04 17:44:30] ppocr INFO:                 std : [58.395, 57.12, 57.375]
[2024/08/04 17:44:30] ppocr INFO:             ToCHWImage : None
[2024/08/04 17:44:30] ppocr INFO:             KeepKeys : 
[2024/08/04 17:44:30] ppocr INFO:                 keep_keys : ['input_ids', 'bbox', 'attention_mask', 'token_type_ids', 'entities', 'relations']
[2024/08/04 17:44:30] ppocr INFO:     loader : 
[2024/08/04 17:44:30] ppocr INFO:         batch_size_per_card : 2
[2024/08/04 17:44:30] ppocr INFO:         drop_last : False
[2024/08/04 17:44:30] ppocr INFO:         num_workers : 4
[2024/08/04 17:44:30] ppocr INFO:         shuffle : True
[2024/08/04 17:44:30] ppocr INFO: 

[2024/08/04 17:44:30] ppocr INFO: ********** ser config **********
[2024/08/04 17:44:30] ppocr INFO: Architecture : 
[2024/08/04 17:44:30] ppocr INFO:     Backbone : 
[2024/08/04 17:44:30] ppocr INFO:         checkpoints : ./output/ser_vi_layoutxlm_xfund_zh/best_accuracy/
[2024/08/04 17:44:30] ppocr INFO:         mode : vi
[2024/08/04 17:44:30] ppocr INFO:         name : LayoutXLMForSer
[2024/08/04 17:44:30] ppocr INFO:         num_classes : 5
[2024/08/04 17:44:30] ppocr INFO:         pretrained : True
[2024/08/04 17:44:30] ppocr INFO:     Transform : None
[2024/08/04 17:44:30] ppocr INFO:     algorithm : LayoutXLM
[2024/08/04 17:44:30] ppocr INFO:     model_type : kie
[2024/08/04 17:44:30] ppocr INFO: Eval : 
[2024/08/04 17:44:30] ppocr INFO:     dataset : 
[2024/08/04 17:44:30] ppocr INFO:         data_dir : train_data/XCCIC_8020/zh_val/image
[2024/08/04 17:44:30] ppocr INFO:         label_file_list : ['train_data/XCCIC_8020/zh_val/val.json']
[2024/08/04 17:44:30] ppocr INFO:         name : SimpleDataSet
[2024/08/04 17:44:30] ppocr INFO:         transforms : 
[2024/08/04 17:44:30] ppocr INFO:             DecodeImage : 
[2024/08/04 17:44:30] ppocr INFO:                 channel_first : False
[2024/08/04 17:44:30] ppocr INFO:                 img_mode : RGB
[2024/08/04 17:44:30] ppocr INFO:             VQATokenLabelEncode : 
[2024/08/04 17:44:30] ppocr INFO:                 algorithm : LayoutXLM
[2024/08/04 17:44:30] ppocr INFO:                 class_path : train_data/XCCIC_8020/class_list_xfun.txt
[2024/08/04 17:44:30] ppocr INFO:                 contains_re : False
[2024/08/04 17:44:30] ppocr INFO:                 order_method : tb-yx
[2024/08/04 17:44:30] ppocr INFO:                 use_textline_bbox_info : True
[2024/08/04 17:44:30] ppocr INFO:             VQATokenPad : 
[2024/08/04 17:44:30] ppocr INFO:                 max_seq_len : 512
[2024/08/04 17:44:30] ppocr INFO:                 return_attention_mask : True
[2024/08/04 17:44:30] ppocr INFO:             VQASerTokenChunk : 
[2024/08/04 17:44:30] ppocr INFO:                 max_seq_len : 512
[2024/08/04 17:44:30] ppocr INFO:             Resize : 
[2024/08/04 17:44:30] ppocr INFO:                 size : [224, 224]
[2024/08/04 17:44:30] ppocr INFO:             NormalizeImage : 
[2024/08/04 17:44:30] ppocr INFO:                 mean : [123.675, 116.28, 103.53]
[2024/08/04 17:44:30] ppocr INFO:                 order : hwc
[2024/08/04 17:44:30] ppocr INFO:                 scale : 1
[2024/08/04 17:44:30] ppocr INFO:                 std : [58.395, 57.12, 57.375]
[2024/08/04 17:44:30] ppocr INFO:             ToCHWImage : None
[2024/08/04 17:44:30] ppocr INFO:             KeepKeys : 
[2024/08/04 17:44:30] ppocr INFO:                 keep_keys : ['input_ids', 'bbox', 'attention_mask', 'token_type_ids', 'image', 'labels']
[2024/08/04 17:44:30] ppocr INFO:     loader : 
[2024/08/04 17:44:30] ppocr INFO:         batch_size_per_card : 8
[2024/08/04 17:44:30] ppocr INFO:         drop_last : False
[2024/08/04 17:44:30] ppocr INFO:         num_workers : 4
[2024/08/04 17:44:30] ppocr INFO:         shuffle : False
[2024/08/04 17:44:30] ppocr INFO: Global : 
[2024/08/04 17:44:30] ppocr INFO:     amp_custom_white_list : ['scale', 'concat', 'elementwise_add']
[2024/08/04 17:44:30] ppocr INFO:     cal_metric_during_train : False
[2024/08/04 17:44:30] ppocr INFO:     d2s_train_image_shape : [3, 224, 224]
[2024/08/04 17:44:30] ppocr INFO:     epoch_num : 50
[2024/08/04 17:44:30] ppocr INFO:     eval_batch_step : [0, 19]
[2024/08/04 17:44:30] ppocr INFO:     infer_img : train_data/XCCIC_8020/zh_val/val.json
[2024/08/04 17:44:30] ppocr INFO:     infer_mode : False
[2024/08/04 17:44:30] ppocr INFO:     kie_det_model_dir : None
[2024/08/04 17:44:30] ppocr INFO:     kie_rec_model_dir : None
[2024/08/04 17:44:30] ppocr INFO:     log_smooth_window : 10
[2024/08/04 17:44:30] ppocr INFO:     print_batch_step : 10
[2024/08/04 17:44:30] ppocr INFO:     save_epoch_step : 2000
[2024/08/04 17:44:30] ppocr INFO:     save_inference_dir : None
[2024/08/04 17:44:30] ppocr INFO:     save_model_dir : ./output/ser_vi_layoutxlm_xfund_zh
[2024/08/04 17:44:30] ppocr INFO:     save_res_path : ./output/ccic/ser/xfund_zh/res
[2024/08/04 17:44:30] ppocr INFO:     seed : 2022
[2024/08/04 17:44:30] ppocr INFO:     use_gpu : True
[2024/08/04 17:44:30] ppocr INFO:     use_visualdl : False
[2024/08/04 17:44:30] ppocr INFO: Loss : 
[2024/08/04 17:44:30] ppocr INFO:     key : backbone_out
[2024/08/04 17:44:30] ppocr INFO:     name : VQASerTokenLayoutLMLoss
[2024/08/04 17:44:30] ppocr INFO:     num_classes : 5
[2024/08/04 17:44:30] ppocr INFO: Metric : 
[2024/08/04 17:44:30] ppocr INFO:     main_indicator : hmean
[2024/08/04 17:44:30] ppocr INFO:     name : VQASerTokenMetric
[2024/08/04 17:44:30] ppocr INFO: Optimizer : 
[2024/08/04 17:44:30] ppocr INFO:     beta1 : 0.9
[2024/08/04 17:44:30] ppocr INFO:     beta2 : 0.999
[2024/08/04 17:44:30] ppocr INFO:     lr : 
[2024/08/04 17:44:30] ppocr INFO:         epochs : 50
[2024/08/04 17:44:30] ppocr INFO:         learning_rate : 1e-05
[2024/08/04 17:44:30] ppocr INFO:         name : Linear
[2024/08/04 17:44:30] ppocr INFO:         warmup_epoch : 2
[2024/08/04 17:44:30] ppocr INFO:     name : AdamW
[2024/08/04 17:44:30] ppocr INFO:     regularizer : 
[2024/08/04 17:44:30] ppocr INFO:         factor : 0.0
[2024/08/04 17:44:30] ppocr INFO:         name : L2
[2024/08/04 17:44:30] ppocr INFO: PostProcess : 
[2024/08/04 17:44:30] ppocr INFO:     class_path : train_data/XCCIC_8020/class_list_xfun.txt
[2024/08/04 17:44:30] ppocr INFO:     name : VQASerTokenLayoutLMPostProcess
[2024/08/04 17:44:30] ppocr INFO: Train : 
[2024/08/04 17:44:30] ppocr INFO:     dataset : 
[2024/08/04 17:44:30] ppocr INFO:         data_dir : train_data/XCCIC_8020/zh_train/image
[2024/08/04 17:44:30] ppocr INFO:         label_file_list : ['train_data/XCCIC_8020/zh_train/train.json']
[2024/08/04 17:44:30] ppocr INFO:         name : SimpleDataSet
[2024/08/04 17:44:30] ppocr INFO:         ratio_list : [1.0]
[2024/08/04 17:44:30] ppocr INFO:         transforms : 
[2024/08/04 17:44:30] ppocr INFO:             DecodeImage : 
[2024/08/04 17:44:30] ppocr INFO:                 channel_first : False
[2024/08/04 17:44:30] ppocr INFO:                 img_mode : RGB
[2024/08/04 17:44:30] ppocr INFO:             VQATokenLabelEncode : 
[2024/08/04 17:44:30] ppocr INFO:                 algorithm : LayoutXLM
[2024/08/04 17:44:30] ppocr INFO:                 class_path : train_data/XCCIC_8020/class_list_xfun.txt
[2024/08/04 17:44:30] ppocr INFO:                 contains_re : False
[2024/08/04 17:44:30] ppocr INFO:                 order_method : tb-yx
[2024/08/04 17:44:30] ppocr INFO:                 use_textline_bbox_info : True
[2024/08/04 17:44:30] ppocr INFO:             VQATokenPad : 
[2024/08/04 17:44:30] ppocr INFO:                 max_seq_len : 512
[2024/08/04 17:44:30] ppocr INFO:                 return_attention_mask : True
[2024/08/04 17:44:30] ppocr INFO:             VQASerTokenChunk : 
[2024/08/04 17:44:30] ppocr INFO:                 max_seq_len : 512
[2024/08/04 17:44:30] ppocr INFO:             Resize : 
[2024/08/04 17:44:30] ppocr INFO:                 size : [224, 224]
[2024/08/04 17:44:30] ppocr INFO:             NormalizeImage : 
[2024/08/04 17:44:30] ppocr INFO:                 mean : [123.675, 116.28, 103.53]
[2024/08/04 17:44:30] ppocr INFO:                 order : hwc
[2024/08/04 17:44:30] ppocr INFO:                 scale : 1
[2024/08/04 17:44:30] ppocr INFO:                 std : [58.395, 57.12, 57.375]
[2024/08/04 17:44:30] ppocr INFO:             ToCHWImage : None
[2024/08/04 17:44:30] ppocr INFO:             KeepKeys : 
[2024/08/04 17:44:30] ppocr INFO:                 keep_keys : ['input_ids', 'bbox', 'attention_mask', 'token_type_ids', 'image', 'labels']
[2024/08/04 17:44:30] ppocr INFO:     loader : 
[2024/08/04 17:44:30] ppocr INFO:         batch_size_per_card : 8
[2024/08/04 17:44:30] ppocr INFO:         drop_last : False
[2024/08/04 17:44:30] ppocr INFO:         num_workers : 4
[2024/08/04 17:44:30] ppocr INFO:         shuffle : True
[2024/08/04 17:44:30] ppocr INFO: train with paddle 2.5.1 and device Place(gpu:0)
W0804 17:44:31.987720 758615 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 12.0, Runtime API Version: 11.8
W0804 17:44:31.989112 758615 gpu_resources.cc:149] device: 0, cuDNN Version: 8.9.
[2024/08/04 17:44:36] ppocr INFO: resume from ./output/ser_vi_layoutxlm_xfund_zh/best_accuracy/
[2024/08/04 17:44:36] ppocr WARNING: The first GPU is used for inference by default, GPU ID: 0
[2024/08/04 17:44:37] ppocr WARNING: The first GPU is used for inference by default, GPU ID: 0
[2024-08-04 17:44:38,120] [    INFO] - Already cached /home/aistudio/.paddlenlp/models/layoutxlm-base-uncased/sentencepiece.bpe.model
[2024-08-04 17:44:38,726] [    INFO] - tokenizer config file saved in /home/aistudio/.paddlenlp/models/layoutxlm-base-uncased/tokenizer_config.json
[2024-08-04 17:44:38,726] [    INFO] - Special tokens file saved in /home/aistudio/.paddlenlp/models/layoutxlm-base-uncased/special_tokens_map.json
[2024/08/04 17:44:40] ppocr INFO: resume from ./output/re_vi_layoutxlm_xfund_zh/best_accuracy/
Traceback (most recent call last):
  File "/home/aistudio/PaddleOCR/./tools/infer_kie_token_ser_re.py", line 216, in <module>
    result = ser_re_engine(data)
  File "/home/aistudio/PaddleOCR/./tools/infer_kie_token_ser_re.py", line 151, in __call__
    preds = self.model(re_input)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddle/nn/layer/layers.py", line 1254, in __call__
    return self.forward(*inputs, **kwargs)
  File "/home/aistudio/PaddleOCR/ppocr/modeling/architectures/base_model.py", line 85, in forward
    x = self.backbone(x)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddle/nn/layer/layers.py", line 1254, in __call__
    return self.forward(*inputs, **kwargs)
  File "/home/aistudio/PaddleOCR/ppocr/modeling/backbones/vqa_layoutlm.py", line 248, in forward
    x = self.model(
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddle/nn/layer/layers.py", line 1254, in __call__
    return self.forward(*inputs, **kwargs)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddlenlp/transformers/layoutxlm/modeling.py", line 1412, in forward
    loss, pred_relations = self.extractor(sequence_output, entities, relations)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddle/nn/layer/layers.py", line 1254, in __call__
    return self.forward(*inputs, **kwargs)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddlenlp/transformers/layoutxlm/modeling.py", line 1304, in forward
    relations, entities = self.build_relation(relations, entities)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddlenlp/transformers/layoutxlm/modeling.py", line 1248, in build_relation
    all_possible_relations = paddle.stack(
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddle/tensor/manipulation.py", line 1842, in stack
    return _C_ops.stack(x, axis)
ValueError: (InvalidArgument) x dim number should greater than 0, but received value is: 0
  [Hint: Expected x_dim > 0, but received x_dim:0 <= 0:0.] (at ../paddle/phi/backends/gpu/gpu_launch_config.h:180)
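
A note on reading the traceback above: it ends in `paddle.stack` inside `build_relation`, and the message says a dimension is 0. One possible cause (an assumption, not confirmed anywhere in this thread) is that the RE stage received no candidate question/answer entity pairs for some sample, for example because the SER stage labeled no entity as a question or no entity as an answer. The sketch below is plain Python and does not call PaddleOCR or PaddleNLP; the label ids `QUESTION = 1` and `ANSWER = 2` and both function names are hypothetical and only illustrate the kind of check that could be run on the SER output before the RE model is called.

```python
# Hypothetical label ids for this sketch; check the ids your pipeline uses.
QUESTION = 1
ANSWER = 2


def candidate_relations(entity_labels):
    """Return every (question_index, answer_index) pair among the entities.

    entity_labels is a list of integer label ids, one per entity.
    """
    return [
        (i, j)
        for i, head in enumerate(entity_labels)
        for j, tail in enumerate(entity_labels)
        if head == QUESTION and tail == ANSWER
    ]


def has_relation_candidates(entity_labels):
    """True if at least one question/answer pair exists for this sample."""
    return len(candidate_relations(entity_labels)) > 0


if __name__ == "__main__":
    # One question followed by two answers: two candidate pairs.
    print(candidate_relations([QUESTION, ANSWER, ANSWER]))
    # Only answers: no candidates, so a sample like this could be skipped
    # or logged instead of being passed to the RE model.
    print(has_relation_candidates([ANSWER, ANSWER]))
```

If a sample yields no candidates, logging its image name and skipping it would at least show whether the failure is tied to specific samples in `val.json` or to the SER predictions in general.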

Additional

No response

Are you willing to submit a PR?

SWHL commented 3 months ago

Could you share the model? It is hard to tell what is wrong just from looking at this.

freezehe commented 3 months ago

I am using your platform. If I give you my project ID, can you view my project in the backend? It is on Baidu AI Studio: https://aistudio.baidu.com/projectdetail/8221656

SWHL commented 3 months ago

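PaddleOCR is now maintained by community members rather than officially by Baidu. Most of our project maintainers are not Baidu employees, so we cannot see your project.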

freezehe commented 3 months ago

Then how should I send you the model? It is large, several GB.

freezehe commented 3 months ago

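> PaddleOCR is now maintained by community members rather than officially by Baidu. Most of our project maintainers are not Baidu employees, so we cannot see your project.

I searched the issues, and quite a few people have run into this problem. Has this issue been resolved? https://github.com/PaddlePaddle/PaddleOCR/issues/11261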