ChidanandKumarVimaan closed this issue 1 year ago.
Does the same error also occur when training the RE task with only the XFUND_zh dataset?
No, these issues do not occur when training the RE task with the XFUND_zh dataset alone. The problem appears when mixing multilingual data (all 7 languages).
Each language was converted to the format given in the repo using https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.6/ppstructure/kie/tools/trans_xfun_data.py.
After mixing all the languages, exceptions are thrown when training with the multilingual data.
@an1018, kindly help if you know of any solution; I have tried my best.
You can try mixing two languages (zh and anthor) and see whether the same error occurs.
@an1018, could you clarify what "anthor" means? I didn't understand it.
Also, the paper "LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding" reports on page 8 that multitask fine-tuning greatly improves the RE F1 score when training on multiple languages: https://arxiv.org/pdf/2104.08836.pdf.
I want to replicate those results. Kindly help.
"Anthor" means any other language. For example, we can train with the zh and de datasets; using two languages helps us further locate the problem.
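The two-language experiment can be scripted so that zh is paired with each of the other six XFUND languages in turn. This is a minimal sketch, assuming one converted label file per language named like `zh_train.json` (a hypothetical naming scheme), with one `<image_path>\t<json annotations>` record per line as written by the conversion step. The helper names `merge_label_files` and `build_pairwise_mixes` are illustrative, not part of PaddleOCR.

```python
import os

# The seven XFUND languages.
XFUND_LANGS = ["zh", "ja", "es", "fr", "it", "de", "pt"]


def merge_label_files(paths, out_path):
    """Concatenate several line-based label files into one training file (hypothetical helper)."""
    with open(out_path, "w", encoding="utf-8") as out:
        for path in paths:
            with open(path, encoding="utf-8") as f:
                for line in f:
                    line = line.rstrip("\n")
                    if line:  # skip blank lines so records stay one per line
                        out.write(line + "\n")
    return out_path


def build_pairwise_mixes(label_dir, out_dir, base="zh"):
    """For every other language X, write <base>_<X>_train.json containing base + X.

    Assumes files are named <lang>_train.json; adjust to the real layout.
    """
    outputs = {}
    for lang in XFUND_LANGS:
        if lang == base:
            continue
        paths = [
            os.path.join(label_dir, f"{base}_train.json"),
            os.path.join(label_dir, f"{lang}_train.json"),
        ]
        out_path = os.path.join(out_dir, f"{base}_{lang}_train.json")
        outputs[lang] = merge_label_files(paths, out_path)
    return outputs
```

Each generated file can then be used for a separate RE training run; the pairs that crash point to the language whose data needs a closer look.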
Sure, I will do the experiment and report back.
@an1018, crashes occur only with the pt and it languages. Both languages produce the same crash:
Traceback (most recent call last):
File "tools/train.py", line 208, in
@an1018, kindly suggest a solution.
Sorry for the late reply. Please check the format of the pt/it images: are there any bad cases?
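One way to look for bad cases is to scan the annotation lines rather than the images. The sketch below flags lines that cannot be parsed, entity labels outside an expected set, and pages with no question/answer pair. The expected label set, its lowercase spelling, and the `label` field name are assumptions that should be checked against the class list used by the training config; `scan_bad_cases` is a hypothetical helper.

```python
import json

# Assumed label vocabulary; the casing here is an assumption and should match
# the class list used by the training config.
EXPECTED_LABELS = {"question", "answer", "header", "other"}


def scan_bad_cases(lines, expected=EXPECTED_LABELS):
    """Return (line_no, image_path, reason) tuples for annotation lines likely to cause trouble."""
    problems = []
    for no, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line:
            continue
        try:
            image_path, ann = line.split("\t", 1)
            entities = json.loads(ann)
        except ValueError as exc:  # missing tab or malformed JSON
            problems.append((no, None, f"unparsable line: {exc}"))
            continue
        labels = [e.get("label") for e in entities]
        unknown = sorted({str(lab) for lab in labels if lab not in expected})
        if unknown:
            problems.append((no, image_path, f"unexpected labels {unknown}"))
        # Assumption: RE candidates pair question entities with answer entities,
        # so a page lacking either kind yields no candidate relations.
        if "question" not in labels or "answer" not in labels:
            problems.append((no, image_path, "no question/answer pair"))
    return problems
```

Running it over the pt and it label files and comparing the hit counts with zh may show whether those two languages differ in label spelling or contain pages without any question/answer pair.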
I resolved the problem by converting the labels to lowercase.
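The lowercase fix can be applied line by line to the converted label files. This is a minimal sketch, assuming each line is a tab-separated image path followed by a JSON list of entities that carry a `label` field; `lowercase_labels` is a hypothetical helper and the exact format should be confirmed against the output of trans_xfun_data.py.

```python
import json


def lowercase_labels(line):
    """Rewrite one annotation line (image path, tab, JSON entity list) with lowercase labels."""
    image_path, ann = line.rstrip("\n").split("\t", 1)
    entities = json.loads(ann)
    for entity in entities:
        # Only touch string labels; leave anything unexpected unchanged.
        if isinstance(entity.get("label"), str):
            entity["label"] = entity["label"].lower()
    # ensure_ascii=False keeps non-Latin transcriptions (e.g. zh/ja) readable.
    return image_path + "\t" + json.dumps(entities, ensure_ascii=False)
```

For example, a pt line whose entity has `"label": "QUESTION"` comes back with `"label": "question"`, while the transcription, points, and linking fields are kept as they were.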
Please provide the following information to quickly locate the problem
This is multitask fine-tuning on the XFUND dataset: we mix all of the languages and train the relation extraction (RE) module in the KIE part. The reason for using multitask fine-tuning is that the RE F1 score improves greatly when training on multiple languages, as reported on page 8 of https://arxiv.org/pdf/2104.08836.pdf.
The code crashes at https://github.com/PaddlePaddle/PaddleNLP/blob/2583b5ab68393545db68fd9631429de206bab270/paddlenlp/transformers/layoutxlm/modeling.py#L1248: when all_possible_relations1.shape=0 and all_possible_relations2.shape=0, creating a meshgrid of size (0, 0) raises an exception in paddle.meshgrid, resulting in "Floating point exception(segmentation dumped)".
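A defensive pattern for this crash is to check for empty head or tail candidate lists before building the pairwise grid, and skip the sample if either is empty. The sketch below is framework-free and only illustrates the guard; it is not the PaddleNLP code, which works on paddle tensors. The label ids (1 for question/head, 2 for answer/tail) and the helper `candidate_relations` are hypothetical.

```python
from itertools import product

# Hypothetical label ids for illustration: 1 = question (head), 2 = answer (tail).
HEAD_LABEL, TAIL_LABEL = 1, 2


def candidate_relations(entity_labels):
    """Enumerate (head, tail) index pairs, returning [] when either side is empty.

    Same idea as meshgrid-ing head indices against tail indices, but the empty
    case is checked first instead of building a (0, 0) grid.
    """
    heads = [i for i, lab in enumerate(entity_labels) if lab == HEAD_LABEL]
    tails = [i for i, lab in enumerate(entity_labels) if lab == TAIL_LABEL]
    if not heads or not tails:
        return []  # nothing to classify for this sample; the caller can skip it
    return list(product(heads, tails))
```

For instance, `candidate_relations([0, 1, 2, 2])` gives `[(1, 2), (1, 3)]`, while a page with answers but no questions gives `[]`. If labels are spelled in a casing that does not map to the head/tail ids, every page can plausibly end up in the empty branch, which would be consistent with the lowercase fix reported in this thread.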