PaddlePaddle / PaddleNLP

👑 Easy-to-use and powerful NLP and LLM library with 🤗 Awesome model zoo, supporting wide-range of NLP tasks from research to industrial applications, including 🗂Text Classification, 🔍 Neural Search, ❓ Question Answering, ℹ️ Information Extraction, 📄 Document Intelligence, 💌 Sentiment Analysis etc.
https://paddlenlp.readthedocs.io
Apache License 2.0
11.72k stars 2.86k forks

[Question]: Document-Parser for Ernie-Layout #6350

Open Mohamed-Dhouib opened 11 months ago

Mohamed-Dhouib commented 11 months ago

Hello, I noticed in the ERNIE-Layout paper that you use Document-Parser to arrange the input text in the proper reading order. However, I can't find where this preprocessing happens in the code. Could you point me to the relevant section or provide a link to the Document-Parser toolkit? Thanks!

aashishpokharel commented 9 months ago

Have you found the preprocessing step, @Mohamed-Dhouib? I'm stuck on the same problem.

Mohamed-Dhouib commented 9 months ago

@aashishpokharel Unfortunately, no.

aashishpokharel commented 9 months ago

Just found the reason: #4015 mentions that the serialization module hasn't been open-sourced yet.