IAAR-Shanghai / Meta-Chunking

Meta-Chunking: Learning Efficient Text Segmentation via Logical Perception
Apache License 2.0

What about the images and tables in a PDF? #3

Closed Sere1nz closed 1 month ago

Sere1nz commented 1 month ago

It seems we are only talking about text. But what if relevant tables or images containing statistics or knowledge follow the text, so that they belong with the preceding text (that is, in the same chunk)? What should we do in this case?

Robot2050 commented 1 month ago

Thank you very much for your attention to our paper on text chunking. The issue you raised regarding the segmentation of texts containing tables and images in real-world scenarios is indeed a practical problem worthy of in-depth exploration. In our research, we have primarily focused on segmentation algorithms for long natural language texts, specifically on how to learn efficient text segmentation methods through logical perception.

However, your insights remind us that in practical applications, multimodal information is important for understanding and analyzing text content. Currently, the refined processing of multimodal information falls outside the scope of our research, so we may only be able to provide you with some potential ideas. For instance, you could first convert PDF documents into the more easily processable Markdown format [1,2] (which we found quite useful in our previous experience), then segment the text, and finally utilize a multimodal retrieval model to match the text with images. We hope this information is of assistance to you!

[1] https://mp.weixin.qq.com/s/P7-VhEpoNDkTJbhN7dGExA
[2] https://mp.weixin.qq.com/s/Ntqu8RrsJd07fRcJi8JShw
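
As a rough illustration of the "convert to Markdown, then segment" idea, here is a minimal sketch of one way to keep tables and images attached to the text they follow. It is not part of Meta-Chunking. It assumes the PDF has already been converted to Markdown in which tables are pipe tables and images use the `![alt](path)` syntax, and the helper names `is_media` and `attach_media_to_text` are hypothetical. The resulting units could then be passed to any text chunker as indivisible pieces.

```python
import re

# Assumption: images appear alone on a line as ![alt](path).
IMAGE_RE = re.compile(r"^!\[[^\]]*\]\([^)]*\)$")


def is_media(block: str) -> bool:
    """Return True if the block is a Markdown image or a pipe table."""
    lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
    if not lines:
        return False
    if all(IMAGE_RE.match(ln) for ln in lines):
        return True
    # Assumption: every row of a table starts with a pipe character.
    return all(ln.startswith("|") for ln in lines)


def attach_media_to_text(markdown: str) -> list[str]:
    """Split Markdown on blank lines and glue each table or image
    block onto the block that precedes it (hypothetical helper)."""
    units: list[str] = []
    for block in re.split(r"\n\s*\n", markdown.strip()):
        block = block.strip()
        if not block:
            continue
        if is_media(block) and units:
            # Keep the table or image in the same unit as the previous block.
            units[-1] = units[-1] + "\n\n" + block
        else:
            units.append(block)
    return units


if __name__ == "__main__":
    sample = (
        "Sales grew in 2023.\n\n"
        "| year | sales |\n|---|---|\n| 2023 | 10 |\n\n"
        "![chart](fig1.png)\n\n"
        "Next topic starts here."
    )
    for unit in attach_media_to_text(sample):
        print(repr(unit))
```

A table or image that opens the document has no preceding text, so the sketch leaves it as its own unit. Matching images to text with a multimodal retrieval model, as suggested above, is a separate step that this sketch does not cover.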

Sere1nz commented 1 month ago

Thank you.