SpursGoZmy / Table-LLaVA

Dataset and Code for our ACL 2024 paper: "Multimodal Table Understanding". We propose the first large-scale Multimodal IFT and Pre-Train Dataset for table understanding and develop a generalist tabular MLLM named Table-LLaVA.
Apache License 2.0

Request for Sharing the Rendering Code #10

Open Uason-Chen opened 2 months ago

Uason-Chen commented 2 months ago

Hello,

Thanks for sharing the code and dataset for TableLLaVA. I greatly appreciate your time and effort in this project.

However, I noticed that the rendering code is not provided in the repository. I understand that there could be various reasons for this, but if it's possible, could you please share the rendering code? I believe it would be incredibly beneficial for me and others who are interested in this project, and it would provide us with a deeper understanding of your work.

SpursGoZmy commented 1 month ago

The rendered table images come in 3 styles, and each style uses different rendering code (minimal sketches for each style follow the list):

  1. Markdown style: read the Markdown table data into a DataFrame with the pandas Python package, then use the dataframe_image package to convert the DataFrame into an image.
  2. HTML style: take the HTML code of the original table (or build HTML from the table data) and use the html2image package to convert the HTML into a screenshot image. You then need a script to crop the extra white space around the table.
  3. Excel style: write the table data into an Excel file, then use the xlwings package to open the file and export the table as an image.
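A minimal sketch of the Markdown style, assuming pandas and dataframe_image are installed; the parsing helper, example table, and file name are illustrative assumptions, not the authors' exact script:

```python
import pandas as pd
import dataframe_image as dfi

MARKDOWN_TABLE = """\
| Name  | Score |
|-------|-------|
| Alice | 90    |
| Bob   | 85    |
"""

def markdown_to_df(md: str) -> pd.DataFrame:
    """Parse a simple pipe-style Markdown table into a DataFrame."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in md.strip().splitlines()
    ]
    header, body = rows[0], rows[2:]  # rows[1] is the |---|---| separator row
    return pd.DataFrame(body, columns=header)

df = markdown_to_df(MARKDOWN_TABLE)
# Render the DataFrame to a PNG; the matplotlib backend avoids the headless
# Chrome dependency that dataframe_image uses by default.
dfi.export(df, "table_markdown.png", table_conversion="matplotlib")
```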
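A minimal sketch of the HTML style, assuming the html2image and Pillow packages; the HTML snippet, file names, and the white-border cropping logic are assumptions for illustration:

```python
from html2image import Html2Image
from PIL import Image, ImageChops

HTML_TABLE = """
<table border="1" cellpadding="4">
  <tr><th>Name</th><th>Score</th></tr>
  <tr><td>Alice</td><td>90</td></tr>
  <tr><td>Bob</td><td>85</td></tr>
</table>
"""

# Take a screenshot of the HTML with a headless browser.
hti = Html2Image(output_path=".")
hti.screenshot(html_str=HTML_TABLE, save_as="table_html.png", size=(800, 600))

def trim_white_border(path: str) -> None:
    """Crop the extra white space around the rendered table."""
    img = Image.open(path).convert("RGB")
    background = Image.new("RGB", img.size, (255, 255, 255))
    bbox = ImageChops.difference(img, background).getbbox()
    if bbox:
        img.crop(bbox).save(path)

trim_white_border("table_html.png")
```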
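A minimal sketch of the Excel style, assuming a machine with Excel installed (xlwings drives the real Excel application) and a recent xlwings version that provides Range.to_png; the file names and example data are assumptions:

```python
import pandas as pd
import xlwings as xw

# Step 1: write the table data into an Excel file.
df = pd.DataFrame({"Name": ["Alice", "Bob"], "Score": [90, 85]})
df.to_excel("table.xlsx", index=False)

# Step 2: open the file with xlwings and export the table range as a PNG.
app = xw.App(visible=False)
try:
    book = app.books.open("table.xlsx")
    sheet = book.sheets[0]
    sheet.range("A1").expand().to_png("table_excel.png")
    book.close()
finally:
    app.quit()
```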

Data augmentation: while rendering the table images, augmentations such as changing the font type or cell color can be applied (a sketch follows below).
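A hypothetical sketch of such augmentation for the HTML style: randomize the font family and cell background color via a CSS string, which can be passed to Html2Image.screenshot through its css_str argument (the specific fonts and colors are made up):

```python
import random

FONTS = ["Arial", "Times New Roman", "Courier New", "Verdana"]
CELL_COLORS = ["#ffffff", "#f6f8fa", "#fff8e1", "#e8f5e9"]

def random_table_css() -> str:
    """Build a randomized CSS style for the table before taking the screenshot."""
    return (
        f"table {{ border-collapse: collapse; "
        f"font-family: '{random.choice(FONTS)}'; }}\n"
        f"th, td {{ border: 1px solid #444; padding: 4px 8px; "
        f"background-color: {random.choice(CELL_COLORS)}; }}"
    )

# Usage with the HTML-style renderer sketched above:
# hti.screenshot(html_str=HTML_TABLE, css_str=random_table_css(),
#                save_as="table_aug.png", size=(800, 600))
```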

The difficulty of rendering tables into images differs across datasets, and it indeed requires quite a lot of careful work. We will try to clean up our rendering code for open-sourcing, but we cannot promise a release date soon. Thanks for your understanding.