Data extraction from PDFs/images to Markdown

Aisuko commented 2 months ago

Reference article

Kaggle notebook for extracting data from PDFs to Markdown.

@moonxjz @cbh778899

cc: @Micost If you have time, please help them implement the article on Kaggle and share the result with them.

cbh778899 commented 2 months ago

Please check here for the processed raw data. Most multimodal GenAI models can work with images directly, so no extraction step is needed.

Aisuko commented 2 months ago

cc @moonxjz

Aisuko commented 2 months ago

We have finished the first part of the data extraction: all the PDFs are now converted to images. The next step is extracting the data from those images.
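For the image-to-Markdown step, a rough sketch of how the per-page results could be stitched together might look like the following. This is only an illustration, not the notebook's actual code: `pages_to_markdown` and `extract_text` are hypothetical names, and `extract_text` stands in for whatever OCR or multimodal model call is eventually chosen. It also assumes the page images have zero-padded file names so that sorting gives the right page order.

```python
from pathlib import Path
from typing import Callable, Iterable


def pages_to_markdown(
    image_paths: Iterable[Path],
    extract_text: Callable[[Path], str],
    title: str,
) -> str:
    """Build one Markdown document from per-page images.

    `extract_text` is a placeholder (assumption) for the OCR or
    multimodal model call; it is not defined in this thread.
    """
    parts = [f"# {title}", ""]
    # Sorting assumes zero-padded names such as page_001.png, page_002.png.
    for number, path in enumerate(sorted(image_paths), start=1):
        text = extract_text(path).strip()
        parts.append(f"## Page {number}")
        parts.append("")
        # Keep a visible marker for pages where nothing was extracted.
        parts.append(text if text else "_No text extracted._")
        parts.append("")
    # Exactly one trailing newline at the end of the document.
    return "\n".join(parts).rstrip() + "\n"
```

Passing the extractor in as a function keeps the assembly logic independent of the model, so it can be tried with a dummy extractor before any real one is wired in.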