OpenPecha / rag_prep_tool

MIT License

RAG0011: Refactor Preprocessing Step (1) #13

Open tenzin3 opened 1 week ago

tenzin3 commented 1 week ago

Description

Modify the preprocessing script so that it extracts metadata automatically. Previously, the script required the page number range for each chapter as input. Since we will later need to extract data from more than 100 books, this step needs to be automated. Refer to RAG0001.

Image

Expected Output

A script that extracts metadata automatically, given a PDF book and its transcript as input.
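
One possible way to replace the hand-entered chapter page ranges is sketched below. It is only a sketch: it assumes the PDF text has already been extracted into one string per page, and that each chapter opens on a page whose first non-empty line starts with "Chapter". The function name `find_chapter_page_ranges`, the `CHAPTER_HEADING` pattern, and the output field names are hypothetical, not part of the existing script.

```python
import re
from typing import Dict, List, Tuple, Union

# Assumed heading pattern: a page whose first non-empty line starts with
# "Chapter" (any case). Real books may need a different pattern per title.
CHAPTER_HEADING = re.compile(r"^\s*chapter\s+\w+", re.IGNORECASE)


def find_chapter_page_ranges(pages: List[str]) -> List[Dict[str, Union[str, int]]]:
    """Infer 1-based, inclusive page ranges for each chapter.

    `pages` holds the extracted text of each PDF page, in order.
    """
    starts: List[Tuple[int, str]] = []
    for page_no, text in enumerate(pages, start=1):
        # Only inspect the first non-empty line of the page.
        first_line = next((ln for ln in text.splitlines() if ln.strip()), "")
        if CHAPTER_HEADING.match(first_line):
            starts.append((page_no, first_line.strip()))

    ranges: List[Dict[str, Union[str, int]]] = []
    for i, (start, title) in enumerate(starts):
        # A chapter ends on the page before the next chapter starts,
        # or on the last page of the book for the final chapter.
        end = starts[i + 1][0] - 1 if i + 1 < len(starts) else len(pages)
        ranges.append({"title": title, "start_page": start, "end_page": end})
    return ranges
```

Pages before the first detected heading (title page, contents) are left out of every range. Books whose chapters do not follow the assumed heading style would need their own pattern.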

Implementation Steps

tenzin3 commented 6 days ago

Page Variations

Freedom in Exile

Image

Art of Happiness at Work

Image

My Land and My People

Image

Ethics for the New Millennium

Image