Modify the preprocessing script so that it can extract metadata automatically. Previously, the script required the page-number range of each chapter as manual input, but since we will later need to extract data from more than 100 books, this step needs to be automated. Refer to RAG0001.
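Once the first page of every chapter is known, the page ranges the script used to ask for follow mechanically: each chapter ends one page before the next one starts, and the last entry runs to the end of the book. A minimal sketch, assuming 1-based PDF page indices; `chapter_page_ranges` is a hypothetical helper, and how the start pages are obtained (PDF outline, table of contents) is left open.

```python
from typing import List, Tuple


def chapter_page_ranges(
    chapter_starts: List[Tuple[str, int]], total_pages: int
) -> List[Tuple[str, int, int]]:
    """Turn (title, first_page) pairs into (title, first_page, last_page) ranges.

    Pages are assumed to be 1-based PDF page indices (hypothetical convention).
    """
    # Sort by start page so the input order does not matter (sorted() is stable,
    # so entries sharing a start page keep their relative order).
    ordered = sorted(chapter_starts, key=lambda item: item[1])
    ranges = []
    for i, (title, start) in enumerate(ordered):
        if i + 1 < len(ordered):
            # An entry ends just before the next one starts; if both start on
            # the same page, the earlier one is confined to that single page.
            end = max(start, ordered[i + 1][1] - 1)
        else:
            end = total_pages  # the last entry runs to the end of the book
        ranges.append((title, start, end))
    return ranges


print(chapter_page_ranges([("Chapter 1", 5), ("Chapter 2", 18), ("Chapter 3", 30)], 42))
# → [('Chapter 1', 5, 17), ('Chapter 2', 18, 29), ('Chapter 3', 30, 42)]
```

One design note: if the start pages of back matter (epilogue, index, ...) are passed in as well and those entries are dropped afterwards, the last real chapter does not silently absorb the epilogue.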
Expected Output
A script that extracts the metadata automatically, given a PDF book and its transcript as input.
Implementation Steps
[x] Extract the page range for each chapter.
[x] Extract the chapter names
[x] Extract the page number printed on each page
[x] Filter to keep only the chapter content (excluding the introduction, epilogue, ...)
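Steps 2 to 4 above can be sketched on top of plain text already extracted from the PDF (one string per page, plus the table-of-contents lines); the text extraction itself is not shown and the choice of extractor is left open. All function names, the regular expressions, and the keyword list are hypothetical heuristics, not rules taken from RAG0001; only "introduction" and "epilogue" come from this ticket.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical heuristic: a TOC line such as "Chapter 1: The Beginning ....... 5".
TOC_LINE = re.compile(r"^\s*(?P<title>.+?)[\s.]+(?P<page>[0-9]+)\s*$")

# Titles treated as front/back matter rather than chapter content.
# Only "introduction" and "epilogue" come from the ticket; the rest are assumptions.
NON_CHAPTER = re.compile(
    r"^(?:introduction|preface|prologue|foreword|epilogue|afterword|"
    r"acknowledg(?:e)?ments|contents|index|appendix|bibliography)\b",
    re.IGNORECASE,
)


def chapter_names(toc_lines: List[str]) -> List[Tuple[str, int]]:
    """Steps 2 and 4: parse (title, printed start page), dropping non-chapters."""
    chapters = []
    for line in toc_lines:
        match = TOC_LINE.match(line)
        if match is None:
            continue  # not a TOC entry (heading, blank line, ...)
        title = match.group("title").strip()
        if NON_CHAPTER.match(title):
            continue  # introduction, epilogue, ... are excluded
        chapters.append((title, int(match.group("page"))))
    return chapters


def printed_page_number(page_text: str) -> Optional[int]:
    """Step 3: return the number printed alone in the page footer or header."""
    lines = [line.strip() for line in page_text.splitlines() if line.strip()]
    if not lines:
        return None
    for candidate in (lines[-1], lines[0]):  # check the footer first, then the header
        if re.fullmatch(r"[0-9]{1,4}", candidate):
            return int(candidate)
    return None


def printed_to_pdf_page(page_texts: List[str]) -> Dict[int, int]:
    """Map printed page numbers to 1-based PDF page indices (first hit wins)."""
    mapping: Dict[int, int] = {}
    for index, text in enumerate(page_texts, start=1):
        number = printed_page_number(text)
        if number is not None and number not in mapping:
            mapping[number] = index
    return mapping


toc = [
    "Contents",
    "Introduction ........ 1",
    "Chapter 1: The Beginning ........ 5",
    "Chapter 2: Growth ........ 18",
    "Epilogue ........ 40",
]
print(chapter_names(toc))
# → [('Chapter 1: The Beginning', 5), ('Chapter 2: Growth', 18)]
```

The TOC gives printed page numbers, while the PDF is indexed by physical page, so the printed start pages returned by `chapter_names` would be translated through `printed_to_pdf_page` before computing each chapter's page range.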