1jamesthompson1 / TAIC-report-summary

Using LLM technologies to analyze transport accident investigation reports
https://taic-viewer-72e8675c1c03.herokuapp.com/
GNU General Public License v3.0

PDF text extraction is missing pages #164

Open 1jamesthompson1 opened 1 month ago

1jamesthompson1 commented 1 month ago

Problem

Currently the PDFParser parses each PDF into text, but some pages are missing from the extracted output.

This affects section extraction for #146, for two reasons:

Ideas and suggestions
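
One possible first step is to measure which pages go missing before changing the parser. The sketch below is not the PDFParser code. It assumes the extracted text is available as one string per page and that the true page count of the PDF is known from elsewhere. `report_missing_pages`, `page_texts` and `expected_page_count` are hypothetical names used only for illustration.

```python
from typing import Dict, List, Optional, Sequence


def report_missing_pages(
    page_texts: Sequence[Optional[str]],
    expected_page_count: int,
) -> Dict[str, List[int]]:
    """Report 1-based page numbers that yielded no text.

    page_texts: hypothetical per-page extraction output (assumption,
        not the real PDFParser data structure).
    expected_page_count: page count of the source PDF (assumption:
        obtained independently of the text extraction).
    """
    # Pages that were visited but produced nothing usable.
    empty = [
        number
        for number, text in enumerate(page_texts, start=1)
        if text is None or not text.strip()
    ]
    # Pages that never appeared in the extraction output at all.
    absent = list(range(len(page_texts) + 1, expected_page_count + 1))
    return {"empty": empty, "absent": absent}


if __name__ == "__main__":
    pages = ["Cover", "", None, "Findings"]
    print(report_missing_pages(pages, expected_page_count=6))
```

Running a check like this over all reports would show whether pages are dropped entirely or come back empty, which may point to different causes.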

Links and references