Currently the PDFParser parses PDFs into text, but some pages are missing from the output.
This affects section extraction for #146, for two reasons:
Some sections are missed entirely because they are not in the txt file.
The section before the missing page keeps capturing text until the next higher-level section, because it can't find the end of its own section (it finds the end of a section by looking for the start of the next one).
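Both failure modes can be reproduced with a minimal sketch. This is not the PDFParser code: the heading pattern, the `extract_sections` helper, and the sample pages are all hypothetical, and the only behaviour taken from the description above is that a section ends where the next heading starts.

```python
import re

# Hypothetical heading format: a dotted number followed by a title, e.g. "1.2 Scope".
HEADING = re.compile(r"^(\d+(?:\.\d+)*)[ \t]+(.+)$", re.MULTILINE)


def extract_sections(text):
    """Map each heading number to its body, which runs until the next heading found."""
    matches = list(HEADING.finditer(text))
    sections = {}
    for i, match in enumerate(matches):
        # The end of a section is the start of the next heading (or the end of the text).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        sections[match.group(1)] = text[match.end():end].strip()
    return sections


# Made-up document split into three pages; the heading "1.2 Scope" sits on page 2.
pages = [
    "1 Intro\nintro text\n1.1 Background\nbackground text\n",
    "1.2 Scope\nscope text\n",
    "more scope\n2 Methods\nmethods text\n",
]

complete = extract_sections("".join(pages))
# Simulate the parser dropping page 2.
damaged = extract_sections(pages[0] + pages[2])
```

With page 2 dropped, `damaged` has no entry for section 1.2, and section 1.1 absorbs the leftover text from page 3 up to the heading of section 2.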
Ideas and suggestions
Links and references