PDF-Accessibility-Initiative / grand-plans

Non-code repo for overall project organization

Evaluate various open-source PDF libraries #1

Open HalosGhost opened 6 years ago

HalosGhost commented 6 years ago

Our lives would be much simpler if existing PDF libraries already let us parse a PDF and query specific properties of it. There are many considerations for whether a library is reasonable for us to use:

If we cannot find a library that meets these criteria, we may need to look into implementing basic PDF parsing and querying ourselves.

HalosGhost commented 6 years ago

Some initial libraries to review:

HalosGhost commented 6 years ago

Apparently, LibreOffice can export tagged PDFs! LO uses various document libraries for export and document compatibility. These should be the first libraries we evaluate, to see whether they explicitly support PDF parsing (even minimally) and/or PDF export with tags.
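
As a rough illustration of the kind of property query discussed in this thread, here is a minimal Python sketch that guesses whether a PDF declares itself as tagged. It is only a heuristic under stated assumptions: it scans the raw bytes for the `/Marked true` entry (normally found in the catalog's `/MarkInfo` dictionary) and for a `/StructTreeRoot` key, and it will miss files whose catalog is stored in a compressed object stream. The function name `looks_tagged` is hypothetical and does not come from any library; a real parser would be needed for reliable answers.

```python
import re

# Heuristic byte patterns. A tagged PDF normally has /Marked true in its
# /MarkInfo dictionary and a /StructTreeRoot entry in the document catalog.
# Assumption: these keys appear uncompressed in the file.
_MARKED_TRUE = re.compile(rb"/Marked\s+true\b")
_STRUCT_ROOT = re.compile(rb"/StructTreeRoot\b")


def looks_tagged(pdf_bytes: bytes) -> bool:
    """Return True if the raw bytes suggest the PDF is tagged.

    This does not parse the PDF; it only searches for the two markers,
    so a False result is inconclusive when the catalog is compressed.
    """
    has_marked = _MARKED_TRUE.search(pdf_bytes) is not None
    has_struct_root = _STRUCT_ROOT.search(pdf_bytes) is not None
    return has_marked and has_struct_root
```

To try it on a file, something like `looks_tagged(open("example.pdf", "rb").read())` would work, where `example.pdf` is a placeholder path. A check like this could also serve as a quick sanity test when comparing what candidate libraries report for the same document.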