atlanhq / camelot

Camelot: PDF Table Extraction for Humans
https://camelot-py.readthedocs.io

Suggested feature: SQLite file format #212

Closed: orent closed this issue 5 years ago

orent commented 5 years ago

The SQLite 3 format is one of the dataset formats that the Library of Congress recommends for digital preservation:

https://www.loc.gov/preservation/digital/formats/fdd/fdd000461.shtml

Unlike CSV, it can represent multiple tables in a single file. It is also a useful engine for performing basic cleanup and postprocessing tasks without having to first import the data into another tool.
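
To illustrate both points, here is a minimal sketch using only Python's standard `sqlite3` module. It does not use Camelot's API. It assumes each extracted table is already available as a list of rows whose first row is the header. The `write_tables` helper, the `table_N` naming scheme, and the sample data are hypothetical and only for illustration.

```python
import sqlite3


def quote_identifier(name):
    """Quote a table or column name for SQLite, doubling embedded quotes."""
    return '"' + str(name).replace('"', '""') + '"'


def write_tables(conn, tables):
    """Store each table (list of rows, first row is the header) as its own
    SQLite table named table_0, table_1, ... on the given connection."""
    with conn:  # commits on success, rolls back on error
        for index, rows in enumerate(tables):
            header, body = rows[0], rows[1:]
            name = quote_identifier(f"table_{index}")
            columns = ", ".join(f"{quote_identifier(col)} TEXT" for col in header)
            placeholders = ", ".join("?" for _ in header)
            conn.execute(f"CREATE TABLE {name} ({columns})")
            conn.executemany(f"INSERT INTO {name} VALUES ({placeholders})", body)


if __name__ == "__main__":
    # Made-up sample data standing in for two tables extracted from a PDF.
    tables = [
        [["city", "population"], [" Paris ", "2,100,000"], ["Lyon", "516,000"]],
        [["year", "rainfall_mm"], ["2017", "637"], ["2018", "720"]],
    ]

    conn = sqlite3.connect(":memory:")  # use a file path to persist the data
    write_tables(conn, tables)

    # Basic cleanup done inside SQLite: strip whitespace and thousands separators.
    with conn:
        conn.execute(
            "UPDATE table_0 SET city = TRIM(city), "
            "population = REPLACE(population, ',', '')"
        )

    rows = conn.execute(
        "SELECT city, CAST(population AS INTEGER) FROM table_0 ORDER BY city"
    ).fetchall()
    print(rows)  # [('Lyon', 516000), ('Paris', 2100000)]
    conn.close()
```

Both tables end up in one database, and the cleanup step runs as plain SQL without importing the data into another tool. Passing a file path instead of `":memory:"` to `sqlite3.connect` would produce a single `.sqlite` file on disk.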