axa-group / Parsr

Transforms PDF, Documents and Images into Enriched Structured Data
Apache License 2.0
5.76k stars, 306 forks

Export document metadata in SimpleJSON format. #602

Open: benlabbe opened this issue 2 years ago

benlabbe commented 2 years ago

Summary
I would like to read both the document content in SimpleJSON format and the document metadata collection.

The problem
I'm frustrated that the SimpleJSON export does not include the document metadata collection.

The solution I'd like
The metadata handling and export already implemented for the standard JSON export could be replicated in the SimpleJSON export.
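To make the request concrete, here is a minimal sketch of what the combined output could look like: a SimpleJSON payload with the metadata collection from the full JSON export copied alongside it. It assumes, hypothetically, that both exports are plain JSON objects and that the full JSON export stores the metadata collection under a top-level `"metadata"` key. The function name `attach_metadata` is illustrative only and is not part of Parsr's API.

```python
import copy
from typing import Any


def attach_metadata(
    simple_json: dict[str, Any],
    full_json: dict[str, Any],
    key: str = "metadata",  # hypothetical key name for the metadata collection
) -> dict[str, Any]:
    """Return a copy of simple_json with the metadata collection from full_json added.

    Assumption: the full JSON export keeps its metadata under a top-level `key`.
    If the key is missing, an empty collection is attached instead.
    """
    merged = copy.deepcopy(simple_json)  # do not mutate the caller's SimpleJSON object
    merged[key] = copy.deepcopy(full_json.get(key, []))
    return merged


if __name__ == "__main__":
    # Toy inputs standing in for the two exports (not real Parsr output).
    simple = {"pages": [{"elements": []}]}
    full = {"pages": [], "metadata": [{"id": 1, "type": "font"}]}
    print(attach_metadata(simple, full))
```

Doing this merge inside the SimpleJSON exporter, rather than on the client, would avoid running the pipeline twice per document.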

The alternative I've considered
As an end user, I considered calling Parsr twice per document, once with the JSON export and once with the SimpleJSON export. But this is far from optimal in terms of latency and throughput.