clowder-framework / extractors-s2orc-pdf2text

Extractor to convert pdf to text
Apache License 2.0

Convert s2orc pdf2text json output to txt #8

minump closed this issue 1 year ago

minump commented 1 year ago

Convert the JSON output from the s2orc-pdf2text extractor to a .txt file. Extract only the "text" fields from the JSON output file, concatenate them, and write the result to a .txt file. Upload the .txt file to the same dataset in Clowder.
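
A minimal sketch of the conversion step described above, not the code merged for this issue. It assumes the extractor output is ordinary JSON in which the text to keep is stored as string values under keys named "text", at any nesting depth. The function names `collect_text` and `json_to_txt` and the blank-line separator are hypothetical choices made for illustration.

```python
import json
from pathlib import Path


def collect_text(node):
    """Recursively gather every string stored under a "text" key, in document order."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "text" and isinstance(value, str):
                found.append(value)
            else:
                # Descend into nested objects and lists to find deeper "text" keys.
                found.extend(collect_text(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(collect_text(item))
    return found


def json_to_txt(json_path, txt_path=None, separator="\n\n"):
    """Write the concatenated "text" values of a JSON file to a .txt file.

    The separator is an assumption; the issue only says to concatenate.
    """
    json_path = Path(json_path)
    # Default to the same file name with a .txt extension.
    txt_path = Path(txt_path) if txt_path else json_path.with_suffix(".txt")
    with json_path.open(encoding="utf-8") as f:
        data = json.load(f)
    txt_path.write_text(separator.join(collect_text(data)), encoding="utf-8")
    return txt_path
```

Calling `json_to_txt("paper.json")` would write `paper.txt` next to the input file. Uploading that file to the same dataset is left out of the sketch, since it depends on the Clowder client the extractor uses.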

This will be built on top of https://github.com/clowder-framework/extractors-s2orc-pdf2text/pull/7

minump commented 1 year ago

PR merged. Closing this issue.