clowder-framework / extractors-s2orc-pdf2text

Extractor to convert pdf to text
Apache License 2.0

Update grobid #22

Closed: minump closed this issue 3 months ago

minump commented 3 months ago

Update the grobid Docker image (available tags: https://hub.docker.com/r/grobid/grobid/tags) to version 0.8.0:

docker pull grobid/grobid:0.8.0
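After redeploying, it can help to confirm that the running service actually reports the new version. The following is a minimal sketch. It assumes the GROBID REST API's `GET /api/version` endpoint and handles both a plain-text and a JSON body, because the exact response shape is an assumption here. `parse_version` and `is_expected_version` are hypothetical helpers and are not part of this extractor.

```python
import json
import urllib.request
from typing import Callable


def fetch_text(url: str, timeout: float = 10.0) -> str:
    """Perform a plain HTTP GET and return the decoded response body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def parse_version(body: str) -> str:
    """Extract a version string from a plain-text or JSON response body."""
    body = body.strip()
    if body.startswith("{"):
        # Assumption: a JSON body carries the version under a "version" key.
        return str(json.loads(body).get("version", ""))
    return body


def is_expected_version(base_url: str, expected: str = "0.8.0",
                        fetch: Callable[[str], str] = fetch_text) -> bool:
    """Return True if the service at base_url reports the expected version.

    `fetch` is injectable so the check can be exercised without a network.
    """
    url = base_url.rstrip("/") + "/api/version"
    return parse_version(fetch(url)) == expected
```

For example, `is_expected_version("http://localhost:8070")` would query a local instance on 8070, which is GROBID's default port. Treat that as an assumption about this deployment, because the Consort cluster may expose the service under a different host or port.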

minump commented 3 months ago

The grobid version has been updated. The Consort deployed instance now runs grobid:0.8.0. Grobid 0.8.0 logs differently from the version used previously, so the grobid container pod shows little per-file log information. To verify that a grobid "api/processFulltextDocument" call succeeded, check whether the corresponding POST request returned status 200 in the pdf2text-extractor pod logs. For example:

2024-05-22 19:25:17,542 [Thread-25 (_process_message)] INFO    : doc2txt.grobid2json.grobid.grobid_client - Processing pdf file in path /tmp/tmpwekd8v2m.pdf with name heartjnl194654
2024-05-22 19:25:25,478 [Thread-25 (_process_message)] INFO    : doc2txt.grobid2json.grobid.grobid_client - POST  Grobid service processFulltextDocument. Status 200
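When many files are processed, scanning these lines by hand gets tedious. The following is a minimal sketch that pairs each "Processing pdf file" line with the next grobid POST status logged on the same thread. It assumes the log lines follow exactly the two-line format shown above. `grobid_results` and `failed_files` are hypothetical helpers and are not part of the extractor.

```python
import re
from typing import Dict, Iterable, List, Tuple

# Patterns modelled on the two example log lines; adjust them if the
# extractor's log format changes.
THREAD_RE = re.compile(r"\[(?P<thread>[^\]]+)\]")
START_RE = re.compile(
    r"Processing pdf file in path (?P<path>\S+) with name (?P<name>\S+)")
STATUS_RE = re.compile(
    r"POST\s+Grobid service processFulltextDocument\. Status (?P<status>\d+)")


def grobid_results(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Pair each processed file name with the HTTP status of its grobid POST."""
    pending: Dict[str, str] = {}  # thread label -> file name awaiting a status
    results: List[Tuple[str, int]] = []
    for line in lines:
        thread_match = THREAD_RE.search(line)
        thread = thread_match.group("thread") if thread_match else ""
        start = START_RE.search(line)
        if start:
            pending[thread] = start.group("name")
            continue
        status = STATUS_RE.search(line)
        if status:
            # A status line with no preceding start line on its thread is kept
            # but labelled as unknown.
            name = pending.pop(thread, "<unknown>")
            results.append((name, int(status.group("status"))))
    return results


def failed_files(results: List[Tuple[str, int]]) -> List[str]:
    """Return the names of files whose grobid POST did not return 200."""
    return [name for name, code in results if code != 200]
```

The input lines could come from a saved copy of `kubectl logs` output for the pdf2text-extractor pod. The per-thread pairing is needed because the extractor handles messages on worker threads, so lines from concurrent files may interleave.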
minump commented 3 months ago

Closing this as completed; see merge request https://git.ncsa.illinois.edu/kubernetes/clusters/consort/-/merge_requests/80