IFRCGo / go-api

MIT License

Ingest issue for PDF scraper #863

Open. mmusori opened this issue 4 years ago

mmusori commented 4 years ago

Issue

One or more ingest issues occurred. One of them is in scrape_pdfs, as shown in the CronJob log record: https://goadmin.ifrc.org/api/cronjob/24915.

Expected behaviour

The Python script runs once a day and downloads PDF documents from IFRC.org. It then reads the documents, scrapes quantitative data from them and writes that data to the GO database.

Impact: currently limited.

GergiH commented 4 years ago

Deployed the fix to staging. The scrape_pdfs cronjob logs should now also contain more information about errors. If some PDFs fail but the job itself runs fine, the log record will say 'Successful' but will include error messages about the failed documents. If the job itself fails, the log record will be marked 'Erroneous'.
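A minimal Python sketch of the reporting rule described in the comment above: per-document failures are collected as messages while the run stays 'Successful', and only a failure of the job as a whole marks the run 'Erroneous'. This is an illustration under assumptions, not the actual go-api code; `run_scrape_pdfs`, `process_pdf` and `CronJobResult` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List

# Hypothetical status labels mirroring the ones described in the comment.
SUCCESSFUL = "Successful"
ERRONEOUS = "Erroneous"


@dataclass
class CronJobResult:
    """Hypothetical stand-in for a cronjob log record: a status plus messages."""
    status: str
    messages: List[str] = field(default_factory=list)


def run_scrape_pdfs(urls: Iterable[str], process_pdf: Callable[[str], None]) -> CronJobResult:
    """Process each PDF, collecting per-document errors instead of aborting the run."""
    messages: List[str] = []
    try:
        for url in urls:
            try:
                process_pdf(url)
            except Exception as exc:
                # One bad PDF is recorded but does not stop the remaining documents.
                messages.append(f"Failed to process {url}: {exc}")
    except Exception as exc:
        # A failure outside per-document processing (e.g. listing the PDFs) fails the job.
        return CronJobResult(ERRONEOUS, messages + [f"Job failed: {exc}"])
    return CronJobResult(SUCCESSFUL, messages)
```

Catching exceptions per document keeps one unparseable PDF from hiding the data scraped from the others, while still surfacing that document's error in the log record.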

nanometrenat commented 7 months ago

@szabozoltan69 is this still relevant given the work that's been done on cronjobs and ingests since 2020? Can we close? Thanks as always