@niconoe is it ok if I forward this to you to make it a Django command? I think we will schedule the scraper commands with cron, and when all of them are finished, this job should run.
Yep!
For more flexibility, and to be able to deal with cron misconfiguration issues, shouldn't this command allow:
Indeed. However, "everything" will be difficult, since for days before 2014 we don't know the number of journals scraped. (In fact, maybe we need a place to store this for new data too?)
@niconoe apparently the cutoff for the EPU index is not 0 but -0.15.
So the EPU index is the number of articles with an EPU score higher than -0.15, divided by the number of journals scraped.
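In other words, the calculation boils down to something like this (a minimal sketch; `epu_score` and the function names are assumptions, not the project's actual API):

```python
# Sketch of the daily EPU index calculation:
# EPU index = (# articles with EPU score > -0.15) / (# journals scraped that day)
EPU_CUTOFF = -0.15

def daily_epu_index(articles_for_day, journals_scraped_count):
    """articles_for_day: iterable of objects with an `epu_score` attribute (assumed name)."""
    relevant = sum(1 for article in articles_for_day if article.epu_score > EPU_CUTOFF)
    return relevant / journals_scraped_count
```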
So again, maybe we need a place where we can store the number of journals scraped. Some place where every scraper can write "I succeeded for this day". Maybe a table "journals scraped" with two columns "date" and "spider/journal name"?
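Something like the following Django model could serve as that table (the model and field names are only a suggestion, not what ended up being implemented):

```python
from django.db import models

class JournalScrapingLog(models.Model):
    """One row per (date, spider) for which scraping succeeded."""
    date = models.DateField()
    spider_name = models.CharField(max_length=100)

    class Meta:
        unique_together = ('date', 'spider_name')
```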
See #64
It's now implemented; use it as:
$ python manage.py calculate_daily_epu 2015-08-17
It reports what it does on stdout and stores its result in EpuIndexScore. Please review and test!
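For reference, a minimal sketch of how such a management command could be structured (the actual implementation may differ; the `EpuIndexScore` fields referenced in the comment are assumptions):

```python
from datetime import datetime

from django.core.management.base import BaseCommand, CommandError


class Command(BaseCommand):
    help = "Calculate and store the EPU index for a given day (YYYY-MM-DD)."

    def add_arguments(self, parser):
        parser.add_argument('date', type=str)

    def handle(self, *args, **options):
        try:
            day = datetime.strptime(options['date'], '%Y-%m-%d').date()
        except ValueError:
            raise CommandError('Date must be given as YYYY-MM-DD.')

        # ...count articles above the -0.15 cutoff and journals scraped for `day`,
        # then persist the result, e.g. (field names are assumptions):
        # EpuIndexScore.objects.create(date=day, epu=index)
        self.stdout.write('EPU index stored for %s' % day)
```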
Tested. Works perfectly.
Start a job after running the scrapers that calculates and persists yesterday's EPU index.