JaseZiv / worldfootballR

A wrapper for extracting world football (soccer) data from FBref, Transfermarkt, and Understat
https://jaseziv.github.io/worldfootballR/

ingestion cron not running all days? #388

Closed: fine-lemur closed this issue 4 months ago

fine-lemur commented 4 months ago

Expected behaviour:

worldfootballR::load_match_results will return data updated daily

Actual behaviour:

The cron job that generates the backend cache appears to run only on certain days, so midweek games are not included in the results

tonyelhabr commented 4 months ago

This is intentional, as you can see from the comment here. @JaseZiv did recently update that workflow to run for all months. (We used to skip June and July.)

I think we could potentially just have the workflow run for all days in all months, but we'll have to be careful not to hit our GitHub usage limits.

Bigger picture, I'm not sure we ever intended the load_ functions to be relied upon so close to real time. I think the original inspiration was to (1) reduce the number of crawlers that FBref had to deal with and (2) ease the pain of doing your own large backfills.

tonyelhabr commented 4 months ago

oh, looking here, it seems like you're already familiar with the scraping schedule.

my points still stand. i'll leave it to @JaseZiv to update the workflow schedule if he thinks that is best

JaseZiv commented 4 months ago

@tonyelhabr is absolutely correct. This schedule was intentional for two reasons:

  1. As Tony stated, we don't want to exhaust our GitHub usage
  2. We also don't want to hammer the sites we're getting data from every single day

Also, as Tony stated, the load_ functions were mainly designed for previous seasons. Use the current scraping functions if you want anything more up-to-date.
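For anyone landing here later, a minimal sketch of the difference, assuming the English top flight and a season ending in 2024 (both values are only illustrative and not taken from this issue). `load_match_results()` reads the pre-built cache that the scheduled workflow refreshes, while `fb_match_results()` scrapes FBref directly at call time:

```r
library(worldfootballR)

# Cached data: only as fresh as the last scheduled workflow run,
# so recent midweek matches may be missing.
cached <- load_match_results(
  country = "ENG",          # illustrative choice, not from the issue
  gender = "M",
  season_end_year = 2024,   # illustrative choice, not from the issue
  tier = "1st"
)

# Live scrape: hits FBref when called, so it reflects whatever the site
# currently shows. Use sparingly to avoid hammering the site.
live <- fb_match_results(
  country = "ENG",
  gender = "M",
  season_end_year = 2024,
  tier = "1st"
)
```

The live call is slower and puts load on FBref, so it is probably best reserved for the current season, with the load_ function used for backfills.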

Thanks

fine-lemur commented 4 months ago

Ok that's clear. Thanks