geometalab / Trending-Places-in-OpenStreetMap

Trending Places in OpenStreetMap!
http://geometalab.github.io/Trending-Places-in-OpenStreetMap
MIT License

Adapt handling of file dates of log files (Fetch2.py) #22

Open sfkeller opened 8 years ago

sfkeller commented 8 years ago

Fetch2.py currently expects all parameters ("--date_from", "--date-to") to be valid before midnight. But a file date such as "2016-07-30 00:04" in the log file directory still refers to the data for "2016-07-29".
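One way to picture the adjustment is to shift the file timestamp back by a grace window before taking its date. This is only a minimal sketch: the function name `log_date_for_timestamp` and the 6-hour grace window are assumptions for illustration, not anything taken from Fetch2.py.

```python
from datetime import date, datetime, timedelta


def log_date_for_timestamp(file_timestamp: datetime,
                           grace: timedelta = timedelta(hours=6)) -> date:
    """Return the day a log file most likely covers.

    A file stamped within `grace` after midnight is attributed to the
    previous day. The grace window is an assumed value, not a project setting.
    """
    return (file_timestamp - grace).date()


# A file written at 00:04 on 2016-07-30 is attributed to 2016-07-29.
early = log_date_for_timestamp(datetime(2016, 7, 30, 0, 4))
# A file written at noon keeps its own calendar day.
midday = log_date_for_timestamp(datetime(2016, 7, 30, 12, 0))
```

Whether a fixed grace window is appropriate depends on when the log files are actually written, which would need to be checked against the log file directory.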

das-g commented 8 years ago

After reading a lot of the project's code and experimenting a bit with the Zoom10Tiles.csv file's content, I think that check_data_validity() returned False on the days when the bot recently set its Twitter status to:

Trending places bot has been unable to find data for the last few days... It will return tomorrow

As we don't explicitly pass the period length with --period, the default of 7 is used, so that check would fail if there are fewer than 7 unique dates with tile log data.
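To make the failure condition concrete, here is a rough sketch of the kind of check I mean. It is not the project's check_data_validity(); the name `has_enough_dates` and its signature are hypothetical, and only the default period of 7 comes from the description above.

```python
from datetime import date


def has_enough_dates(dates, period=7):
    """Return True if at least `period` distinct dates are present.

    Hypothetical stand-in for illustration only; the project's real
    check may look at more than the number of unique dates.
    """
    return len(set(dates)) >= period


# Seven distinct days pass the check.
full_week = [date(2016, 7, 24 + i) for i in range(7)]
# Six distinct days fail it, even when one day appears twice.
short_week = full_week[:6] + [full_week[0]]

full_ok = has_enough_dates(full_week)
short_ok = has_enough_dates(short_week)
```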

I think there were two reasons why this was the case, recently:

  1. ./main.sh (and with it, Fetch2.py) is run at midnight (00:00 UTC), for the 7-day period ending 2 days ago. Sometimes the last anonymized logfile (the one for the day before the day that just ended) isn't yet present on http://planet.openstreetmap.org/tile_logs/ at that time, and thus won't be downloaded by Fetch2.py.
  2. The tile logs for 2016-07-31 and for 2016-08-01 seem to be missing completely on http://planet.openstreetmap.org/tile_logs/. This will leave too few unique dates until these two dates drop out of the 7-day period on 2016-08-10.
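The date arithmetic behind the second point can be checked with a small sketch. It assumes, as described above, a 7-day period ending 2 days before the day of the run; the helper `period_dates` is a hypothetical name and not part of the project.

```python
from datetime import date, timedelta


def period_dates(run_day, period=7, lag=2):
    """Return the dates of the period ending `lag` days before `run_day`."""
    end = run_day - timedelta(days=lag)
    return [end - timedelta(days=i) for i in range(period - 1, -1, -1)]


missing = {date(2016, 7, 31), date(2016, 8, 1)}

# Step forward day by day until the period no longer contains a missing date.
run_day = date(2016, 8, 2)
while missing & set(period_dates(run_day)):
    run_day += timedelta(days=1)
first_clean_run = run_day
```

With these assumptions, the first run whose period contains neither missing date is the one on 2016-08-10, which covers 2016-08-02 through 2016-08-08.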

To fix the first reason, I'd change the cron job time to a later time in the (UTC) day, e.g. 3:00 AM.

For the second reason I'm not so sure. How should we handle missing logs, @sfkeller? Look further into the past to get enough historical data? Interpolate data for the missing dates?
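To illustrate the "look further into the past" option, something along these lines could work. This is only a sketch: `select_dates_with_backfill` and the `max_lookback` limit of 14 days are assumptions for discussion, not existing project code.

```python
from datetime import date, timedelta


def select_dates_with_backfill(available, end, period=7, max_lookback=14):
    """Walk backwards from `end`, collecting up to `period` available dates.

    Dates without a log are skipped. The search stops after `max_lookback`
    days, so the result may be shorter than `period`.
    """
    selected = []
    day = end
    for _ in range(max_lookback):
        if day in available:
            selected.append(day)
            if len(selected) == period:
                break
        day -= timedelta(days=1)
    return sorted(selected)


# Logs exist for 2016-07-20 .. 2016-08-08, except the two missing days.
available = {date(2016, 7, 20) + timedelta(days=i) for i in range(20)}
available -= {date(2016, 7, 31), date(2016, 8, 1)}

chosen = select_dates_with_backfill(available, end=date(2016, 8, 5))
```

In this example the selection skips the two missing days and reaches back to 2016-07-28 to gather seven dates. Whether a period stretched like this still gives a meaningful trend is a separate question.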