Open imvs95 opened 1 year ago
I think this could best be done by adding a "date collected" column, with the date on which each data row was collected. The advantage is that we keep all the raw data this way, and can track how much data there is. Then we could modify the scripts to either keep both versions or use the newest data available when combining data.
Now that I think of it, all the raw data files already contain the date in their name, so a separate column in the raw data isn't needed. Maybe the combined data could contain columns with "First detected" and "Last updated" dates.
This also depends on what our criteria are for two routes to be the same, and of course this could change over time.
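To make the idea concrete, here is a rough sketch of how the combining step could derive both columns from the file names. Everything specific in it is an assumption for illustration: the file name pattern (an ISO date such as `routes_2021-03-01.csv`), the `origin` and `destination` fields used to decide that two routes are the same, and the function names. It is not how the current scripts work.

```python
import re
from datetime import date

# Assumption: each raw file name contains an ISO date, e.g. "routes_2021-03-01.csv".
DATE_IN_NAME = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


def date_from_filename(name):
    """Return the collection date embedded in a raw data file name."""
    match = DATE_IN_NAME.search(name)
    if match is None:
        raise ValueError(f"no date found in {name!r}")
    return date(*map(int, match.groups()))


def combine(files, key_fields=("origin", "destination")):
    """Merge rows from several dated files, keeping the newest version of each route.

    files: mapping of file name -> list of row dicts.
    key_fields: hypothetical criterion for two rows describing the same route.
    """
    combined = {}
    # Process oldest first so that newer rows overwrite older ones.
    for name in sorted(files, key=date_from_filename):
        collected = date_from_filename(name)
        for row in files[name]:
            key = tuple(row[field] for field in key_fields)
            first = combined[key]["First detected"] if key in combined else collected
            combined[key] = {**row, "First detected": first, "Last updated": collected}
    return list(combined.values())
```

Since the "same route" criterion is passed in as `key_fields`, changing it later would only mean re-running the combine step on the untouched raw files.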
For now I see no immediate action, since all collected data is already date-stamped in the file name. So I agree with a low priority on this.
Old data is less up-to-date than new data for webscrapers