jwestw opened this issue 2 years ago.
One option is to save intermediate data so that the main pipeline does not have to redo all of the merging on every run.
We could follow the same approach as for the stops data: check whether persistent intermediate data exists and is less than 28 days old. If it is missing or older than that, we regenerate it so that a newer version is available.
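A minimal sketch of the "reuse if fresh, otherwise rebuild" check, assuming the intermediate data can be pickled to a single file. The names `is_fresh`, `load_or_build` and the `build` callable (standing in for the expensive merging step) are hypothetical, not existing pipeline functions; freshness is judged from the file's modification time.

```python
import os
import pickle
import time
from pathlib import Path
from typing import Any, Callable

MAX_AGE_DAYS = 28  # freshness window mentioned in this issue


def is_fresh(path: Path, max_age_days: float = MAX_AGE_DAYS) -> bool:
    """Return True if `path` exists and was modified less than `max_age_days` ago."""
    if not path.exists():
        return False
    age_seconds = time.time() - os.path.getmtime(path)
    return age_seconds < max_age_days * 24 * 60 * 60


def load_or_build(path: Path, build: Callable[[], Any],
                  max_age_days: float = MAX_AGE_DAYS) -> Any:
    """Load cached intermediate data if fresh; otherwise rebuild it and save it."""
    if is_fresh(path, max_age_days):
        with path.open("rb") as f:
            return pickle.load(f)
    data = build()  # hypothetical: the expensive merging step
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as f:
        pickle.dump(data, f)
    return data
```

For dataframes, pandas' own `to_parquet`/`read_parquet` or `to_pickle`/`read_pickle` could replace the raw `pickle` calls; the freshness logic stays the same.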
Run the calculation for the entire nation for one year in a single dataframe, using vectorized operations.
If pandas is still too slow, consider using pure NumPy.
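To illustrate the vectorized approach, here is a sketch that replaces a per-region loop with one `groupby` and one column-wise division over a single national dataframe. The columns `region`, `served` and `population` are hypothetical placeholders for whatever the real one-year dataset contains.

```python
import pandas as pd


def national_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Vectorized summary over one national, one-year dataframe.

    Assumes hypothetical columns 'region', 'served' and 'population'.
    """
    # One groupby instead of filtering the dataframe once per region
    summary = df.groupby("region", as_index=False)[["served", "population"]].sum()
    # One column-wise division instead of a Python loop over rows
    summary["proportion"] = summary["served"] / summary["population"]
    return summary
```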
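If pandas overhead is the bottleneck, the same grouped sum can be expressed in plain NumPy by encoding group labels as integers with `np.unique(..., return_inverse=True)` and summing with `np.bincount(..., weights=...)`. This is a sketch using the same hypothetical `region`/`served`/`population` inputs, passed as 1-D arrays.

```python
import numpy as np


def national_summary_np(region, served, population):
    """Pure-NumPy grouped sum plus ratio over 1-D arrays (hypothetical inputs)."""
    # Map each region label to an integer code 0..n_regions-1 (labels come back sorted)
    regions, codes = np.unique(np.asarray(region), return_inverse=True)
    # Sum values per code in a single pass each
    served_sum = np.bincount(codes, weights=np.asarray(served, dtype=float))
    pop_sum = np.bincount(codes, weights=np.asarray(population, dtype=float))
    return regions, served_sum, served_sum / pop_sum
```

The trade-off is readability: column names and index alignment are lost, so this is usually worth it only for a hot inner step that profiling has shown to be slow.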