Open shankari opened 8 years ago
I have a feeling that this is related to #243 because if I load the data for all the days at once and then run the intake pipeline, everything works just fine. But if I load the data for a day at a time (which is the way that the data will arrive in the real world), I get the assertion error.
What seems to be happening is that every time I run the pipeline, the number of common places goes down.
After loading day 02:
2016-03-13 10:17:46,402:DEBUG:About to save model with len(places) = 4 and len(trips) = 2
After loading day 03:
2016-03-13 10:20:19,982:DEBUG:About to save model with len(places) = 2 and len(trips) = 1
After loading day 04:
After removing trips that are points, there are 10 data points
number of bins before filtering: 9
the new number of trips is 2
the cutoff point is 1
number of bins after filtering: 1
2016-03-13 10:21:10,054:DEBUG:About to save model with len(places) = 2 and len(trips) = 1
After loading day 05:
After removing trips that are points, there are 12 data points
number of bins before filtering: 10
the new number of trips is 4
the cutoff point is 2
number of bins after filtering: 2
number of clusters: 2
number of locations: 2
2016-03-13 10:23:15,822:DEBUG:About to save model with len(places) = 1 and len(trips) = 1
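To make the shrinkage easier to spot across runs, here is a rough sketch that pulls the place and trip counts out of the "About to save model" lines quoted above. The helper names (`model_sizes`, `shrinking_runs`) are made up for illustration and are not part of the pipeline; the only thing taken from the real output is the log message format.

```python
import re

# Matches the "About to save model" DEBUG lines quoted in this issue.
SAVE_RE = re.compile(
    r"About to save model with len\(places\) = (\d+) and len\(trips\) = (\d+)"
)

def model_sizes(log_lines):
    """Return (places, trips) for each save line, in the order seen."""
    sizes = []
    for line in log_lines:
        match = SAVE_RE.search(line)
        if match:
            sizes.append((int(match.group(1)), int(match.group(2))))
    return sizes

def shrinking_runs(sizes):
    """Return indices of runs whose place count dropped versus the previous run."""
    return [i for i in range(1, len(sizes)) if sizes[i][0] < sizes[i - 1][0]]

# The four save lines from days 02 through 05, copied from the logs above.
log = [
    "2016-03-13 10:17:46,402:DEBUG:About to save model with len(places) = 4 and len(trips) = 2",
    "2016-03-13 10:20:19,982:DEBUG:About to save model with len(places) = 2 and len(trips) = 1",
    "2016-03-13 10:21:10,054:DEBUG:About to save model with len(places) = 2 and len(trips) = 1",
    "2016-03-13 10:23:15,822:DEBUG:About to save model with len(places) = 1 and len(trips) = 1",
]

sizes = model_sizes(log)
print(sizes)                  # [(4, 2), (2, 1), (2, 1), (1, 1)]
print(shrinking_runs(sizes))  # [1, 3]
```

Index 1 is the day 03 run and index 3 is the day 05 run, which are the two runs where the place count dropped.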
It looks like the proposed change to stop deleting if there are only a few places didn't actually work. Going to work around this for now until we can figure out how best to deal with it.
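For reference, one possible shape for a "don't shrink the model when there are only a few places" guard is sketched below. This is only an illustration of the idea, not the change that was actually proposed or what the code currently does: `choose_model` and the `min_places` threshold are hypothetical names and values.

```python
def choose_model(previous_places, new_places, min_places=3):
    """Pick which list of places to save.

    Hypothetical guard: if the new run found fewer than `min_places`
    places AND that is fewer than we already had, keep the previous
    places instead of overwriting them. Otherwise take the new result.
    """
    too_few = len(new_places) < min_places
    would_shrink = len(new_places) < len(previous_places)
    if too_few and would_shrink:
        return previous_places
    return new_places

# Day 02 -> day 03 in the logs above: 4 places would drop to 2.
kept = choose_model(["p1", "p2", "p3", "p4"], ["p1", "p2"])
print(len(kept))  # 4

# A run that grows the model is accepted as-is.
grown = choose_model(["p1"], ["p1", "p2", "p3"])
print(len(grown))  # 3
```

A guard like this would only mask the symptom; it does not explain why running on one day at a time finds fewer places than running on all days at once.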
Running the pipeline after loading my data one day at a time.