e-mission / e-mission-docs

Repository for docs and issues. If you need help, please file an issue here. Public conversations are better for open source projects than private email.
https://e-mission.readthedocs.io/en/latest
BSD 3-Clause "New" or "Revised" License

Assertion error while running the pipeline on one day at a time #135

Open shankari opened 8 years ago

shankari commented 8 years ago

Running the intake pipeline after loading my data one day at a time:

./e-mission-py.bash bin/debug/load_timeline_for_day_and_user.py /tmp/shankari.2016-03-02 -u e.mission.berkeley.test@gmail.com
./e-mission-py.bash bin/intake_stage.py
./e-mission-py.bash bin/debug/load_timeline_for_day_and_user.py /tmp/shankari.2016-03-03 -u e.mission.berkeley.test@gmail.com
./e-mission-py.bash bin/intake_stage.py
./e-mission-py.bash bin/debug/load_timeline_for_day_and_user.py /tmp/shankari.2016-03-04 -u e.mission.berkeley.test@gmail.com
./e-mission-py.bash bin/intake_stage.py
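To reproduce this, the day-by-day replay can be scripted. This is a minimal Python sketch. It assumes it is run from the e-mission-server checkout and that the `/tmp/shankari.<date>` dumps already exist. The helper names `build_commands` and `replay` are hypothetical and not part of the repository. By default it does a dry run and only prints the commands.

```python
import subprocess

# User and dates copied from the commands above; the dump files are assumed to exist.
USER = "e.mission.berkeley.test@gmail.com"
DATES = ["2016-03-02", "2016-03-03", "2016-03-04"]


def build_commands(dates, user):
    """Return the load/intake command pairs, one pair per day, in order."""
    commands = []
    for day in dates:
        # Load one day's timeline for the user...
        commands.append(["./e-mission-py.bash",
                         "bin/debug/load_timeline_for_day_and_user.py",
                         f"/tmp/shankari.{day}", "-u", user])
        # ...then run the intake pipeline before loading the next day.
        commands.append(["./e-mission-py.bash", "bin/intake_stage.py"])
    return commands


def replay(dates, user, dry_run=True):
    """Print each command and, unless dry_run is set, execute it."""
    for cmd in build_commands(dates, user):
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError on the first failing step
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    replay(DATES, USER)
```

Passing `dry_run=False` runs the steps for real. It stops at the first step that exits with a non-zero status, which is where the traceback below would appear.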

2016-03-13 00:38:33,031:INFO:**********UUID 951779de-a10c-3373-b186-c1c9b14b5e38: finding common trips**********
not old
16
start lat = -122.0884047
start lat = -122.0909628
start lat = -122.087162
start lat = -122.0859431
start lat = -122.0843215
start lat = -122.0780399
start lat = -122.2662268
start lat = -122.2589637
start lat = -122.2585206
start lat = -122.2608805
start lat = -122.0859637
After removing trips that are points, there are 10 data points
number of bins before filtering: 9
the new number of trips is 2
the cutoff point is 1
number of bins after filtering: 1
2016-03-13 00:38:33,153:DEBUG:min_clusters = 1, max_clusters = 2, len(self.points) = 2
2016-03-13 00:38:33,153:DEBUG:min_clusters < 2, setting min_clusters = 2
2016-03-13 00:38:33,154:DEBUG:max_clusters >= len(self.points), setting max_clusters = 1
Traceback (most recent call last):
  File "bin/intake_stage.py", line 48, in <module>
    esdtmq.make_tour_model_from_raw_user_data(uuid)
  File "/Users/shankari/e-mission/e-mission-server/emission/storage/decorations/tour_model_queries.py", line 47, in make_tour_model_from_raw_user_data
    list_of_cluster_data = eamtmcp.main(user_id, False)
  File "/Users/shankari/e-mission/e-mission-server/emission/analysis/modelling/tour_model/cluster_pipeline.py", line 105, in main
    n, labels, data = cluster(data, len(bins), old=old)
  File "/Users/shankari/e-mission/e-mission-server/emission/analysis/modelling/tour_model/cluster_pipeline.py", line 85, in cluster
    feat.cluster(min_clusters=min, max_clusters=max)
  File "/Users/shankari/e-mission/e-mission-server/emission/analysis/modelling/tour_model/featurization.py", line 84, in cluster
    raise ValueError('Please provide a valid range of cluster sizes')
ValueError: Please provide a valid range of cluster sizes
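The three DEBUG lines just before the traceback explain the failure. `min_clusters` is raised from 1 to 2. Because `max_clusters` (2) is not smaller than `len(self.points)` (2), it is lowered to 1. That leaves min > max. The sketch below only mirrors those two logged adjustments. It is not the code in `featurization.py`, and it assumes that the check that raises is `min_clusters > max_clusters`. `clamp_cluster_range` and `is_valid_range` are hypothetical names.

```python
def clamp_cluster_range(min_clusters, max_clusters, n_points):
    """Apply the two adjustments reported in the DEBUG log (sketch only)."""
    if min_clusters < 2:
        # "min_clusters < 2, setting min_clusters = 2"
        min_clusters = 2
    if max_clusters >= n_points:
        # "max_clusters >= len(self.points), setting max_clusters = len - 1"
        max_clusters = n_points - 1
    return min_clusters, max_clusters


def is_valid_range(min_clusters, max_clusters):
    """Assumed condition behind 'Please provide a valid range of cluster sizes'."""
    return min_clusters <= max_clusters


if __name__ == "__main__":
    # Values from the log: min_clusters = 1, max_clusters = 2, len(self.points) = 2
    lo, hi = clamp_cluster_range(1, 2, 2)
    print(lo, hi)                        # 2 1
    print(is_valid_range(lo, hi))        # False
    # With one more point the clamped range is valid
    print(clamp_cluster_range(1, 3, 3))  # (2, 2)
```

Under these two rules, once bin filtering leaves two or fewer points, no caller-supplied range survives the clamping. The lower bound is always at least 2, and the upper bound always ends up below 2.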
shankari commented 8 years ago

my_week_trips.zip

shankari commented 8 years ago

I have a feeling that this is related to #243. If I load the data for all the days at once and then run the intake pipeline, everything works fine. But if I load the data one day at a time (which is how the data will arrive in the real world), I get the error above.

shankari commented 8 years ago

What seems to be happening is that every time I run the pipeline, the number of common places goes down.

After loading day 02:

2016-03-13 10:17:46,402:DEBUG:About to save model with len(places) = 4 and len(trips) = 2

After loading day 03:

2016-03-13 10:20:19,982:DEBUG:About to save model with len(places) = 2 and len(trips) = 1

After loading day 04:

After removing trips that are points, there are 10 data points
number of bins before filtering: 9
the new number of trips is 2
the cutoff point is 1
number of bins after filtering: 1
2016-03-13 10:21:10,054:DEBUG:About to save model with len(places) = 2 and len(trips) = 1

After loading day 05:

After removing trips that are points, there are 12 data points
number of bins before filtering: 10
the new number of trips is 4
the cutoff point is 2
number of bins after filtering: 2
number of clusters: 2
number of locations: 2
2016-03-13 10:23:15,822:DEBUG:About to save model with len(places) = 1 and len(trips) = 1
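To track this trend across runs, the "About to save model" DEBUG lines can be extracted from the intake log. This is a minimal sketch that assumes only the exact log format quoted above. `model_sizes` and `is_shrinking` are hypothetical helpers.

```python
import re

# Matches the DEBUG line that reports the size of the tour model being saved
SAVE_RE = re.compile(r"About to save model with len\(places\) = (\d+) and len\(trips\) = (\d+)")


def model_sizes(log_lines):
    """Return (places, trips) for every 'About to save model' line, in log order."""
    sizes = []
    for line in log_lines:
        match = SAVE_RE.search(line)
        if match:
            sizes.append((int(match.group(1)), int(match.group(2))))
    return sizes


def is_shrinking(sizes):
    """True if the number of places never increases from one run to the next."""
    places = [p for p, _ in sizes]
    return all(later <= earlier for earlier, later in zip(places, places[1:]))


# The four lines quoted above, one per day loaded (days 02 to 05)
LOG = [
    "2016-03-13 10:17:46,402:DEBUG:About to save model with len(places) = 4 and len(trips) = 2",
    "2016-03-13 10:20:19,982:DEBUG:About to save model with len(places) = 2 and len(trips) = 1",
    "2016-03-13 10:21:10,054:DEBUG:About to save model with len(places) = 2 and len(trips) = 1",
    "2016-03-13 10:23:15,822:DEBUG:About to save model with len(places) = 1 and len(trips) = 1",
]

if __name__ == "__main__":
    sizes = model_sizes(LOG)
    print(sizes)                # [(4, 2), (2, 1), (2, 1), (1, 1)]
    print(is_shrinking(sizes))  # True
```

Adding more data should not normally shrink the model, so a check like `is_shrinking` could flag a regression when the day-by-day replay is rerun.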

It looks like the proposed change to stop deleting places when there are only a few of them didn't actually work. I'm going to work around this for now until we figure out how best to deal with it.