Add offset framework to add_measures

GlobalFishingWatch / vessel-scoring

Apache License 2.0

14 stars 11 forks source link

Add offset framework to add_measures #61

Closed bitsofbits closed 7 years ago

bitsofbits commented 7 years ago

It turns out that the windowing discussed in issue: https://github.com/GlobalFishingWatch/rolling_measures/issues/1 , needs to be implemented in add_measures.AddWindowMeasures. The constructor of AddWindowMeasures needs to take an offset parameter.

The relevant function is process, and we'll need to tee into three streams one for the start, one for the end. And one for the location where we are looking.

redhog commented 7 years ago

@bitsofbits Heyas! I think this works, but I'm unable to test it thoroughly using the whole vessel scoring framework - training the models crashes my laptop (entirely, including the kernel) for some reason.

Could you take it from here and test it and then use it?

bitsofbits commented 7 years ago

@egil: Thanks. I'll take a look at this tomorrow or Monday when I'm back home.

bitsofbits commented 7 years ago

@redhog : I needed to fix a couple of things, but this is running now locally and I pushed the changes up. It does make a noticeable difference, particularly with purse seiners using random forest, although those still aren't great.

redhog commented 7 years ago

@bitsofbits Could you run this with an offset of zero? Thast should give no difference what so ever from the old code. Could you make sure this is the case?

bitsofbits commented 7 years ago

@redhog : I tried that and my initial result is that there was some difference. I'm working on tracking that down today.

bitsofbits commented 7 years ago

@redhog: It turned out that there were duplicates messages (same MMSI and timestamp) in Kristina's data, and the centered windowing treated these slightly differently. Rather than worry about that edge case, I just got rid of the duplicates since we shouldn't have them anyway. See #66 . With that in place, zero offset matched the old (dedupped code).

Also removed the is_daylight feature from the models when I was rerurnning CompareModels. In light of the new info that only some of the purse seiners (but all of the ones in the training set), fish during the day, this is probably a very misleading predictor at the moment.