Slow transits dataset evaluation

redhog commented 8 years ago

I have fixed the slow-transits data so that it has the same format as the other datasets and re-uploaded it to Google Drive. Here is the file list:

datasets/slow-transits.json
datasets/slow-transits.measures.npz
datasets/slow-transits.msg
datasets/slow-transits.npz

Please integrate this into your model evaluation.

bitsofbits commented 8 years ago

@redhog, quick update:

If I just use the transit data for testing, all of the models are terrible (~50% false positives on the transit data).
If I split the transit data and use half of it for training, then the Logistic Model (MW) improves a lot (~0% false positives) while the Random Forest (MW) model improves only somewhat (16% false positives). I suspect that the RF model is overfitting because we don't have enough transit data.
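
For reference, the half-split and false-positive measurement described above could be sketched roughly as follows. This is only an illustration, not the code in the refactor branch: the helper names `split_half` and `false_positive_rate` are hypothetical, and it assumes each transit point carries a binary label where 0 means "not fishing" and 1 means "fishing".

```python
import random


def split_half(records, seed=0):
    """Shuffle a copy of `records` and return (train_half, test_half).

    Hypothetical helper: the seed is fixed so the split is reproducible.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


def false_positive_rate(labels, predictions):
    """Fraction of true negatives (label 0) that were predicted positive (1).

    Assumes binary labels and predictions; raises if there are no negatives.
    """
    negative_preds = [p for y, p in zip(labels, predictions) if y == 0]
    if not negative_preds:
        raise ValueError("no negative examples to evaluate")
    return sum(1 for p in negative_preds if p == 1) / len(negative_preds)


if __name__ == "__main__":
    # Toy data standing in for transit points: all labelled 0 (not fishing).
    labels = [0, 0, 0, 0]
    predictions = [1, 0, 1, 0]  # a model that flags half of them as fishing
    print(false_positive_rate(labels, predictions))  # 0.5

    train, test = split_half(range(10))
    print(len(train), len(test))  # 5 5
```

With only a small number of transit points, the held-out half is small too, so the measured rate will be noisy; repeating the split with several seeds would give a better sense of the spread.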

I need to check this over and clean this up a bit, then I'll commit it to the refactor branch.

bitsofbits commented 8 years ago

Completed some time ago.

GlobalFishingWatch / vessel-scoring