GlobalFishingWatch / treniformis

Apache License 2.0

Store More Detailed Info From Query #57

Open bitsofbits opened 7 years ago

bitsofbits commented 7 years ago

@pwoods25443 suggests that, rather than storing only the list of active vessels (for instance), we store the parameters we use to decide whether a vessel was active in a given year (number of AIS points, etc.) in a CSV file and then derive the active lists from this raw data. This way other people can compute their own activity statistics.
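
As a rough illustration of deriving an active list from stored per-vessel statistics, here is a minimal sketch. The column names (`mmsi`, `year`, `active_points`), the sample rows, and the `active_mmsis` helper are all hypothetical and are not the repository's actual schema or code.

```python
import csv
import io

# Hypothetical per-vessel, per-year statistics. Column names and values are
# illustrative assumptions, not the project's real schema or data.
SAMPLE_CSV = """mmsi,year,active_points
111111111,2015,1200
222222222,2015,40
333333333,2015,310
111111111,2016,15
"""


def active_mmsis(csv_file, year, min_active_points):
    """Return the sorted MMSIs whose active_points for `year` meet the cutoff."""
    reader = csv.DictReader(csv_file)
    return sorted(
        row["mmsi"]
        for row in reader
        if int(row["year"]) == year
        and int(row["active_points"]) >= min_active_points
    )


# Derive the 2015 active list using a cutoff of 100 points.
active_2015 = active_mmsis(io.StringIO(SAMPLE_CSV), 2015, 100)
print(active_2015)
```

Because the raw counts are kept, anyone could rerun the same function with a different cutoff instead of relying on a precomputed list.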

davidkroodsma commented 7 years ago

Question: Do we want to publish all MMSI, or just the ones that make it onto our fishing lists? I think we only want the ones that make it onto our fishing lists. In that case, let's start by only including ACTIVE_POINTS. For the next iteration, we can add many other things:

So, let's keep in mind that we want this, but not include it yet. Sound good?

bitsofbits commented 7 years ago

@davidkroodsma, actually I think we wanted to publish all MMSI along with ACTIVE_POINTS and FISHING_POINTS or something similar. Then someone else could come along and make their own list of likely fishing vessels with a different cutoff. At least that was my takeaway from @pwoods25443's comments this morning.

davidkroodsma commented 7 years ago

@bitsofbits All MMSI is a problem because there are >10^6 MMSI, most of them noise. We need some minimum cutoff for what counts as a real boat. I use 100 points in the year, but have some ideas about smarter ways to do it.

The point, though, is that it is a bit trickier than just listing all MMSI.

bitsofbits commented 7 years ago

@davidkroodsma: OK, what about some loose cutoff (say 100 points a year or whatever smarter approach you have in mind), but list all the boats above that cutoff. Does that get us a more reasonable set of vessels?
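
One way to picture the two-stage idea (a loose noise cutoff for what gets published, then a user-chosen cutoff downstream) is the sketch below. The `VesselStats` record, the function names, and the sample numbers are made up for illustration; only the ACTIVE_POINTS and FISHING_POINTS fields and the 100-point figure come from this thread.

```python
from typing import NamedTuple


class VesselStats(NamedTuple):
    """Hypothetical yearly record for one MMSI."""
    mmsi: str
    active_points: int
    fishing_points: int


def publishable(stats, min_active_points=100):
    """Stage 1: drop likely-noise MMSIs below a loose activity cutoff."""
    return [s for s in stats if s.active_points >= min_active_points]


def likely_fishing(stats, min_fishing_points):
    """Stage 2: a downstream user applies their own fishing cutoff."""
    return [s.mmsi for s in stats if s.fishing_points >= min_fishing_points]


# Made-up sample data, not real vessels.
raw = [
    VesselStats("111111111", 1200, 400),
    VesselStats("222222222", 40, 30),
    VesselStats("333333333", 310, 5),
    VesselStats("444444444", 150, 90),
]

published = publishable(raw)             # loose cutoff removes the noisy MMSI
strict = likely_fishing(published, 100)  # one user's stricter list
lenient = likely_fishing(published, 50)  # another user's looser list
print(strict, lenient)
```

The published file stays small enough to be useful, while the choice of fishing cutoff is left to whoever consumes it.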

davidkroodsma commented 7 years ago

@bitsofbits 100 is usually what I use, but we could probably get away with 50 if we wanted to: https://github.com/GlobalFishingWatch/data-dev/blob/master/david/noise/2016-11-15-noise.md