Open bitsofbits opened 7 years ago
Question: Do we want to publish all MMSI, or just the ones that make it to our fishing lists? I think we only want the ones that make it to our fishing lists. In that case, let's start by only including ACTIVE_POINTS. For the next iteration, we can add many other things:
So, let's keep in mind that we want this, but not include it yet. Sound good?
@davidkroodsma, actually I think we wanted to publish all MMSI with ACTIVE_POINTS and FISHING_POINTS or some such. Then someone else could come along and make their own list of likely fishing vessels with a different cutoff. At least that was my takeaway from @pwoods25443's comments this morning.
@bitsofbits Publishing all MMSI is a problem because there are >10^6 MMSI, and most of them are noise. We need some minimum cutoff for what counts as a real boat. I use 100 points in the year, but have some ideas about smarter ways to do it.
The point, though, is that it is a bit trickier than just listing all MMSI.
@davidkroodsma: OK, what about some loose cutoff (say 100 points a year or whatever smarter approach you have in mind), but list all the boats above that cutoff. Does that get us a more reasonable set of vessels?
@bitsofbits 100 is usually what I use, but we could probably get away with 50 if we wanted to: https://github.com/GlobalFishingWatch/data-dev/blob/master/david/noise/2016-11-15-noise.md
@pwoods25443 suggests that rather than storing just the active vessels (for instance), we store the parameters we use to decide activeness for a year (number of AIS points, etc) in a CSV file and then derive the active lists from this raw data. This way other people can compute their own activity statistics.
Perhaps we can create one file for all of the derived data for the year and derive spoofing, active, and likely fishing from that? We could at least get active and likely fishing from ACTIVE_POINTS and FISHING_POINTS. Spoofing may be more complicated.
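To make the idea concrete, here is a minimal sketch of deriving the active and likely fishing lists from one per-year file of raw counts. The column names, the sample rows, and the `derive_lists` helper are all hypothetical. The active cutoff of 100 is the figure mentioned above; the fishing cutoff of 500 is an arbitrary placeholder, not an agreed value.

```python
import csv
import io

# Hypothetical per-year summary, one row per MMSI. Column names and values
# are made up for illustration; they are not the real file layout.
SAMPLE_CSV = """mmsi,active_points,fishing_points
111111111,12,0
222222222,480,3
333333333,5200,1900
"""


def derive_lists(csv_text, active_cutoff=100, fishing_cutoff=500):
    """Return (active, likely_fishing) MMSI lists derived from raw counts.

    active_cutoff=100 follows the number discussed in this thread.
    fishing_cutoff=500 is an assumed placeholder.
    """
    active, likely_fishing = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if int(row["active_points"]) < active_cutoff:
            continue  # too few points in the year: treat as noise
        active.append(row["mmsi"])
        if int(row["fishing_points"]) >= fishing_cutoff:
            likely_fishing.append(row["mmsi"])
    return active, likely_fishing


active, likely_fishing = derive_lists(SAMPLE_CSV)
print(active)
print(likely_fishing)
```

Because the raw counts stay in the file, anyone who prefers a different cutoff (say 50 points) can call the same function with `active_cutoff=50` instead of relying on our published lists.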
David, any thoughts? @davidkroodsma