GlobalFishingWatch / treniformis

Apache License 2.0
5 stars 6 forks source link

Vessels missing from 2016 data #51

Closed Bjorn-skytruth closed 7 years ago

Bjorn-skytruth commented 7 years ago

We are missing likely fishing vessels which broadcast class B and include shiptype in type 19 messages but do not use type 24 messages. This is because the likely fishing filter query only looks at type 5 and 24.

@enriquetuya @pwoods25443 Some vessels that are of interest for the CNBC news report do not appear in GFW in 2016. Two other vessels do not identify as fishing and so don't appear at all but it would be great if there was an easy way to add them in.

MMSI Name in GFW 2016 track
412440372 FU YUAN YU 070 Yes No
412440373 FUYUANYU071 Yes No
412440376 FUYUANYU075 Yes No
200008124 FU YUAN YU 072 No No
200008125 FU YUAN YU 073 No No

The three tracks we have until the end of 2015 can be seen here, http://globalfishingwatch.org/map/workspace/udw-a34a242e-8be5-4d3e-b434-65e508b8fe0c

enriquetuya commented 7 years ago

@davidkroodsma This may be related to the new lists you posted back on Nov 29th: https://github.com/GlobalFishingWatch/data-dev/blob/master/points_per_vessel/2010-11-29-list-comparison.md

The production tileset has for 2012-2015 the old list and for 2016 the new lists.

pwoods25443 commented 7 years ago

@Bjorn-skytruth it looks like they are not being picked up as "Likely Fishing" in 2016, but they are detected as "Active"

GFW/ACTIVE_MMSI/2014.txt:412440372
GFW/ACTIVE_MMSI/2015.txt:412440372
GFW/ACTIVE_MMSI/2016.txt:412440372
GFW/FISHING_MMSI/KNOWN_AND_LIKELY/2014.txt:412440372
GFW/FISHING_MMSI/KNOWN_AND_LIKELY/2015.txt:412440372
GFW/FISHING_MMSI/LIKELY/2014.txt:412440372
GFW/FISHING_MMSI/LIKELY/2015.txt:412440372
pwoods25443 commented 7 years ago

Looks like the problem is here - none of the first 3 vessels on the list are selected by this query for 2016. This is the first half of the query used to select "Likely Fishing"

  SELECT
    mmsi,
    count(*) c_msg,
    sum (shiptype_text = 'Fishing') c_fishing,
    sum (shiptype_text = 'Fishing') / count(*) fishing_msg_ratio
  FROM (TABLE_DATE_RANGE([pipeline_740__normalize.], TIMESTAMP('2016-01-01'), TIMESTAMP('2017-01-01')))
  WHERE
    mmsi in (412440372, 412440373, 412440376) and
    type in (5, 24)
    and shiptype_text is not null
    and shiptype_text != 'Not available'
  GROUP EACH BY
    mmsi
pwoods25443 commented 7 years ago

I found the problem. These vessels are broadcasting shiptype only in type 19 messages

SELECT
  COUNT(*) AS count,
  mmsi,
  shiptype_text, type
FROM
  TABLE_DATE_RANGE([world-fishing-827:pipeline_740__normalize.], TIMESTAMP('2016-01-01'), TIMESTAMP('2017-01-01'))
WHERE
  mmsi IN (412440372)
  AND shiptype_text IS NOT NULL
GROUP BY
  mmsi,
  shiptype_text, type
LIMIT
  1000

returns

[
  {
    "count": "334",
    "mmsi": "412440372",
    "shiptype_text": "Fishing",
    "type": "19"
  }
]
davidkroodsma commented 7 years ago

This is interesting. It looks like we need to update the query in Treniformis to include

type in (5, 24, 19)

instead of

type in (5, 24)