GlobalFishingWatch / anchorages_pipeline

Python pipeline for anchorages
Apache License 2.0
6 stars 3 forks source link

Add distance_to_port and at_doc to named_anchorages table #64

Closed bitsofbits closed 3 years ago

bitsofbits commented 4 years ago

According to @natemiller, it would be helpful to have distance_to_port and at_doc to the named_anchorages table. This looks like it's kind of headache to do in the named_anchorage table itself. The simple fix would be to generate a temporary table, add the fields on with a query and then delete the temporary table. This can all be done with BQ + airflow DAGs and we don't need to go through Beam.

Sample queries provided by Nate for adding distance from port and at doc:

SELECT 
  * except(gridcode)
FROM
  (
    SELECT *,
      format("lon:%+07.2f_lat:%+07.2f", round(lon/0.01)*0.01, round(lat/0.01)*0.01) as gridcode
    FROM `world-fishing-827.anchorages.named_anchorages_v20200420` 
    WHERE lat is NOT NULL AND lon IS NOT NULL
  )
LEFT JOIN 
  (
    SELECT
      gridcode,
      distance_from_shore_m
    FROM
    `pipe_static.spatial_measures`
  )
USING (gridcode)
WITH anchorages AS (SELECT 
  s2id,
  lat,
  lon,
  label,
  sublabel,
  iso3,
  distance_from_shore_m,
  label_source,
  drift_radius,
  total_visits,
  unique_total_ssvid,
  unique_active_ssvid,
  active_ssvid_days,
  unique_stationary_ssvid,
  stationary_ssvid_days,
  stationary_fishing_ssvid_days,
  unique_stationary_fishing_ssvid,
  top_destination
FROM `world-fishing-827.anchorages.named_anchorages_v*`
WHERE
_TABLE_SUFFIX = (
SELECT 
REGEXP_EXTRACT(table_id, r"[0-9]{8}$")
FROM `world-fishing-827.anchorages.__TABLES_SUMMARY__`
WHERE table_id LIKE 'named_anchorages_v%'
ORDER BY creation_time DESC
LIMIT 1))
SELECT
* EXCEPT(s2id1)
FROM (
SELECT
*
FROM
anchorages) a
LEFT JOIN
(SELECT
s2id AS s2id1,
dock AS at_dock
FROM `world-fishing-827.anchorages.anchorages_dock_label_v20191006` ) b
ON a.s2id = b.s2id1