Closed: smpiano closed this 2 years ago.
From Slack:
The launch commands have changed. It's still a two-step process, though. First:
docker-compose run thin_port_messages \
  --job_name porteventstest \
  --input_table pipe_production_v20201001.position_messages_ \
  --anchorage_table anchorages.named_anchorages_v20201104 \
  --start_date 2018-01-01 \
  --end_date 2018-01-07 \
  --output_table machine_learning_dev_ttl_120d.port_visit_msgs_v20220927_ \
  --project world-fishing-827 \
  --max_num_workers 100 \
  --staging_location gs://machine-learning-dev-ttl-30d/anchorages/portevents/output/staging \
  --temp_location gs://machine-learning-dev-ttl-30d/anchorages/temp \
  --setup_file ./setup.py \
  --runner DataflowRunner \
  --disk_size_gb 100 \
  --region us-central1 \
  --sdk_container_image gcr.io/world-fishing-827/pipe-anchorage/worker:tim_test \
  --experiments=use_runner_v2
This is the part that generates the internal tables. Then:
docker-compose run port_visits \
  --job_name portmessagestest \
  --thinned_message_table machine_learning_dev_ttl_120d.port_visit_msgs_v20220927_ \
  --end_date 2018-01-07 \
  --vessel_id_table pipe_production_v20201001.segment_info \
  --anchorage_table anchorages.named_anchorages_v20201104 \
  --output_table machine_learning_dev_ttl_120d.port_visits_v20220927_ \
  --project world-fishing-827 \
  --max_num_workers 100 \
  --staging_location gs://machine-learning-dev-ttl-30d/anchorages/portevents/output/staging \
  --temp_location gs://machine-learning-dev-ttl-30d/anchorages/temp \
  --setup_file ./setup.py \
  --runner DataflowRunner \
  --disk_size_gb 100 \
  --region us-central1 \
  --sdk_container_image gcr.io/world-fishing-827/pipe-anchorage/worker:tim_test \
  --experiments=use_runner_v2 \
  --bad_segs "(SELECT DISTINCT seg_id FROM world-fishing-827.gfw_research.pipe_v20201001_segs WHERE overlapping_and_short)"
This generates the actual port visits. It regenerates the whole table from scratch every day (as before, because vessel IDs change).
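The two commands share most of their Dataflow flags, and step 2 reads exactly the table that step 1 writes: step 1's --output_table is step 2's --thinned_message_table. The following is a minimal POSIX sh sketch, not the project's own tooling, that keeps the shared flags in one place and launches step 2 only after step 1 exits successfully. The function names (launch, with_dataflow_flags, thin_step, visits_step, run_pipeline) and the DRY_RUN switch are hypothetical. The sketch also assumes the pipeline entry points parse flags by name, so appending the shared flags after the step-specific ones is equivalent to the original ordering.

```sh
#!/bin/sh
# Sketch only: the two launch commands above wrapped in shell functions.
# Function names, DRY_RUN and the positional date arguments are hypothetical;
# flag names and values are copied from the commands (--project given once).

THINNED_TABLE=machine_learning_dev_ttl_120d.port_visit_msgs_v20220927_
VISITS_TABLE=machine_learning_dev_ttl_120d.port_visits_v20220927_

# Print the command when DRY_RUN=1 (the default here); otherwise execute it.
# The printed form is not re-quoted, so it is for inspection, not copy-paste.
launch() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Append the Dataflow flags that both commands share to the given command.
with_dataflow_flags() {
  launch "$@" \
    --project world-fishing-827 \
    --max_num_workers 100 \
    --staging_location gs://machine-learning-dev-ttl-30d/anchorages/portevents/output/staging \
    --temp_location gs://machine-learning-dev-ttl-30d/anchorages/temp \
    --setup_file ./setup.py \
    --runner DataflowRunner \
    --disk_size_gb 100 \
    --region us-central1 \
    --sdk_container_image gcr.io/world-fishing-827/pipe-anchorage/worker:tim_test \
    --experiments=use_runner_v2
}

# Step 1: thin the position messages between $1 (start) and $2 (end).
thin_step() {
  with_dataflow_flags docker-compose run thin_port_messages \
    --job_name porteventstest \
    --input_table pipe_production_v20201001.position_messages_ \
    --anchorage_table anchorages.named_anchorages_v20201104 \
    --start_date "$1" \
    --end_date "$2" \
    --output_table "$THINNED_TABLE"
}

# Step 2: rebuild the whole visits table from the thinned messages up to $1.
visits_step() {
  with_dataflow_flags docker-compose run port_visits \
    --job_name portmessagestest \
    --thinned_message_table "$THINNED_TABLE" \
    --end_date "$1" \
    --vessel_id_table pipe_production_v20201001.segment_info \
    --anchorage_table anchorages.named_anchorages_v20201104 \
    --output_table "$VISITS_TABLE" \
    --bad_segs "(SELECT DISTINCT seg_id FROM world-fishing-827.gfw_research.pipe_v20201001_segs WHERE overlapping_and_short)"
}

# Run step 2 only if step 1 exited with status 0; otherwise return its status.
run_pipeline() {
  thin_step "$1" "$2" || return
  visits_step "$2"
}
```

For example, `run_pipeline 2018-01-01 2018-01-07` prints the two commands in order. Exporting DRY_RUN=0 launches them instead. Keeping the shared flags in one function is what prevents drift between the two commands, such as the duplicated --project flag.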
NOTE: tests still failing.
thin_port_messages in place of port_events.port_state.position_messages, using partitioned data.