Closed by smpiano 1 year ago
Restructured the code so that everything related to voyages lives under the pipe_anchorages/voyages
folder. Removed the abstractions that were not needed and added the abstractions in the build_voyages method. Renamed the variables as suggested in the comments.
Moved the voyages calculation from BigQuery to an Apache Beam job.
- The `trip_start` dummy date (`0001-02-03 00:00:00 UTC`) was changed to `null`.
- The `trip_start_anchorage_id`, which used to contain `NO_PREVIOUS_DATA`, was changed to `null`.
- The `trip_start_visit_id`, which used to contain `NO_PREVIOUS_DATA`, was changed to `null`.
- The `trip_id` format was changed from `{ssvid}-{vessel_id}-{hex(..)}` to `{ssvid}-{vessel_id}`. This only applies to init voyages.
- The `trip_end` dummy date was changed to `null`.
- The `trip_end_anchorage_id`, which used to contain `ACTIVE_VOYAGE`, was changed to `null`.
- The `trip_end_visit_id`, which used to contain `ACTIVE_VOYAGE`, was changed to `null`.
- The `trip_start` field is the partitioning field of the output tables.
- The cluster fields are now `trip_start, ssvid, vessel_id, trip_id`.
- Also added the restriction that a voyage must have a duration, so `trip_start` and `trip_end` should not be the same.

Screenshots of the graph of the DF job:
=> Time differences:
- BigQuery: the queries took `~1.3h`, `~1.3h`, and `~2h`. Because these queries run sequentially (they use most of the slots), the total is `~5h`. Costs: ?
- DF job (using job_id `2023-11-06_17_47_09-8157413677400264045`): time `33 min 46 sec`, costs `$4.15`.

The current scratch table results are in:
scratch_matias_ttl_60_days.voyages_duration_c2
scratch_matias_ttl_60_days.voyages_duration_c3
scratch_matias_ttl_60_days.voyages_duration_c4
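
For anyone comparing the old and new outputs, the sentinel-to-`null` changes and the duration restriction listed above can be sketched in plain Python. This is only an illustration, not the code in this PR: the function names `normalize_voyage` and `has_duration` are hypothetical, the record is assumed to be a dict keyed by the output field names, the `trip_end` dummy value is not stated in this description (so it is passed in as a parameter), and keeping voyages with a `null` start or end is an assumption.

```python
from datetime import datetime, timezone

# Sentinel values of the previous output, as quoted in this description.
DUMMY_TRIP_START = datetime(1, 2, 3, tzinfo=timezone.utc)  # 0001-02-03 00:00:00 UTC
NO_PREVIOUS_DATA = "NO_PREVIOUS_DATA"
ACTIVE_VOYAGE = "ACTIVE_VOYAGE"


def normalize_voyage(voyage, trip_end_dummy=None):
    """Return a copy of a voyage dict with the old sentinel values set to None.

    trip_end_dummy is a hypothetical parameter: the old trip_end dummy date
    is not given in this description, so the caller has to supply it.
    """
    out = dict(voyage)  # copy, so the input record is not mutated
    if out.get("trip_start") == DUMMY_TRIP_START:
        out["trip_start"] = None
    if trip_end_dummy is not None and out.get("trip_end") == trip_end_dummy:
        out["trip_end"] = None
    for field in ("trip_start_anchorage_id", "trip_start_visit_id"):
        if out.get(field) == NO_PREVIOUS_DATA:
            out[field] = None
    for field in ("trip_end_anchorage_id", "trip_end_visit_id"):
        if out.get(field) == ACTIVE_VOYAGE:
            out[field] = None
    return out


def has_duration(voyage):
    """True when trip_start and trip_end differ.

    Assumption: voyages with a null start or end are kept, since their
    duration cannot be evaluated.
    """
    start, end = voyage.get("trip_start"), voyage.get("trip_end")
    if start is None or end is None:
        return True
    return start != end
```

In a Beam pipeline, functions shaped like these could be applied with `beam.Map(normalize_voyage)` followed by `beam.Filter(has_duration)`; whether the job in this PR is organized that way is not something this sketch claims.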