Closed thekaveman closed 2 years ago
@atvaccaro notes:
I would consider making `ingest_amplitude_raw_dev` and `ingest_amplitude_raw_dev_prod` potentially. Oh and to follow up on this, I would use `is_development()` in calitp-py to pick between the two.
A bit more context here: the `AirtableToWarehouse` operator calls `calitp.save_to_gcs`, which does still call `get_bucket` under the hood.
So, @atvaccaro and I do think it would be great if you all write to a brand new bucket; that is very much aligned with the future direction of the bucket structure and will save a migration later. However, as Andrew mentioned, you would probably need to look to `calitp.is_development` directly in a new pattern and would not be able to rely on the existing `save_to_gcs` or `get_bucket`, since those are hard-coded to `gtfs-data` and `gtfs-data-test`.
(TLDR: We appreciate the openness to moving to the new paradigm but it does come with some extra work since we haven't built out as much support for that direction yet; if there's anything we can do to help make that clearer or easier please let us know.)
@lauriemerrell we're more than happy to help explore the new paradigm! Please let us know if we can be doing anything (docs?) to help make this easier next go-around.
It does appear that `save_to_gcs()` allows for a `bucket` param to override using `get_bucket()`, so we should be good with minimal changes, just using `is_development()` as mentioned.
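A minimal sketch of that approach, for illustration only. The bucket names come from the task list in this issue. `pick_amplitude_bucket` and `save_amplitude_export` are hypothetical helpers, and `save_fn` / `is_dev_fn` are stand-ins for calitp's `save_to_gcs` and `is_development` so the snippet runs without calitp installed. The `bucket` keyword is the override mentioned above; the positional arguments, and whether the bucket value needs a `gs://` prefix, are assumptions to check against calitp-py.

```python
# Bucket names as listed in this issue's task list.
DEV_BUCKET = "ingest_amplitude_raw_dev"
PROD_BUCKET = "ingest_amplitude_raw_prod"


def pick_amplitude_bucket(is_dev: bool) -> str:
    """Hypothetical helper: choose the Amplitude bucket for the environment."""
    return DEV_BUCKET if is_dev else PROD_BUCKET


def save_amplitude_export(save_fn, is_dev_fn, src_path, dst_path):
    """Hypothetical wrapper around a save_to_gcs-like callable.

    save_fn stands in for calitp's save_to_gcs and is_dev_fn for
    is_development(); both are injected so this runs without calitp.
    Passing src_path and dst_path positionally is an assumption.
    """
    bucket = pick_amplitude_bucket(is_dev_fn())
    # Passing bucket explicitly is what bypasses the get_bucket() default.
    return save_fn(src_path, dst_path, bucket=bucket)
```

In the real operator, the two injected callables would presumably be replaced by the calitp functions themselves, once their actual signatures are confirmed.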
Awesome, thank you! That's a great question re: docs... I actually think the main thing (not actually directly related to this question about buckets) might just be adding a note in the datasets and tables section of the docs: https://docs.calitp.org/data-infra/datasets_and_tables/overview.html about what the Amplitude data is.
Part of the work for #960.
See this comment from @atvaccaro for details
Tasks
`bucket` parameter, equal to `ingest_amplitude_raw_dev` or `ingest_amplitude_raw_prod` depending on the value of `is_development()`