Closed (mathemancer closed this issue 4 years ago)
The start date for Flickr is 1970 because they have few images from that time. Is there any such date for the Metropolitan Museum?
You can use 2020-01-01. We'll never need to run it back further than that date. The reason is that the 'date' parameter for this script actually pulls metadata for all images that have been updated since the given date, and the metadata for all images has been updated since the beginning of this year.
Problem Description
In order to get the new metropolitan_museum_of_art.py script (see #278) into production, we need to implement a new Apache Airflow DAG that will run the script.
Solution Description
Implement such a DAG. For examples, see src/cc_catalog_airflow/dags/flickr_workflow.py and src/cc_catalog_airflow/dags/wikimedia_workflow.py. This DAG should be configured to run the main function from src/cc_catalog_airflow/dags/metropolitan_museum_of_art.py with the date parameter, once per day. It should have catchup=False. The concurrency and max_active_runs parameters should both be 1.
Alternatives
We may replace this DAG with some kind of DAG factory in the future, so it should be considered somewhat temporary.
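For reference, the configuration requested under Solution Description could be sketched roughly as follows. This is only an illustration, not the workflow that was eventually merged. It assumes the Airflow 1.10-era DAG arguments named in this issue (schedule_interval, concurrency), and it assumes that the script's main function accepts a date keyword argument. The DAG id, the task id, and the create_dag helper are hypothetical names. The start date of 2020-01-01 follows the suggestion in the comments.

```python
from datetime import datetime

# Hypothetical DAG id; the issue does not name one.
DAG_ID = "metropolitan_museum_of_art_workflow"

# Settings requested in this issue: one run per day, no catchup, and
# concurrency and max_active_runs both set to 1. The start date comes
# from the 2020-01-01 suggestion in the comments.
DAG_KWARGS = {
    "schedule_interval": "@daily",
    "start_date": datetime(2020, 1, 1),
    "catchup": False,
    "concurrency": 1,
    "max_active_runs": 1,
}


def create_dag(main_function, dag_id=DAG_ID):
    """Build a DAG with a single task that calls main_function once per day.

    Airflow is imported here so that the settings above can be inspected
    without Airflow installed. Assumption: main_function accepts a `date`
    keyword argument holding a YYYY-MM-DD string.
    """
    from airflow import DAG
    try:
        # Import path in Airflow 2.x
        from airflow.operators.python import PythonOperator
    except ImportError:
        # Import path in Airflow 1.10.x
        from airflow.operators.python_operator import PythonOperator

    dag = DAG(dag_id=dag_id, **DAG_KWARGS)
    with dag:
        PythonOperator(
            task_id="pull_image_data",  # hypothetical task id
            python_callable=main_function,
            # "{{ ds }}" is rendered by Airflow as the run's execution
            # date in YYYY-MM-DD form.
            op_kwargs={"date": "{{ ds }}"},
        )
    return dag
```

In a workflow file under the dags directory, something like dag = create_dag(metropolitan_museum_of_art.main) at module level would let the scheduler discover it, assuming the script is importable from there. The existing flickr_workflow.py and wikimedia_workflow.py may wire up their tasks differently, so they remain the better guide to project conventions.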