MODA-NYC / db-recovery-data-partnership

Data pipelines for datasets that are part of the Recovery Data Partnership project
https://www1.nyc.gov/site/analytics/initiatives/recovery-data-partnership.page

Documentation: How data are updated #96

AmandaDoyle closed this issue 4 years ago

AmandaDoyle commented 4 years ago

Organized by data provider > output dataset, briefly describe how each output is updated. The information should include where the source data comes from and how an update is triggered (e.g., Axway, and a new file is uploaded). This can be a wiki page.

SPTKL commented 4 years ago

Urban System Labs - USL

Source: Axway

https://github.com/MODA-NYC/db-recovery-data-partnership/blob/235d125f958ed67ee72f79909c77097d23783531/recipes/usl/runner.sh#L12

Update Trigger: Manual

Update Cycle: Never (this dataset was a one-time upload)

SPTKL commented 4 years ago

Upsolve

Source: Google Sheets

Update Trigger: None

Update Cycle: Every Other Day

https://github.com/MODA-NYC/db-recovery-data-partnership/blob/235d125f958ed67ee72f79909c77097d23783531/.github/workflows/upsolve.yml#L6

SPTKL commented 4 years ago

StreetEasy

Source: S3

Update Trigger: Schedule

The script checks every week for new file uploads. If a new file has been uploaded, a new table is created; otherwise, the run is skipped. Because street_easy_rental_sales_index is not timestamped in the source data, we always update it whenever streeteasy_weekly_nta is updated.

Update Cycle: Weekly (every Wednesday)

https://github.com/MODA-NYC/db-recovery-data-partnership/blob/235d125f958ed67ee72f79909c77097d23783531/.github/workflows/street_easy.yml#L6
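The weekly check-then-load rule described for StreetEasy can be sketched as follows. This is a minimal sketch, not the project's implementation (that lives in the linked recipe and workflow). It assumes source files are identified by an ISO-date version string, and the function name `plan_street_easy_update` and the plain sets standing in for an S3 listing and for already-loaded tables are hypothetical.

```python
from typing import Iterable, List, Set, Tuple


def plan_street_easy_update(
    available_versions: Iterable[str], loaded_versions: Set[str]
) -> List[Tuple[str, str]]:
    """Return (table, version) pairs to build on this weekly run.

    available_versions: version strings of files found in the source bucket
        (hypothetical stand-in for an S3 listing; ISO dates sort correctly as strings).
    loaded_versions: versions already loaded as tables (hypothetical).
    """
    versions = list(available_versions)
    if not versions:
        return []
    latest = max(versions)
    if latest in loaded_versions:
        # No new upload since the last run, so this run is ignored
        return []
    # streeteasy_weekly_nta has a new version; street_easy_rental_sales_index
    # has no timestamp of its own, so it is rebuilt alongside it
    return [
        ("streeteasy_weekly_nta", latest),
        ("street_easy_rental_sales_index", latest),
    ]


print(plan_street_easy_update({"2020-08-19", "2020-08-26"}, {"2020-08-19"}))
print(plan_street_easy_update({"2020-08-19"}, {"2020-08-19"}))  # []
```

Tagging the rental sales index with the NTA file's version is an assumption made here so both tables stay in step; the actual table naming may differ.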

SPTKL commented 4 years ago

Cuebiq - cuebiq_mobility

Source: S3

https://github.com/MODA-NYC/db-recovery-data-partnership/blob/235d125f958ed67ee72f79909c77097d23783531/recipes/cuebiq/runner_mobility.sh#L15

Update Trigger: Schedule

Checked every other day; a new table versioned by the date of the update is created regardless of whether new files have been uploaded. (Source file updates follow an irregular pattern, so we default to checking every other day.)

Update Cycle: every other day

https://github.com/MODA-NYC/db-recovery-data-partnership/blob/235d125f958ed67ee72f79909c77097d23783531/.github/workflows/cuebiq_mobility.yml#L6
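Unlike StreetEasy, cuebiq_mobility is versioned by the run date rather than by the source file, so every scheduled run yields a new table. A minimal sketch of that versioning, assuming a hypothetical `schema."YYYY-MM-DD"` naming scheme (the real naming is in the linked runner script):

```python
from datetime import date
from typing import Optional


def mobility_table_name(run_date: Optional[date] = None) -> str:
    """Build the versioned table name for one scheduled run.

    The version is the run date itself, so each run produces a new table
    whether or not the source files changed (hypothetical naming scheme).
    """
    run_date = run_date or date.today()
    return f'cuebiq_mobility."{run_date.isoformat()}"'


print(mobility_table_name(date(2020, 8, 26)))  # cuebiq_mobility."2020-08-26"
```

The trade-off of this design is that unchanged source data still produces a fresh table, in exchange for never missing an irregular upload.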


Cuebiq - cuebiq_daily

Source: S3

https://github.com/MODA-NYC/db-recovery-data-partnership/blob/235d125f958ed67ee72f79909c77097d23783531/recipes/cuebiq/runner_daily.sh#L19

Update Trigger: Schedule

Checked daily

Update Cycle: every day

https://github.com/MODA-NYC/db-recovery-data-partnership/blob/235d125f958ed67ee72f79909c77097d23783531/.github/workflows/cuebiq_daily.yml#L6


Cuebiq - cuebiq_weekly

Source: S3

https://github.com/MODA-NYC/db-recovery-data-partnership/blob/235d125f958ed67ee72f79909c77097d23783531/recipes/cuebiq/runner_weekly.sh#L22

Update Trigger: Schedule

Checked daily, using the same mechanism as cuebiq_daily.

Update Cycle: every day

Updated on the same cycle as cuebiq_daily:

https://github.com/MODA-NYC/db-recovery-data-partnership/blob/a3b308d05bfe81544878f611e57b66cc4ea57792/.github/workflows/cuebiq_weekly.yml#L6


Cuebiq - cuebiq_travelers

Source: S3

https://github.com/MODA-NYC/db-recovery-data-partnership/blob/a3b308d05bfe81544878f611e57b66cc4ea57792/recipes/cuebiq/runner_travelers.sh#L16

Update Trigger: Schedule

Checked daily.

Update Cycle: every day

https://github.com/MODA-NYC/db-recovery-data-partnership/blob/a3b308d05bfe81544878f611e57b66cc4ea57792/.github/workflows/cuebiq_travelers.yml#L6

SPTKL commented 4 years ago

Foursquare - foursquare_county

Source:

Note that VERSION is obtained by web-scraping the following page: https://visitdata.org/data-noncommercial

https://github.com/MODA-NYC/db-recovery-data-partnership/blob/a3b308d05bfe81544878f611e57b66cc4ea57792/recipes/foursquare/runner_county.sh#L24-L28

Update Trigger: Schedule

Checked daily.

Update Cycle: every day

https://github.com/MODA-NYC/db-recovery-data-partnership/blob/a3b308d05bfe81544878f611e57b66cc4ea57792/.github/workflows/foursquare_county.yml#L6
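Deriving VERSION by scraping means picking the newest date that appears on the data page. The sketch below illustrates the idea only; it assumes (hypothetically) that the page mentions file dates in ISO `YYYY-MM-DD` form, which may not match the real page or the linked shell script, and `latest_version` is a hypothetical name.

```python
import re
from typing import Optional

# ISO date not embedded in a longer run of digits (assumed page format)
ISO_DATE = re.compile(r"(?<!\d)(\d{4}-\d{2}-\d{2})(?!\d)")


def latest_version(html: str) -> Optional[str]:
    """Return the most recent YYYY-MM-DD date found in the page, or None."""
    dates = ISO_DATE.findall(html)
    # ISO dates sort chronologically when compared as plain strings
    return max(dates) if dates else None


sample = '<a href="county_2020-08-24.csv">a</a> <a href="county_2020-08-25.csv">b</a>'
print(latest_version(sample))  # 2020-08-25
```

In practice the HTML would first be fetched from https://visitdata.org/data-noncommercial, for example with the standard library's `urllib.request.urlopen`.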

Foursquare - foursquare_zipcode

Source: Google Drive

https://github.com/MODA-NYC/db-recovery-data-partnership/blob/master/recipes/foursquare/datacube.py

Update Trigger: Schedule

Checked daily.

Update Cycle: Daily

https://github.com/MODA-NYC/db-recovery-data-partnership/blob/a3b308d05bfe81544878f611e57b66cc4ea57792/.github/workflows/foursquare_zipcode.yml#L6

SPTKL commented 4 years ago

LinkedIn

Source: Axway

https://github.com/MODA-NYC/db-recovery-data-partnership/blob/a3b308d05bfe81544878f611e57b66cc4ea57792/recipes/linkedin/runner.sh#L12-L15

Update Trigger: Schedule

Update Cycle: Weekly (we check every week whether a new file is available and update accordingly)

Note that the source data update cycle is irregular: even though the data itself is monthly, we still update every week to make sure what we have is up to date.

https://github.com/MODA-NYC/db-recovery-data-partnership/blob/a3b308d05bfe81544878f611e57b66cc4ea57792/.github/workflows/linkedin.yml#L6

SPTKL commented 4 years ago

Kinsa

Source:

https://github.com/MODA-NYC/db-recovery-data-partnership/blob/a3b308d05bfe81544878f611e57b66cc4ea57792/recipes/kinsa/runner.sh#L14

Update Trigger: Schedule

Update Cycle: Every other day

https://github.com/MODA-NYC/db-recovery-data-partnership/blob/a3b308d05bfe81544878f611e57b66cc4ea57792/.github/workflows/kinsa.yml#L6

SPTKL commented 4 years ago

ioby

Source: Axway

https://github.com/MODA-NYC/db-recovery-data-partnership/blob/a3b308d05bfe81544878f611e57b66cc4ea57792/recipes/ioby/runner.sh#L14-L27

Update Trigger: Manual

Update Cycle: Unknown

@mgraber do we know the update cycle for ioby?

SPTKL commented 4 years ago

BetaNYC

Source: GitHub

https://github.com/MODA-NYC/db-recovery-data-partnership/blob/a3b308d05bfe81544878f611e57b66cc4ea57792/recipes/betanyc/build.py#L9-L28

Update Trigger: Schedule

Update Cycle: Every 3 days

The source data has irregular, infrequent update cycles by application design, so we default to checking every 3 days to keep our files up to date.

https://github.com/MODA-NYC/db-recovery-data-partnership/blob/a3b308d05bfe81544878f611e57b66cc4ea57792/.github/workflows/betanyc.yml#L6
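GitHub Actions scheduled workflows use cron syntax. If the "every 3 days" cadence is written as a day-of-month step such as `*/3` (an assumption; the actual expression is in the linked betanyc.yml), the runs fall on days 1, 4, 7, and so on, restarting on the 1st of each month. The sketch below, using a hypothetical helper `step_days`, lists the matching days and shows that the gap across a month boundary can be shorter than the nominal interval. The same caveat would apply to any "every other day" schedule written as `*/2`.

```python
import calendar
from typing import List


def step_days(year: int, month: int, step: int) -> List[int]:
    """Days of the month matched by a cron day-of-month field '*/step'.

    '*/step' selects every step-th day starting from day 1, and the
    sequence restarts at the beginning of each month.
    """
    _, days_in_month = calendar.monthrange(year, month)
    return list(range(1, days_in_month + 1, step))


print(step_days(2020, 8, 3))  # [1, 4, 7, 10, 13, 16, 19, 22, 25, 28, 31]
# The run after Aug 31 is Sep 1: a 1-day gap rather than 3 days.
```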

SPTKL commented 4 years ago

Opportunity Insights

Source: GitHub

Update Trigger: Schedule

Update Cycle: Weekly and Daily

https://github.com/MODA-NYC/db-recovery-data-partnership/blob/a3b308d05bfe81544878f611e57b66cc4ea57792/.github/workflows/opp_insights_weekly.yml#L6

https://github.com/MODA-NYC/db-recovery-data-partnership/blob/f0ca210eebcfc22fc6edad731b307931aba1c9b6/.github/workflows/opp_insights_daily.yml#L6

SPTKL commented 4 years ago

OATS

Source: Google Drive

https://github.com/MODA-NYC/db-recovery-data-partnership/blob/master/recipes/oats/get_data.py

Update Trigger: Schedule

Update Cycle: Unknown

Last updated: Aug 26, 2020

@mgraber do we have an update cycle for OATS?

SPTKL commented 4 years ago

Closing; this has been migrated to an Excel spreadsheet in Teams.