CityOfLosAngeles / aqueduct

A shared pipeline for building ETLs and batch jobs that we run at the City of LA for Data Science Projects. Built on Apache Airflow & Civis Platform
Apache License 2.0

Migrate Mayors Office Open Data ETLs #197

Open hunterowens opened 4 years ago

hunterowens commented 4 years ago

Currently @jaylenw and I are responding to a SNOW ticket where the Mayor's office data team has an EC2 box on which a number of ETLs are run. These are listed below.

Lists

  1. Parking Citations
  2. Rainwater
  3. Geohub Sharing
  4. Google Analytics

Preston needs to determine which of these ETLs would be blocked by City network issues (I think it is probably only Parking Citations).

@prestinomills to debug and submit PRs with scripts for each of the 4 key ETLs; @hunterowens to test and deploy on Civis; @jaylenw to cross-train with Hunter, then terminate the EC2 instance when completed.

hunterowens commented 4 years ago

The Parking Citations FTP is now accepting connections from the Civis IP range.

hunterowens commented 4 years ago

for the code, see this repo

sherryshenker commented 4 years ago

Is this blocked right now by some FTP issues, or no?

hunterowens commented 4 years ago

Yep. See email to Conduent (parking citations FTP folks).

Can you confirm the IP list for SFTP / FTP jobs? I'm worried they didn't whitelist it properly.

Is there any way to determine, within Platform, the IP that a particular job connects from?
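
One generic way to check this from inside a job is to ask an external echo service which address it sees. The sketch below is only an illustration, not a Civis Platform feature: it assumes the job has outbound HTTPS access and uses the third-party service `https://api.ipify.org`, which returns the caller's public IP as plain text. The function name `fetch_public_ip` and its `opener` parameter are hypothetical and exist only for this example.

```python
import urllib.request


def fetch_public_ip(url="https://api.ipify.org",
                    opener=urllib.request.urlopen,
                    timeout=10):
    """Return the public IP address that the echo service at `url` reports.

    Assumption: `url` answers with the caller's IP as plain text.
    `opener` can be swapped out so the function can be exercised offline.
    """
    with opener(url, timeout=timeout) as response:
        return response.read().decode("utf-8").strip()


# Usage inside a job (requires outbound network access):
#     print(fetch_public_ip())
```

The address printed by the job can then be compared with the list the FTP provider was asked to allow.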

sherryshenker commented 4 years ago

Here are the whitelisted IP addresses for the City of LA - Postgres cluster.

sherryshenker commented 4 years ago

@hunterowens just got an answer from our systems team. For the multi from SFTP import, we currently use 107.23.207.128; however, in a few weeks the system will use IPs in the range 35.171.100.200/29.
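
For anyone passing this on to the FTP provider, the /29 range can be expanded into individual addresses with Python's standard `ipaddress` module. This is a minimal sketch that uses only the two values quoted above; the helper names `is_expected_source` and `addresses_to_allow` are hypothetical.

```python
import ipaddress

# Values quoted in the comment above.
CURRENT_IP = ipaddress.ip_address("107.23.207.128")
UPCOMING_RANGE = ipaddress.ip_network("35.171.100.200/29")


def is_expected_source(ip):
    """Return True if `ip` is the current address or falls in the upcoming range."""
    addr = ipaddress.ip_address(ip)
    return addr == CURRENT_IP or addr in UPCOMING_RANGE


def addresses_to_allow():
    """List the current address plus every address in the /29 block."""
    return [str(CURRENT_IP)] + [str(addr) for addr in UPCOMING_RANGE]


for address in addresses_to_allow():
    print(address)
```

A /29 block contains eight addresses, here 35.171.100.200 through 35.171.100.207, so a provider that only accepts single addresses would need all eight entered in addition to the current one.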