Closed sherryshenker closed 4 years ago
The main dataset that is ETL'd in this process consists of sections of the 311 dataset.
I think the best way forward is to figure out a way to go from the full Socrata 311 dataset to Redshift and then add views as needed?
Otherwise, all DAGs should be in the public-health folder on aqueduct.
Yes, I think that makes sense. I have some code for pulling from Socrata that currently works with public datasets. Is this dataset private?
For other processes outside of the dashboard, are there some high priority connections you'd want to start with?
The "main" 311 dataset is private, with each year being a view. You can see a WIP DAG in Cityoflosangeles/aqueduct#130 that uses the Socrata API plus some keys to download it.
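For reference, an authenticated pull against a private Socrata dataset via the SODA API generally looks like the sketch below. This is a hedged example, not the exact DAG code: the env var names (`SOCRATA_APP_TOKEN`, `SOCRATA_USERNAME`, `SOCRATA_PASSWORD`) and the page size are assumptions, though `$limit`/`$offset` paging and the `X-App-Token` header are standard SODA API parameters.

```shell
# Sketch: fetch one page of the private 311 dataset as JSON.
# Assumes SOCRATA_APP_TOKEN, SOCRATA_USERNAME, SOCRATA_PASSWORD are set.
curl "https://data.lacity.org/resource/69nn-wkid.json?\$limit=1000&\$offset=0" \
  -H "X-App-Token: $SOCRATA_APP_TOKEN" \
  -u "$SOCRATA_USERNAME:$SOCRATA_PASSWORD" \
  -o /tmp/service-requests-page.json
```

Looping over `$offset` in steps of `$limit` would page through the full dataset; for a one-shot bulk pull the CSV export endpoint below is simpler.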
Got it, looks like that's happening here. I think we'll need to add the username, password, and API token as credentials in Platform. We have a general account (cityoflosangelesrobot@civisanalytics.com) that will be used for all production code. Would it be possible for you to share a set of credentials with me, so that I can add them to the account and test the API pull?
@sherryshenker yes, but that code is due for a refactor.
I can share the Socrata username/password by adding credentials to Civis, right?
i.e., we should probably consider the existing code busted. I think the best approach is probably to use the shell script in that PR I linked above, but to push into Redshift first, then ESRI / PostGIS / models / whatever.
the big one-liner here is:
curl "https://data.lacity.org/api/views/69nn-wkid/rows.csv?accessType=DOWNLOAD" \
  -u "$SOCRATA_USERNAME:$SOCRATA_PASSWORD" \
  -o /tmp/service-requests.csv

(Note the double quotes around the `-u` argument: with single quotes the shell would not expand `$SOCRATA_USERNAME` and `$SOCRATA_PASSWORD`, and the literal strings would be sent as credentials.)
which downloads all the 311 requests to date as a CSV
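The "push into Redshift first, then add views" step could then be sketched as below. This is a hedged outline, not production code: the bucket, table name, `created_date` column, and `REDSHIFT_URL` / `REDSHIFT_IAM_ROLE` variables are all placeholders, and it assumes Redshift's usual S3-staged `COPY` path (Redshift can't `COPY` directly from local disk).

```shell
# Placeholders / assumptions throughout: bucket, table, column, env vars.
BUCKET="s3://my-etl-bucket/311"            # hypothetical staging bucket
TABLE="public.service_requests_raw"        # hypothetical target table

# 1. Stage the downloaded CSV in S3.
aws s3 cp /tmp/service-requests.csv "$BUCKET/service-requests.csv"

# 2. Bulk-load into Redshift; IGNOREHEADER 1 skips the Socrata header row.
psql "$REDSHIFT_URL" -c "
  COPY $TABLE
  FROM '$BUCKET/service-requests.csv'
  IAM_ROLE '$REDSHIFT_IAM_ROLE'
  CSV IGNOREHEADER 1;
"

# 3. Recreate the per-year views on top of the raw table (2019 shown;
#    assumes a created_date timestamp column exists).
psql "$REDSHIFT_URL" -c "
  CREATE OR REPLACE VIEW public.service_requests_2019 AS
  SELECT * FROM $TABLE
  WHERE EXTRACT(year FROM created_date) = 2019;
"
```

Loading the raw table once and layering views on top keeps the yearly splits cheap to add later, which matches the "push into Redshift first, then add views as needed" plan above.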
This is complete.
Thread to discuss which (if any) ETL processes should be transferred to Civis. I think in order for us to make recommendations, we'd first like to get a better understanding of what each of the ETL processes is composed of (sources, transformations), current pain points, complexity, and any anticipated issues with transferring / rewriting. @hunterowens