Open bbrewington opened 1 year ago
I'm working on a pipeline that takes data directly from Google Drive to BigQuery as efficiently as possible. Currently we have separate Python scripts that move data from Google Drive to local disk, then to GCS, and finally to BigQuery. I'm trying to eliminate the intermediate transfer to GCS before the data lands in BigQuery.
@Itguru14 made some updates, and need to polish the if __name__ == '__main__' part of datapipelines/google_drive.py
I decided to use pydrive library b/c it's one of the better ones I found (man, Google does NOT make this easy) - planning on looping through files in the folder (that are either CSV, Excel, or Google Sheets), reading file contents to Pandas DataFrame, then writing (with all cols as string) to BigQuery
If you want to use this approach, feel free to pick up in the section I commented out
For easy reference, here's the commit w/ what I just pushed: https://github.com/Itguru14/tag-dssg-2023-lbc/commit/1fbaba1cab43382d90ef3af393e038e4d292481b
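To make the plan above concrete, here's a rough sketch of the loop (not what's in the commit, just one way it could look). It assumes an already-authenticated pydrive `GoogleDrive` object and a folder ID; the helper names `download_bytes`, `read_as_strings`, and `folder_to_frames` are made up for illustration. Google Sheets are exported as .xlsx so they go through the same Excel reader, which assumes `openpyxl` is installed.

```python
import io
import os
import tempfile

import pandas as pd

CSV_MIME = "text/csv"
XLSX_MIME = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"
SHEETS_MIME = "application/vnd.google-apps.spreadsheet"


def download_bytes(drive_file, export_mime=None):
    """Download a pydrive file through a temp file and return its raw bytes."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "download")
        # export_mime is only needed for native Google formats such as Sheets
        drive_file.GetContentFile(path, mimetype=export_mime)
        with open(path, "rb") as fh:
            return fh.read()


def read_as_strings(content, mime_type):
    """Parse CSV or .xlsx bytes into a DataFrame with every column read as str."""
    buf = io.BytesIO(content)
    if mime_type == CSV_MIME:
        return pd.read_csv(buf, dtype=str, keep_default_na=False)
    if mime_type == XLSX_MIME:
        # reads the first sheet only; requires openpyxl
        return pd.read_excel(buf, dtype=str, keep_default_na=False)
    raise ValueError(f"Unsupported MIME type: {mime_type}")


def folder_to_frames(drive, folder_id):
    """Yield (title, DataFrame) for each CSV, Excel, or Google Sheets file in a folder."""
    query = f"'{folder_id}' in parents and trashed=false"
    for f in drive.ListFile({"q": query}).GetList():
        mime = f["mimeType"]
        if mime == SHEETS_MIME:
            content = download_bytes(f, export_mime=XLSX_MIME)
            mime = XLSX_MIME
        elif mime in (CSV_MIME, XLSX_MIME):
            content = download_bytes(f)
        else:
            continue  # skip subfolders and any other file types
        yield f["title"], read_as_strings(content, mime)
```

Caveats: this does not recurse into subfolders, only reads the first sheet of each workbook, and ignores legacy .xls files. `keep_default_na=False` keeps empty cells as empty strings instead of NaN, which may or may not be what we want in the raw tables.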
OK, will do. Just do whatever you can and I will pick up the rest later tonight.
Code-driven data pipeline to take data from Google Drive (mix of Google Sheets, Excel files, and folders containing those), and land it in BigQuery dataset tag-dssg-2023-lbc-all-teams.data_raw with all columns as STRING type.
Once this is done, the follow-on story #2 can be started.
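For the BigQuery side, a minimal sketch of the load step might look like the following. It assumes the `google-cloud-bigquery` client (with `pyarrow` installed) and default credentials; `to_table_name` and `load_all_string` are hypothetical helpers, and deriving the table name from the Drive file title is just one possible convention, not something decided in this issue.

```python
import re

PROJECT = "tag-dssg-2023-lbc-all-teams"
DATASET = "data_raw"


def to_table_name(title):
    """Turn a Drive file title into a BigQuery-safe table name (assumed convention)."""
    return re.sub(r"[^0-9A-Za-z]+", "_", title).strip("_").lower()


def load_all_string(df, title, client=None):
    """Load a DataFrame into data_raw, declaring every column as STRING."""
    # Imported here so the naming helper above works without the client installed
    from google.cloud import bigquery

    client = client or bigquery.Client(project=PROJECT)
    table_id = f"{PROJECT}.{DATASET}.{to_table_name(title)}"
    job_config = bigquery.LoadJobConfig(
        # Explicit schema so BigQuery does not infer numeric or date types
        schema=[bigquery.SchemaField(str(col), "STRING") for col in df.columns],
        # Replace the table on each run; this is an assumption about desired behavior
        write_disposition="WRITE_TRUNCATE",
    )
    job = client.load_table_from_dataframe(df, table_id, job_config=job_config)
    job.result()  # wait for the load job to finish
    return table_id
```

This expects the DataFrame values to already be strings and the column names to be valid BigQuery column names; headers with spaces or punctuation would need cleaning first.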
For access to BigQuery, contact @bbrewington (TAG DSSG Slack or Email is fine)