bkkhack / hacknights

A meta-repository for finding hack night birds-of-a-feather and tracking what we learn.
https://waffle.io/bkkhack/hacknights

Self-learning GCP, Datalab, Python #201

Open ccasimiro9444 opened 7 years ago

ccasimiro9444 commented 7 years ago

Using NYC yellow taxi dataset

danfowler commented 7 years ago

:eyes:

danfowler commented 7 years ago

GCP == Google Cloud Platform ? Cool : What is GCP?

ccasimiro9444 commented 7 years ago

Yes, Google Cloud Platform. There are some large public datasets on Google's BigQuery that can be imported into Google Cloud Datalab (similar to Jupyter Notebooks), and then I want to use Python to visualize the data. I come from a stats background, so Python and GCP are kinda new to me. But I saw this in action and will try to replicate some of it. I am already stuck importing the BigQuery data into Datalab, though :) Hope you guys can help me figure it out.

ccasimiro9444 commented 7 years ago

My question on Stack Overflow: https://stackoverflow.com/questions/44172105/load-bigquery-data-to-datalab

ccasimiro9444 commented 7 years ago

SELECT pickup_datetime, dropoff_datetime FROM bigquery-public-data.new_york.tlc_yellow_trips_20*

kev-ho commented 7 years ago

Go here https://cloud.google.com/bigquery/public-data/nyc-tlc-trips

and click on the "goto the new york city dataset" button

danfowler commented 7 years ago

"""

#standardSQL
SELECT pickup_datetime, dropoff_datetime
FROM `bigquery-public-data.new_york.tlc_yellow_trips_20*`
""")
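For anyone reading along outside Datalab, here is a rough sketch of the same query pulled into a pandas DataFrame. It assumes the standalone google-cloud-bigquery client rather than whatever wrapper the snippet above was passed to, and it assumes credentials and a default project are already set up. The helper names build_query and load_trip_times, and the LIMIT, are made up for illustration.

```python
# Sketch only: runs the thread's query with the google-cloud-bigquery client.
# Assumes credentials and a default project are already configured.

TABLE = "bigquery-public-data.new_york.tlc_yellow_trips_20*"  # from the thread


def build_query(limit=1000):
    """Return the standard SQL query text, capped at `limit` rows."""
    # Backticks quote the table path; a wildcard table name needs them
    # in standard SQL.
    return (
        "#standardSQL\n"
        "SELECT pickup_datetime, dropoff_datetime\n"
        f"FROM `{TABLE}`\n"
        f"LIMIT {int(limit)}"
    )


def load_trip_times(limit=1000):
    """Run the query and return a pandas DataFrame (needs network access)."""
    # Imported lazily so the query text can be inspected without the library.
    from google.cloud import bigquery

    client = bigquery.Client()
    return client.query(build_query(limit)).to_dataframe()


# Example call (hypothetical; requires a configured GCP project):
# df = load_trip_times(limit=1000)
```

The LIMIT only caps how many rows come back to the notebook, which keeps a first experiment small; drop it once the import works.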

danfowler commented 7 years ago

@grandpotato @ccasimiro9444 here is the notebook: https://github.com/danfowler/bkkhack-stuff/blob/master/nyc-taxi-data.ipynb

ccasimiro9444 commented 7 years ago

Got it to run on Datalab; I just used Dan's code. (Screenshot: 2017-05-26 at 00:15:49)

danfowler commented 7 years ago

Excellent! 💯

By the way, I finally got datalab working on my Google account 😉

Maybe, if you haven't gotten around to it by then, we can work on the visualization piece at the next bkkhack.

kev-ho commented 7 years ago

Nice!

Unfortunately I've had 0 success so far. So I'm going to just have to leave it here for now. :(

Next time I'll just clear everything and start from scratch and see if that helps.

ccasimiro9444 commented 7 years ago

Sounds good. Gotta get the whole dataset into a dataframe first, but the direct import from BigQuery to a dataframe takes too long. Maybe loading the table and then transforming it will be faster. Let's try that and the visualization at the next hack.