BlazingDB / blazingsql

BlazingSQL is a lightweight, GPU-accelerated SQL engine for Python, built on RAPIDS cuDF.
https://blazingsql.com
Apache License 2.0

[DOC] How to create tables from a public GCS bucket? #1459

Status: Open (opened by randerzander 3 years ago)

randerzander commented 3 years ago

Following the Google Cloud Storage bucket docs, I see I need a GCP project ID.

However, I'm trying to read the taxi dataset out of a public bucket gcs://anaconda-public-data/nyc-taxi/csv/.

If I use my own existing project ID, or a dummy project name:

bc.gs('anaconda-public-data', bucket_name='anaconda-public-data', project_id='test')

I get:

Google Cloud Storage Plugin Error: Couldn't create gcs::ClientOptions for Project ID test status=Could not automatically determine credentials. For more information, please see https://developers.google.com/identity/protocols/application-default-credentials

I read the page linked in the error, but I don't see anything about creating credentials for a public dataset. Has anyone tried to do this before?
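In the question, the dataset URL `gcs://anaconda-public-data/nyc-taxi/csv/` packs the bucket name and the object prefix into one string, while `bc.gs()` takes `bucket_name` as a separate argument. As a small sketch, the hypothetical helper `split_gcs_url` (not part of BlazingSQL) pulls the two apart with the standard library. It only handles the URL; it does nothing about credentials.

```python
from typing import Tuple
from urllib.parse import urlsplit


def split_gcs_url(url: str) -> Tuple[str, str]:
    """Split a gs:// or gcs:// URL into (bucket_name, object_prefix).

    Hypothetical helper for illustration, not part of BlazingSQL.
    """
    parts = urlsplit(url)
    if parts.scheme not in ("gs", "gcs"):
        raise ValueError(f"not a GCS URL: {url!r}")
    if not parts.netloc:
        raise ValueError(f"missing bucket name in {url!r}")
    # The netloc is the bucket; the path (minus its leading slash) is the prefix.
    return parts.netloc, parts.path.lstrip("/")


bucket, prefix = split_gcs_url("gcs://anaconda-public-data/nyc-taxi/csv/")
print(bucket)  # anaconda-public-data
print(prefix)  # nyc-taxi/csv/
```

The bucket part is what goes into `bucket_name=`; the prefix is the location of the CSV files inside that bucket.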

aucahuasi commented 3 years ago

Hi @randerzander, thanks for using BlazingSQL! As far as I know, you need to be authenticated with Google. You can try this:

    gcloud auth application-default login

Or, if you prefer, you can use a custom auth file with:

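from blazingsql import BlazingContext

bc = BlazingContext()
authority = 'my_gcs'  # hypothetical name for this bucket registration
bc.gs(authority,
      project_id='myproj',
      bucket_name='b1',
      use_default_adc_json_file=False,  # skip the default ADC lookup...
      adc_json_file='/path/to/your/auth_file.json')  # ...and use this key file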

You can find more details here: https://cloud.google.com/docs/authentication/production

If you have any questions, please let us know! cc @mario21ic @romulo-auccapuclla
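The "Could not automatically determine credentials" error means no Application Default Credentials (ADC) were found. As a rough sketch of the local part of that lookup, following Google's ADC documentation, the hypothetical helper `find_local_adc_file` checks `GOOGLE_APPLICATION_CREDENTIALS` first and then the file written by `gcloud auth application-default login`. It is not BlazingSQL code, uses Linux/macOS paths only, and skips the metadata-server step used on Google Cloud VMs.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def find_local_adc_file(env: Optional[Mapping[str, str]] = None,
                        home: Optional[Path] = None) -> Optional[Path]:
    """Return the local ADC JSON file a Google client would use, or None.

    Hypothetical helper; Linux/macOS paths only, metadata server not checked.
    """
    env = os.environ if env is None else env
    home = Path.home() if home is None else home

    # 1. An explicit key file named by GOOGLE_APPLICATION_CREDENTIALS takes
    #    precedence. (Real clients raise an error if it points nowhere
    #    instead of falling back; this sketch just returns None.)
    explicit = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if explicit:
        path = Path(explicit).expanduser()
        return path if path.is_file() else None

    # 2. User credentials written by `gcloud auth application-default login`.
    well_known = home / ".config" / "gcloud" / "application_default_credentials.json"
    if well_known.is_file():
        return well_known

    # 3. Nothing found locally; off Google Cloud, this is the situation in
    #    which clients typically report "Could not automatically determine
    #    credentials".
    return None


if __name__ == "__main__":
    found = find_local_adc_file()
    print(found if found else "No local ADC file found; try "
          "`gcloud auth application-default login`.")
```

If this prints a path, the default `bc.gs()` setup should have something to pick up; if not, running the `gcloud` command above (or pointing `adc_json_file` at a key file) is the usual next step.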