dataverbinders / nl-open-data

A flexible Python ETL toolkit for data warehousing, based on Dask, Prefect and the PyData stack
https://dkapitan.github.io/nl-open-data
MIT License

Get error while connecting python with bigquery #95

Closed Michelangelo367 closed 3 years ago

Michelangelo367 commented 3 years ago

Unfortunately, I get an error while connecting Python with BigQuery. Do I need to change the Python script, or is there another problem? Do you perhaps have an example Python script that connects to BigQuery and selects a table and/or loads a table into BigQuery? That way I can check the Python-BigQuery connection more quickly.

I installed the following packages in Anaconda Python:

I have set up GOOGLE_APPLICATION_CREDENTIALS as a Windows environment variable (under System variables).

I used three Python scripts (from YouTube videos) to test the connection and to select * from a table or upload a table with(out) pandas; see the error screenshots:

1) error_pythonscript_access bigquery from a service account:

google.auth.exceptions.TransportError: Failed to retrieve http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/dv-michel-py-214@dataverbinders-michel-dwh.iam.gserviceaccount.com/?recursive=true from the Google Compute Engine metadata service. Compute Engine Metadata server unavailable

google.auth.exceptions.RefreshError: Failed to retrieve http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/dv-michel-py-214@dataverbinders-michel-dwh.iam.gserviceaccount.com/?recursive=true from the Google Compute Engine metadata service. Compute Engine Metadata server unavailable

access_bigquery_from_a_service_account.txt

2) error_pythonscript_dataframe bigquery (Vytautas Bielinskas):

pandas_gbq.gbq.NotFoundException: Invalid Table Name. Should be of the form 'datasetId.tableId' or 'projectId.datasetId.tableId'

pandas_dataframe_bigquery_vytautas_bielinskas.txt

3) error_pythonscript_pandas_and big query:

Warning: "Cannot create BigQuery Storage client, the dependency "
File "pyarrow\table.pxi", line 956, in pyarrow.lib.RecordBatch.from_arrays
TypeError: from_arrays() takes at least 2 positional arguments (1 given)

pandas_and_big_query.txt

galamit86 commented 3 years ago

Hey,

This should work:

from pathlib import Path

from google.cloud import bigquery

PROJECT = "dataverbinders-cbs-dl"
DATASET = "cbs_v3_83583NED"
TABLE = "83583NED_TypedDataSet"
JSON_KEY = Path("path_to_json_key")

client = bigquery.Client.from_service_account_json(JSON_KEY, project=PROJECT)

query_text = f"""
    SELECT * FROM {PROJECT}.{DATASET}.{TABLE} LIMIT 1000
"""

query = client.query(query_text)

for row in query:
    print(row)
    break
galamit86 commented 3 years ago

@Michelangelo367 Was this helpful? Is this issue still relevant?

Michelangelo367 commented 3 years ago

It works; great.
Do you have another example Python script, with pandas as well?


galamit86 commented 3 years ago

This seems relevant - let me know if that solves your problem.

Michelangelo367 commented 3 years ago

Yes, I want to test some SQL scripts in BigQuery and see how it goes in Python with pandas, and later do some analysis/reporting with Plotly/Dash in combination with the pandas results.

galamit86 commented 3 years ago

Sounds great :) Looking forward to seeing how that turns out.

To be clear - can this issue be closed, or do you need any further support here?

Michelangelo367 commented 3 years ago

Yes, I still need some help. Do you have a Python script with pandas, a select query, and a connection to BigQuery?

galamit86 commented 3 years ago

If you use the script I've written above and change the ending to match the example from the link I've supplied, does that work?

That would mean executing the following:

from pathlib import Path

from google.cloud import bigquery

PROJECT = "dataverbinders-cbs-dl"
DATASET = "cbs_v3_83583NED"
TABLE = "83583NED_TypedDataSet"
JSON_KEY = Path("path_to_json_key")

client = bigquery.Client.from_service_account_json(JSON_KEY, project=PROJECT)

query_text = f"""
    SELECT * FROM {PROJECT}.{DATASET}.{TABLE} LIMIT 1000
"""

df = client.query(query_text).to_dataframe()
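
(Note: to_dataframe() will use the BigQuery Storage API for faster downloads when the optional google-cloud-bigquery-storage package is installed; if it is missing, the client falls back to the regular REST API and only emits a warning.)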
Michelangelo367 commented 3 years ago

I still get an error.


Should I use the google-cloud-bigquery-storage package in the Python script as well?

dkapitan commented 3 years ago

You need to install the bigquery-storage-client, which is a separate client.
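
For example (a sketch, assuming the same conda-forge channel used elsewhere in this thread):

conda install -c conda-forge google-cloud-bigquery-storage

or, with pip: pip install google-cloud-bigquery-storage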


Michelangelo367 commented 3 years ago

I already installed the google-cloud-bigquery-storage package and some other Google packages.

galamit86 commented 3 years ago

You need to install the bigquery-storage-client which is a separate client.

I also don't have it installed, and I get the same warning, but no error - and the df loads.

galamit86 commented 3 years ago

@Michelangelo367 - which pyarrow and pandas versions are you using?

Could you try it with pandas 1.2.4 and pyarrow 3.0.0, and let me know if the issue persists?
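
For example, in the existing conda environment that would be something like:

conda install pandas=1.2.4 pyarrow=3.0.0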

Michelangelo367 commented 3 years ago

I use different versions. I am going to update them.


galamit86 commented 3 years ago

Thanks. My money is on pyarrow.

Michelangelo367 commented 3 years ago

I still get an error message.

from pathlib import Path

import pandas as pd
from google.cloud import bigquery

PROJECT = "dataverbinders-cbs-dl"
DATASET = "cbs_v3_83583NED"
TABLE = "83583NED_TypedDataSet"
JSON_KEY = Path("P:/Programming2/5_P_DeDataverbinders/GoogleCloud/JSON/dataverbinders-michel-dwh-bafc7265fde8.json")

client = bigquery.Client.from_service_account_json(JSON_KEY, project=PROJECT)

query_text = f"""
    SELECT * FROM {PROJECT}.{DATASET}.{TABLE} LIMIT 1000
"""

df = client.query(query_text).to_dataframe()

Michelangelo367 commented 3 years ago

I just installed the protobuf package, but I still get the same error as in the screenshot above.

Michelangelo367 commented 3 years ago

I have libprotobuf as well, next to protobuf. I am going to update them and see if that works out.

galamit86 commented 3 years ago

@dkapitan @Michelangelo367 This might be a good time to talk about dependencies and how we manage them. Let's add it to the agenda for when all three of us get together.

Michelangelo367 commented 3 years ago

Yes, it's good to discuss this. I now see that package installs are failing: I cannot update protobuf to version 3.14, because it conflicts with pyarrow=3.0, among others.

dkapitan commented 3 years ago

Could you try downgrading pyarrow?

I often find backtracking works best: start a clean Python 3.8 environment with conda.
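
For example (the environment name is just an illustration):

conda create -n bigquery-py38 python=3.8
conda activate bigquery-py38
conda install -c conda-forge google-cloud-bigquery pandas pyarrow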

Michelangelo367 commented 3 years ago

I have downgraded to pyarrow 2.0; I still get the same error. I am going to restart. I am using a Python 3.7 environment with conda. I mostly install with conda, then sometimes conda-forge, and finally pip.

Normally I start a clean environment for a specific project, but then I install other packages because I want to do more things in that environment.

Michelangelo367 commented 3 years ago

I opened a new environment (BigQuery_P_38).

Step 1) Installed only google-cloud-bigquery from conda-forge:

C:\Users\arent\Anaconda3\envs\BigQuery_P_38>conda install -c conda-forge google-cloud-bigquery


The first BigQuery script works again:

from pathlib import Path

from google.cloud import bigquery

PROJECT = "dataverbinders-cbs-dl"
DATASET = "cbs_v3_83583NED"
TABLE = "83583NED_TypedDataSet"
JSON_KEY = Path("P:/Programming2/5_P_DeDataverbinders/GoogleCloud/JSON/dataverbinders-michel-dwh-bafc7265fde8.json")

client = bigquery.Client.from_service_account_json(JSON_KEY, project=PROJECT)

query_text = f"""
    SELECT * FROM {PROJECT}.{DATASET}.{TABLE} LIMIT 1000
"""

query = client.query(query_text)

for row in query:
    print(row)
    break


The second script, with pandas, does not give any error, but no result either:

from pathlib import Path

import pandas as pd
from google.cloud import bigquery

PROJECT = "dataverbinders-cbs-dl"
DATASET = "cbs_v3_83583NED"
TABLE = "83583NED_TypedDataSet"
JSON_KEY = Path("P:/Programming2/5_P_DeDataverbinders/GoogleCloud/JSON/dataverbinders-michel-dwh-bafc7265fde8.json")

client = bigquery.Client.from_service_account_json(JSON_KEY, project=PROJECT)

query_text = f"""
    SELECT * FROM {PROJECT}.{DATASET}.{TABLE} LIMIT 1000
"""

df = client.query(query_text).to_dataframe()


Step 2) Installed the pyarrow 3.0 package (this installed other packages, like pandas, as well). I see that pyarrow has version 4.0 in the conda list! Unfortunately, I still don't see any output. Perhaps it has something to do with the second script, with pandas?

I noticed that the conda list in environment BigQuery_P_38 contains the following requirements:

protobuf=3.13.0=pypi_0
libprotobuf=3.15.8=h7755175_0
pyarrow=4.0.0=py38hd38ce15_1_cpu
pandas=1.2.4=py38hd77b12b_0
parquet-cpp=1.5.1=2
google-api-core=1.22.4=pypi_0
google-api-core-grpc=1.26.3=hd8ed1ab_0
google-auth=1.22.1=pypi_0
google-cloud-bigquery=2.15.0=pyheb06c22_0
google-cloud-bigquery-core=2.15.0=pyheb06c22_0
google-cloud-bigquery-storage=2.0.0=pypi_0
google-cloud-bigquery-storage-core=2.0.0=pyh9f0ad1d_1
google-cloud-core=1.5.0=pyhd3deb0d_0
google-crc32c=1.1.2=py38h554a69a_0
google-resumable-media=1.2.0=pyhd3deb0d_0
googleapis-common-protos=1.52.0=pypi_0

requirements_20210504_2.txt

galamit86 commented 3 years ago

@Michelangelo367 No error is a good sign.

You need to add a print(df) at the end of the script to actually see the result. As it stands, running the script shows nothing because there is no output to stdout: it only stores the result as a pandas DataFrame in the variable df, but does nothing with it.
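
For example, the end of the script would then look like this:

df = client.query(query_text).to_dataframe()
print(df)  # or print(df.head()) to show only the first rows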

Michelangelo367 commented 3 years ago

It works! Normally I write a print statement, but this time I forgot it.

galamit86 commented 3 years ago

Great :)
