Hey,
This should work:
from pathlib import Path
from google.cloud import bigquery
PROJECT = "dataverbinders-cbs-dl"
DATASET = "cbs_v3_83583NED"
TABLE = "83583NED_TypedDataSet"
JSON_KEY = Path("path_to_json_key")
client = bigquery.Client.from_service_account_json(JSON_KEY, project=PROJECT)
query_text = f"""
SELECT * FROM {PROJECT}.{DATASET}.{TABLE} LIMIT 1000
"""
query = client.query(query_text)
for row in query:
    print(row)
    break
@Michelangelo367 Was this helpful? Is this issue still relevant?
Yes, it works; great.
Do you have another Python example script, with pandas as well?
Yes, I want to test some SQL scripts in BigQuery and see how they run in Python with pandas, and later do some analysis/reporting with Plotly/Dash on the pandas results.
Sounds great :) Looking forward to seeing how that turns out.
To be clear - can this issue be closed, or do you need any further support here?
Yes, I still need some help. Do you have a Python script with pandas, a SELECT query, and a connection to BigQuery?
If you use the script I've written above, and change the ending to the example from the link I've supplied, does that work?
That would mean executing the following:
from pathlib import Path
from google.cloud import bigquery
PROJECT = "dataverbinders-cbs-dl"
DATASET = "cbs_v3_83583NED"
TABLE = "83583NED_TypedDataSet"
JSON_KEY = Path("path_to_json_key")
client = bigquery.Client.from_service_account_json(JSON_KEY, project=PROJECT)
query_text = f"""
SELECT * FROM {PROJECT}.{DATASET}.{TABLE} LIMIT 1000
"""
df = client.query(query_text).to_dataframe()
I still get an error.
Should I use the google-cloud-bigquery-storage package in the Python script as well?
You need to install the BigQuery Storage client, which is a separate package.
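For reference, installing it should be possible with either pip or conda (same package name as appears in the conda list later in this thread):

pip install google-cloud-bigquery-storage
conda install -c conda-forge google-cloud-bigquery-storage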
I already installed the google-cloud-bigquery-storage package and some other Google packages.
I also don't have it installed, and I get the same warning, but no error - and the df loads.
@Michelangelo367 - which pyarrow and pandas versions are you using? Could you try it with pandas 1.2.4 and pyarrow 3.0.0, and let me know if the issue persists?
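A quick way to check both versions from Python (a convenience snippet, not part of the original exchange):

import pandas
import pyarrow

# Print the versions of the two suspect packages.
print("pandas:", pandas.__version__)
print("pyarrow:", pyarrow.__version__)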
I use different versions. I am going to update them.
Thanks. My money is on pyarrow.
I still get an error message.
from pathlib import Path
import pandas as pd
from google.cloud import bigquery

PROJECT = "dataverbinders-cbs-dl"
DATASET = "cbs_v3_83583NED"
TABLE = "83583NED_TypedDataSet"
JSON_KEY = Path("P:/Programming2/5_P_DeDataverbinders/GoogleCloud/JSON/dataverbinders-michel-dwh-bafc7265fde8.json")
client = bigquery.Client(project=PROJECT).from_service_account_json(JSON_KEY)
query_text = f"""
SELECT * FROM {PROJECT}.{DATASET}.{TABLE} LIMIT 1000
"""
df = client.query(query_text).to_dataframe()
I just installed the protobuf package, but I still get the same error as in the screenshot above.
I also have libprotobuf next to protobuf. I am going to update them and see if it works out.
@dkapitan @Michelangelo367 This might be a good time to talk about dependencies and how we manage them. Let's add it to the agenda for when all three of us get together.
Yes, it would be good to discuss this. I now see that some package installs are failing: I cannot update protobuf to version 3.14 because of a conflict with pyarrow=3.0, among others.
Could you try downgrading pyarrow?
I often find backtracking works best: start a clean Python 3.8 environment with conda.
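A minimal sketch of that backtracking approach (the environment name is arbitrary; versions as suggested above):

conda create -n bq-clean python=3.8
conda activate bq-clean
conda install -c conda-forge google-cloud-bigquery pandas=1.2.4 pyarrow=3.0.0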
I have downgraded to pyarrow 2.0; I still get the same error. I am going to restart. I am using a Python 3.7 environment with conda. I usually install mostly with conda, then with conda-forge, and finally with pip.
Normally I start a clean environment for a specific project, but then I install other packages into it, because I want to do more things in that environment.
Step 1) Created a clean environment and installed only google-cloud-bigquery from conda-forge:

C:\Users\arent\Anaconda3\envs\BigQuery_P_38>conda install -c conda-forge google-cloud-bigquery
The first BigQuery script works again:
from pathlib import Path
from google.cloud import bigquery

PROJECT = "dataverbinders-cbs-dl"
DATASET = "cbs_v3_83583NED"
TABLE = "83583NED_TypedDataSet"
JSON_KEY = Path("P:/Programming2/5_P_DeDataverbinders/GoogleCloud/JSON/dataverbinders-michel-dwh-bafc7265fde8.json")
client = bigquery.Client(project=PROJECT).from_service_account_json(JSON_KEY)
query_text = f"""
SELECT * FROM {PROJECT}.{DATASET}.{TABLE} LIMIT 1000
"""
query = client.query(query_text)
for row in query:
    print(row)
    break
The second script, with pandas, gives no error, but no result either:
from pathlib import Path
import pandas as pd
from google.cloud import bigquery

PROJECT = "dataverbinders-cbs-dl"
DATASET = "cbs_v3_83583NED"
TABLE = "83583NED_TypedDataSet"
JSON_KEY = Path("P:/Programming2/5_P_DeDataverbinders/GoogleCloud/JSON/dataverbinders-michel-dwh-bafc7265fde8.json")
client = bigquery.Client(project=PROJECT).from_service_account_json(JSON_KEY)
query_text = f"""
SELECT * FROM {PROJECT}.{DATASET}.{TABLE} LIMIT 1000
"""
df = client.query(query_text).to_dataframe()
Step 2) Installed the pyarrow 3.0 package (this pulled in other packages, like pandas, as well). I see that pyarrow shows version 4.0 in the conda list!!! Unfortunately, I still don't see any output. Perhaps it has something to do with the second script, with pandas?
I noticed that the conda list in environment BigQuery_P_38 contains the following requirements:

protobuf=3.13.0=pypi_0
libprotobuf=3.15.8=h7755175_0
pyarrow=4.0.0=py38hd38ce15_1_cpu
pandas=1.2.4=py38hd77b12b_0
parquet-cpp=1.5.1=2
google-api-core=1.22.4=pypi_0
google-api-core-grpc=1.26.3=hd8ed1ab_0
google-auth=1.22.1=pypi_0
google-cloud-bigquery=2.15.0=pyheb06c22_0
google-cloud-bigquery-core=2.15.0=pyheb06c22_0
google-cloud-bigquery-storage=2.0.0=pypi_0
google-cloud-bigquery-storage-core=2.0.0=pyh9f0ad1d_1
google-cloud-core=1.5.0=pyhd3deb0d_0
google-crc32c=1.1.2=py38h554a69a_0
google-resumable-media=1.2.0=pyhd3deb0d_0
googleapis-common-protos=1.52.0=pypi_0
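Note the pyarrow=4.0.0 entry even though 3.0 was requested; if pinning turns out to be necessary, something like the following should force the version back down (an untested suggestion):

conda install -c conda-forge pyarrow=3.0.0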
@Michelangelo367 No error is a good sign.
You need to add a print(df) at the end of the script to actually see the result - as it stands, running the script shows nothing because nothing is written to stdout: it only stores the result as a pandas dataframe in the variable df, but does nothing with it.
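For example, the end of the script would become (df.head() is just one way to preview the result):

df = client.query(query_text).to_dataframe()
# A script, unlike a REPL, shows nothing unless you print explicitly.
print(df.head())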
It works!!!! Normally I write a print statement, but this time I forgot it.
Great :)
Unfortunately, I get an error while connecting Python to BigQuery. Do I need to change the Python script, or is there another problem? Do you perhaps have an example Python script for connecting to BigQuery and selecting from a table and/or loading a table into BigQuery? That way I can check the Python-BigQuery connection more quickly.
I installed the relevant Google packages in Anaconda Python.
I have set GOOGLE_APPLICATION_CREDENTIALS in the Windows environment variables (Environment Variables - System variables).
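A quick sanity check that the variable actually reaches the Python process (a convenience snippet, not from the original post):

import os

# Prints None if the variable is not visible to Python, e.g. because the
# terminal or IDE was started before the variable was set.
print(os.environ.get("GOOGLE_APPLICATION_CREDENTIALS"))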
I used three Python scripts (from YouTube videos) to test the connection and to select * from a table or upload a table with(out) pandas; see the error screenshots:
1) error_pythonscript_access bigquery from a service account (1-2):

google.auth.exceptions.TransportError: Failed to retrieve http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/dv-michel-py-214@dataverbinders-michel-dwh.iam.gserviceaccount.com/?recursive=true from the Google Compute Engine metadata service. Compute Engine Metadata server unavailable
File "<string>", line 3, in raise_from
google.auth.exceptions.RefreshError: Failed to retrieve http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/dv-michel-py-214@dataverbinders-michel-dwh.iam.gserviceaccount.com/?recursive=true from the Google Compute Engine metadata service. Compute Engine Metadata server unavailable
error_pythonscript_access bigquery from a service account (2-2) access_bigquery_from_a_service_account.txt
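This first error typically means the client could not find usable credentials and fell back to the Compute Engine metadata server, which only exists on Google Cloud VMs. Passing the key file explicitly avoids that lookup; a sketch, with the key path as a placeholder:

from google.cloud import bigquery
from google.oauth2 import service_account

# Load the service-account key directly instead of relying on
# default-credentials discovery (which falls back to the metadata server).
credentials = service_account.Credentials.from_service_account_file("path_to_json_key")
client = bigquery.Client(project="dataverbinders-cbs-dl", credentials=credentials)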
2) error_pythonscript_dataframe bigquery vytautas bielinskas:

pandas_gbq.gbq.NotFoundException: Invalid Table Name. Should be of the form 'datasetId.tableId' or 'projectId.datasetId.tableId'

pandas_dataframe_bigquery_vytautas_bielinskas.txt
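The second error suggests the table reference passed to pandas-gbq was not fully qualified; with the identifiers used elsewhere in this thread, a working call should look roughly like:

import pandas_gbq

# read_gbq expects 'datasetId.tableId' (plus project_id) or a fully
# qualified 'projectId.datasetId.tableId' reference.
df = pandas_gbq.read_gbq(
    "SELECT * FROM `dataverbinders-cbs-dl.cbs_v3_83583NED.83583NED_TypedDataSet` LIMIT 1000",
    project_id="dataverbinders-cbs-dl",
)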
3) error_pythonscript_pandas_and big query: a "Cannot create BigQuery Storage client, the dependency ..." warning, followed by:

File "pyarrow\table.pxi", line 956, in pyarrow.lib.RecordBatch.from_arrays
TypeError: from_arrays() takes at least 2 positional arguments (1 given)

pandas_and_big_query.txt
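This third traceback points at the pandas/pyarrow version mismatch discussed above rather than at the script itself. Depending on the google-cloud-bigquery version, the BigQuery Storage fast path can also be bypassed to rule it out, e.g.:

# Force the plain REST download path, skipping the BigQuery Storage client.
df = client.query(query_text).to_dataframe(create_bqstorage_client=False)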