bitsondatadev / trino-getting-started

Apache License 2.0
234 stars 97 forks source link

Add instructions to use your thrino instance with Python #25

Closed KOLANICH closed 8 months ago

KOLANICH commented 1 year ago

I have tried

from trino.dbapi import connect

from pathlib import Path
schemaFile = Path("drivestats.sql")

conn = connect(host="s3.us-west-004.backblazeb2.com", port=8080, user=None, catalog="b2", schema=str(schemaFile))
cur = conn.cursor()
res = cur.execute("SELECT COUNT(*) FROM drivestats  WHERE year = 2023 AND month = 3 AND day = 29;") # HTTPConnectionPool(host='s3.us-west-004.backblazeb2.com', port=8080): Max retries exceeded with url: /v1/statement (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7fba0bbcac50>, 'Connection to s3.us-west-004.backblazeb2.com timed out. (connect timeout=30.0)'))

I guess it can be because I haven't filled the credentials:

BUCKET_NAME: drivestats-parquet
APPLICATION_KEY: K004cogT4GIeHHfhCyPPLsPBT4NyY1A
KEY_ID: 0045f0571db506a0000000007
ENDPOINT: s3.us-west-004.backblazeb2.com
--schema ds

Where should I put them?

bitsondatadev commented 8 months ago

Hey @KOLANICH sorry for the late reply. @metadaddy has worked with the backblaze tutorial. Pat, is there anything extra a python connection would need for B2?

metadaddy commented 8 months ago

Hi @KOLANICH - we're not running a Trino server at Backblaze - you have to run Trino yourself in Docker, following the instructions in the README. You use those credentials in this step, instead of creating your own bucket and application key.

If you wanted to access the data directly from Python, without running your own Trino instance, you could use DuckDB like this:

import duckdb

cursor = duckdb.connect()
cursor.execute("INSTALL httpfs");
cursor.execute("LOAD httpfs");
cursor.execute("SET s3_endpoint='s3.us-west-004.backblazeb2.com'");
cursor.execute("SET s3_region='us-west-004'");
cursor.execute("SET s3_access_key_id='0045f0571db506a0000000007'");
cursor.execute("SET s3_secret_access_key='K004cogT4GIeHHfhCyPPLsPBT4NyY1A'");
results = cursor.execute("SELECT COUNT(*) FROM 's3://drivestats-parquet/drivestats/*/*/*.parquet' WHERE year=2023 AND month=3 AND day=29").fetchall();
print(results)
bitsondatadev commented 8 months ago

@KOLANICH let me know if this solved your issue. Feel free to reopen if it doesn't.