Closed mobley-trent closed 10 months ago
👍 same issue
@mobley-trent @gkaretka Thanks for reaching out here. The tables are not created by default, but I think that might be the wrong behaviour since you both expected them to be created automatically.
To get things working on your end, you can run:
```python
from pyiceberg.catalog import load_catalog

catalog = load_catalog(
    "demo",
    **{
        "type": "sql",
        "uri": "postgresql+psycopg2://[username]:[passwd]@localhost/demo",
    }
)
catalog.create_tables()
```
Thanks 👍 I was actually investigating a little further and came to the same conclusion. Creating the tables by calling `catalog.create_tables()` solved my problem.
Maybe this issue should be relabeled as documentation rather than a bug. What do you think @mobley-trent?
I have the same feeling. I think this should be done once, not every time you initialize the catalog (if you run a lot of jobs in parallel using Airflow, these calls can add up). I think we can add it to the docs here as well.
Are you interested in providing a PR? :)
Yes I will look into this 👍 @Fokko
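The "do it once" suggestion above could be sketched as a small guard that only calls `create_tables()` the first time. This is a hedged sketch: `ensure_tables` is a hypothetical helper, and the module-level flag only covers a single process; in Airflow you would more likely run the setup in a one-off initialization task instead.

```python
# Sketch: call catalog.create_tables() at most once per process instead of on
# every job. `ensure_tables` and `_initialized` are illustrative, not part of
# the pyiceberg API.

_initialized = False

def ensure_tables(catalog):
    """Create the SQL catalog's backing tables only on the first call."""
    global _initialized
    if not _initialized:
        catalog.create_tables()  # idempotent setup; skipped on later calls
        _initialized = True
    return catalog
```

In a scheduler setup, the cleaner design is to drop the guard entirely and run `create_tables()` once at deploy time, so job code never needs it.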
@Fokko What should we place in the location parameter of catalog.create_table(...)? Looking at the code, we are supposed to specify a warehouse location, but I'm not quite sure what this means in regard to PostgreSQL.
This should point to a bucket where the metadata and the data itself will be stored. I think this finds its origin in Hive: https://docs.cloudera.com/cdp-private-cloud-upgrade/latest/upgrade-cdh/topics/hive-khub-location-syntax.html, but it is similarly relevant to the SQL catalog.
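To make the split concrete: with the SQL catalog, Postgres holds only the catalog rows, while the warehouse location is where Iceberg writes table metadata and data files. A minimal sketch, assuming an S3 bucket; the bucket name, credentials, and the `sql_catalog_config` helper are placeholders, not pyiceberg API:

```python
# Sketch: Postgres = catalog rows only; "warehouse" = object storage for
# metadata and data files. All names below are illustrative placeholders.

def sql_catalog_config(pg_uri: str, warehouse: str) -> dict:
    """Assemble the properties one would pass to load_catalog."""
    return {"type": "sql", "uri": pg_uri, "warehouse": warehouse}

config = sql_catalog_config(
    "postgresql+psycopg2://user:passwd@localhost/demo",
    "s3://my-bucket/warehouse",  # hypothetical bucket
)
# catalog = load_catalog("demo", **config)
# A per-table location can then live under the warehouse, e.g.
# catalog.create_table(..., location=config["warehouse"] + "/demo/students")
```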
Apache Iceberg version
0.5.0 (latest release)
Please describe the bug 🐞
Python: 3.11, PostgreSQL: v16
I'm having issues setting up the initial connection to postgresql. I'm trying to run the code in the docs but I'm getting this error:
This is my code:
This is the `.pyiceberg.yaml` file. `demo` is the name of the database I'm using; it has a table named `students` which I'm trying to access. I've attached the full stack trace for your review: stack_trace_iceberg.txt
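For reference, a `.pyiceberg.yaml` for a SQL catalog backed by Postgres might look like the following. This is a generic sketch, not the reporter's actual file; the credentials and bucket are placeholders:

```yaml
catalog:
  demo:
    type: sql
    uri: postgresql+psycopg2://[username]:[passwd]@localhost/demo
    warehouse: s3://my-bucket/warehouse   # placeholder bucket for metadata + data
```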