Open muniatl opened 4 months ago
@muniatl - I think the MinIO endpoints should not use the s3:// prefix for the endpoint configuration. They should instead use the HTTP/HTTPS protocol, e.g.:
    warehouse="s3://iceberg",  # Correct S3 URI format without the endpoint
    s3_endpoint="http://127.0.0.1:9000",  # Corrected MinIO endpoint
Could you please try this?
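The suggestion above can be sketched as a small helper. This is only a minimal sketch, not part of PyIceberg: the function name `minio_catalog_properties` and its parameters are hypothetical, while the property keys (`s3.endpoint`, `s3.access-key-id`, `s3.secret-access-key`) are the ones used in the SqlCatalog snippet in this thread. It builds and sanity-checks the properties dictionary without contacting MinIO.

```python
from urllib.parse import urlparse


def minio_catalog_properties(catalog_uri, bucket, endpoint, access_key, secret_key):
    """Build a catalog properties dict for a MinIO-backed warehouse (hypothetical helper)."""
    # The endpoint is where the S3 API is served, so it must be an HTTP(S) URL.
    if urlparse(endpoint).scheme not in ("http", "https"):
        raise ValueError(f"endpoint must start with http:// or https://, got {endpoint!r}")
    # The warehouse names only the bucket; host and port belong in the endpoint.
    if "/" in bucket or ":" in bucket:
        raise ValueError(f"bucket must be a bare bucket name, got {bucket!r}")
    return {
        "uri": catalog_uri,
        "warehouse": f"s3://{bucket}",
        "s3.endpoint": endpoint,
        "s3.access-key-id": access_key,
        "s3.secret-access-key": secret_key,
    }


props = minio_catalog_properties(
    catalog_uri="sqlite:///local_s3/catalog.db",
    bucket="iceberg",
    endpoint="http://127.0.0.1:9000",
    access_key="minio_user",
    secret_key="minio1234",
)
print(props["warehouse"], props["s3.endpoint"])
```

The resulting dictionary could then be passed along the lines of `SqlCatalog("catalog_1", **props)`.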
I tried something similar with my local config:
from pyiceberg.catalog.sql import SqlCatalog
from pyiceberg.catalog import load_catalog

warehouse_path = "local_s3"
catalog = SqlCatalog(
    "catalog_1",
    **{
        "uri": f"sqlite:///{warehouse_path}/catalog.db",
        "warehouse": "s3://iceberg",
        "s3.endpoint": "http://localhost:9001",
        "s3.access-key-id": "minio_user",
        "s3.secret-access-key": "minio1234",
    },
)
catalog.create_namespace_if_not_exists('test')
Then creating the table raises an error.
import pyarrow as pa

# Define Schema for Projects Table
projects_schema = pa.schema([
    pa.field('id', pa.uint8(), nullable=False),
    pa.field('name', pa.string(), nullable=False),
    pa.field('description', pa.string()),
    pa.field('creation_date', pa.timestamp('s')),
    pa.field('modification_date', pa.timestamp('s'))
])
projects_table = catalog.create_table_if_not_exists(
    'test.projects',
    schema=projects_schema,
)
The error:
OSError: When getting information for key 'test.db/projects/metadata/00000-5a3bb77f-7161-4bfe-a7af-b823f6f0cb71.metadata.json' in bucket 'iceberg': AWS Error UNKNOWN (HTTP status 400) during HeadObject operation: No response body.
Query engine
No response
Question
I have a piece of code that works with an S3 endpoint and a SqlCatalog backed by SQLite. For testing, however, I want to run it against a MinIO deployment hosted on localhost. I have tried various options with no luck. What parameters do I need to pass to SqlCatalog and create_table? My code looks like this:
catalog = SqlCatalog(
    "default",
    **{
        "uri": f"sqlite:///{warehouse_path}/pyiceberg_catalog.db",
        "uri" : f"postgresql+psycopg2://postgres:ph1@localhost:5433/template1",
    },
)
table = catalog.create_table(
    "default1.taxi_dataset",
    schema=df.schema,
)
The error:
OSError: When getting information for key 'iceberg/default1.db/taxi_dataset/metadata/00000-671ce9cf-73ff-49a2-a22e-408d8758625b.metadata.json' in bucket '127.0.0.1:9000': AWS Error NETWORKCONNECTION during HeadObject operation: curlCode: 6, Couldn't resolve host name.
I can reach the MinIO server, log in, and even upload files. Any pointers on the valid properties to pass for MinIO would be much appreciated.
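The bucket name '127.0.0.1:9000' in the error is what one would expect if the host and port had been placed inside an s3:// URI. The post does not show the warehouse value, so the first URI below is an assumption made for illustration. The sketch uses only Python's standard `urllib.parse` to mimic the conventional bucket/key split of an S3 URI; it is not PyIceberg's or PyArrow's actual parsing code, and `split_s3_uri` is a hypothetical name.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split an s3:// URI into (bucket, key): authority is the bucket, path is the key."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3":
        raise ValueError(f"not an s3:// URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


# Assumed warehouse value (not shown in the post): host and port inside the URI.
print(split_s3_uri("s3://127.0.0.1:9000/iceberg/default1.db/taxi_dataset"))
# → ('127.0.0.1:9000', 'iceberg/default1.db/taxi_dataset')

# Bucket-only warehouse, with the host supplied separately through "s3.endpoint".
print(split_s3_uri("s3://iceberg/default1.db/taxi_dataset"))
# → ('iceberg', 'default1.db/taxi_dataset')
```

In the first case the client would try to resolve a bucket called '127.0.0.1:9000', which is consistent with the "Couldn't resolve host name" message; in the second the bucket is 'iceberg', matching the other error report in this thread.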