Closed · neiblegy closed this 9 months ago
My endpoint is "http://ceph-c105-sg-drt-aip.s3.sto.xxxx.io". It seems the "s3" substring in the host is hard-coded in the URL parsing, so the wrong string gets treated as the BUCKET.
ClickHouse offers many s3-function and s3-engine related settings that influence the driver and might apply to your case.
This is due to some S3 implementations not fully following the S3 specification.
ClickHouse uses the string before "s3" in the domain as the bucket name.
The regex is R"((.+)\.(s3|cos|obs|oss)([.\-][a-z0-9\-.:]+))"
But in this issue, the bucket name is given as the path after the domain name.
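A quick sketch of how that regex behaves on the reporter's endpoint host (transcribed into Python from the raw-string literal above):

```python
import re

# The virtual-hosted-style detection regex quoted in the comment above,
# applied to the host part of the endpoint URL.
pattern = re.compile(r"(.+)\.(s3|cos|obs|oss)([.\-][a-z0-9\-.:]+)")

host = "ceph-c105-sg-drt-aip.s3.sto.xxxx.io"  # host from the reporter's endpoint
m = pattern.match(host)

# Everything before ".s3" is captured as the bucket name, even though this
# storage actually expects the bucket as the first path segment.
print(m.group(1))  # -> ceph-c105-sg-drt-aip
```

So the whole prefix "ceph-c105-sg-drt-aip" is taken as the bucket, and the real bucket in the path is never seen.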
Typically I would not fix this. But what's weird is that the official awscli can handle these misconfigured S3 storages by specifying the endpoint and the S3 URL separately, like:
aws s3 ls s3://bucket/datasets/ryan_test/ --endpoint-url http://some-irrelevant-name.s3.xxx.io
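An illustrative sketch (not awscli's actual code) of the separation the CLI makes: the bucket comes from the s3:// URL, while the endpoint host is used only as the address to connect to and is never re-parsed for a bucket name.

```python
from urllib.parse import urlparse

# The S3 URL carries bucket and key; the endpoint is purely the server address.
s3_url = urlparse("s3://bucket/datasets/ryan_test/")
endpoint = "http://some-irrelevant-name.s3.xxx.io"

bucket = s3_url.netloc                # "bucket"
key_prefix = s3_url.path.lstrip("/")  # "datasets/ryan_test/"

# Path-style request URL: the endpoint host stays intact, the bucket is
# prepended to the path instead of being extracted from the host.
request_url = f"{endpoint}/{bucket}/{key_prefix}"
print(request_url)  # -> http://some-irrelevant-name.s3.xxx.io/bucket/datasets/ryan_test/
```

With that separation, a hostname that happens to contain ".s3." is harmless.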
I will check it later.
Won't fix; it's a misconfigured S3 issue.
I have 41 Parquet files stored in S3, and I need to execute SQL with:
chdb.query(f"select ais_image_path from s3('http://ENDPOINT_URL/BUCKET/KEY_PREFIX/*.parquet', 'USER', 'PWD', Parquet) where ais_image_path = '{path}'", 'Dataframe')
Then I got this error:
Code: 636. DB::Exception: Cannot extract table structure from Parquet format file, because there are no files with provided path in S3 or all files are empty. You must specify table structure manually: Cannot extract table structure from Parquet format file. You can specify the structure manually. (CANNOT_EXTRACT_TABLE_STRUCTURE)
I'm sure that all the Parquet files are at the path I gave, and these files can be processed correctly when they are local files.
I changed "Dataframe" to "Debug" and got a traceback; any string shown as "xxxxxx" in the backtrace is actually a correct string, redacted here.
Environment: x86_64, Python 3.9, chdb 1.0.0, S3: Ceph S3