zeroshade closed this 8 months ago.
CC @Fokko @wolfeidau @nastra @HonahX @jackye1995
I think what's currently missing is a way to configure the warehouse (which I hardcoded for testing), as well as handling the signing of requests against S3, similar to https://github.com/apache/iceberg-python/blob/f66e3652fdf9720d6c63a6fcec7bcd08d5bb186c/pyiceberg/io/fsspec.py#L70-L95
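On the warehouse point: instead of hardcoding it, a client can send it to the catalog's config endpoint. The Iceberg REST OpenAPI spec defines GET /v1/config with an optional warehouse query parameter. Below is a minimal Go sketch of building that URL. configURL and the warehouse name are hypothetical, it needs Go 1.19+ for url.URL.JoinPath, and it does not address the S3 request-signing part.

```go
package main

import (
	"fmt"
	"net/url"
)

// configURL (hypothetical helper) builds the URL for the REST catalog's
// GET /v1/config call, adding the optional "warehouse" query parameter
// only when a warehouse was configured.
func configURL(baseURI, warehouse string) (string, error) {
	u, err := url.Parse(baseURI)
	if err != nil {
		return "", err
	}
	// JoinPath appends path segments and cleans duplicate slashes.
	u = u.JoinPath("v1", "config")
	if warehouse != "" {
		q := u.Query()
		q.Set("warehouse", warehouse)
		u.RawQuery = q.Encode()
	}
	return u.String(), nil
}

func main() {
	s, err := configURL("https://api.dev.tabular.io/ws/", "my-warehouse")
	if err != nil {
		panic(err)
	}
	fmt.Println(s) // https://api.dev.tabular.io/ws/v1/config?warehouse=my-warehouse
}
```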
Listing files via

go run ./cmd/iceberg files iceberg124.foobar --catalog rest --uri https://api.dev.tabular.io/ws/ --credential <creds>

fails with:

2024/02/13 10:24:07 could not open manifest file: operation error S3: GetObject, https response error StatusCode: 403, RequestID: 066G7WZD23KHZCBJ, HostID: d4V0iCd2uzvp9gZJWDOWmljaREgSaL9Iro0XxOFsv38ECJpdCd/JHWG8Y6/i7oSal8cONZ87Tis=, api error AccessDenied: Access Denied
exit status 1
I believe this is because the FileIO isn't configured with the TOKEN in the authorization header that comes back from the config inside tblResponse here. Reading all other table metadata works via the CLI, but only because those commands never use the FileIO; only files does at the moment.
@nastra Hmm. Setting the env vars AWS_REGION, AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY should all work and get picked up by the FileIO, but I haven't tried testing against https://api.dev.tabular.io/ws/ before.
There is the ability to set a session token via the s3.session-token property, but you're right that I don't think it gets propagated. Is there any special configuration I need to set up in order to test against the api.dev.tabular.io/ws/ URI myself?
@nastra So I've figured out the issue:
The properties are being propagated to the FileIO object correctly; however, it looks like the Tabular API doesn't like the Go Iceberg User-Agent.
I loaded up pyiceberg to see what it does differently and saw that the response to its table request included a series of S3 properties in the config: an access-key-id, a session-token, and a secret-access-key. When I looked at the same request from the Go CLI, those properties weren't there. If I hardcode the User-Agent that the Go CLI sends to PyIceberg/0.5.1, those properties are suddenly returned and loading the manifests works just fine. So the problem is definitely that the Tabular REST catalog doesn't recognize the Go User-Agent, and therefore doesn't send the S3 key properties.
Anything we can do on the Tabular side?
During RestCatalog.LoadTable
@nastra Added several issues as suggested
Adding an initial implementation and unit tests for the REST catalog.