apache / iceberg-python

Apache PyIceberg
https://py.iceberg.apache.org/
Apache License 2.0

Why not use the profile name when initialising the S3FileSystem class? #1207

Open wudihero2 opened 1 month ago

wudihero2 commented 1 month ago

Question

Hi, #922 standardized some AWS credential names, but I am confused: why not use the code below to pass an AWS profile name in pyiceberg/io/fsspec.py?

from aiobotocore.session import AioSession
import s3fs
profile_session = AioSession(profile="xxx")
fs = s3fs.S3FileSystem(session=profile_session)

The Glue catalog already supports AWS profile names through a property such as `glue.profile-name`, as in the following code in pyiceberg/catalog/glue.py:

profile_name=PropertyUtil.get_first_property_value(properties, GLUE_PROFILE_NAME, DEPRECATED_PROFILE_NAME),

Would it be an improvement if the S3 FileIO could accept a profile name as well, or is there some concern I haven't considered?

kevinjqliu commented 1 month ago

I think this is a feature gap in the S3 FileIO. It makes sense to support profile_name. We would need to support it in both the fsspec and pyarrow implementations.
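As a rough illustration of the fsspec half only, the sketch below reads a profile name from the FileIO properties and hands it to s3fs through an `AioSession`, following the snippet in the question. The property key `s3.profile-name` and the helpers `resolve_profile_name` and `build_s3fs` are hypothetical names chosen for this example; they are modeled on `glue.profile-name` and are not part of PyIceberg.

```python
from typing import Dict, Optional

# Hypothetical property key, modeled on the existing "glue.profile-name".
# It is an assumption for this sketch, not a PyIceberg constant.
S3_PROFILE_NAME = "s3.profile-name"


def resolve_profile_name(properties: Dict[str, str]) -> Optional[str]:
    """Return the configured profile name, or None if it is unset or empty."""
    return properties.get(S3_PROFILE_NAME) or None


def build_s3fs(properties: Dict[str, str]):
    """Create an s3fs filesystem, using a named AWS profile when one is set."""
    # Third-party imports are kept local so that resolve_profile_name
    # can be used (and tested) without s3fs or aiobotocore installed.
    import s3fs
    from aiobotocore.session import AioSession

    profile = resolve_profile_name(properties)
    if profile is None:
        # No profile configured: fall back to the default credential chain.
        return s3fs.S3FileSystem()
    return s3fs.S3FileSystem(session=AioSession(profile=profile))
```

With this shape, `build_s3fs({"s3.profile-name": "dev"})` would use the `dev` profile from the local AWS configuration, while an empty properties dict keeps today's behavior. The pyarrow side is not covered by this sketch.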

Is this something you would like to contribute?

wudihero2 commented 1 month ago

Hello, I am interested in this. Do I need to tag the person who will assign this task to me?

kevinjqliu commented 1 month ago

@wudihero2 assigned to you :)

sungwy commented 1 month ago

Hi folks, I was under the impression that this would need to be addressed in PyArrow's S3FileSystem. Please see @HonahX's earlier comment:

https://github.com/apache/iceberg-python/pull/922#discussion_r1677395543

Previous issue on the same topic: https://github.com/apache/iceberg-python/issues/570
PyArrow S3FileSystem: https://arrow.apache.org/docs/python/generated/pyarrow.fs.S3FileSystem.html

On that note, since this is a frequent question from the community, I'm in favor of keeping this issue open until we find a solution, perhaps by working with the Arrow community.

kevinjqliu commented 1 month ago

This thread is somewhat related: https://github.com/apache/iceberg-python/issues/1104#issuecomment-2377397379

wudihero2 commented 1 month ago

Hi all, I checked the PyArrow code and found that the profile_name parameter is not currently supported. The s3.* related parameters are indeed not suitable for supporting profile_name. It would be great if we could work with the Arrow community to find a solution! This package is great and helps me use Iceberg in my current job. Thank you!
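Until profile support exists, one possible workaround is to resolve the profile's credentials outside PyIceberg and pass them in through the existing `s3.*` properties. The sketch below is a minimal version of that idea and was not proposed in this thread: it assumes boto3 is installed, and the helper names `credentials_to_s3_properties` and `s3_properties_from_profile` are hypothetical.

```python
from typing import Dict, Optional


def credentials_to_s3_properties(
    access_key: str,
    secret_key: str,
    session_token: Optional[str] = None,
    region: Optional[str] = None,
) -> Dict[str, str]:
    """Map plain credentials onto PyIceberg's s3.* FileIO properties."""
    properties = {
        "s3.access-key-id": access_key,
        "s3.secret-access-key": secret_key,
    }
    # Optional values are only included when they are actually set.
    if session_token:
        properties["s3.session-token"] = session_token
    if region:
        properties["s3.region"] = region
    return properties


def s3_properties_from_profile(profile_name: str) -> Dict[str, str]:
    """Resolve a named AWS profile with boto3 and return s3.* properties."""
    # Imported locally so the mapping helper above works without boto3.
    import boto3

    session = boto3.Session(profile_name=profile_name)
    credentials = session.get_credentials()
    if credentials is None:
        raise ValueError(f"No credentials found for profile {profile_name!r}")
    # Frozen credentials are a point-in-time snapshot of the key material.
    frozen = credentials.get_frozen_credentials()
    return credentials_to_s3_properties(
        frozen.access_key, frozen.secret_key, frozen.token, session.region_name
    )
```

The returned dict can be merged into the properties passed when loading a catalog. Because the credentials are a snapshot, temporary credentials (for example from an assumed role) will not refresh on their own, so this suits short-lived jobs better than long-running services.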