Closed gdubya closed 2 months ago
Could you please consider supporting abfs://
in addition to abfss://
as well?
abfs://
is commonly used with the Hadoop ABFS support (the original implementation of ABFS). There is no difference between abfs://
and abfss://
as the Hadoop ABFS support enforces TLS by default.
@sugibuchi I'm happy to try adding that, but I would like to do it as a separate PR to limit the scope of what's affected here. The abfss
string is used quite a few places so it will be a bit of work to add support for abfs
as well, as simple as it sounds.
@samansmink Could you have a look at this PR and merge if all looks ok? Apologies if it's not you that I should ping. I just looked at the other open PR. Maybe there should be a CODEOWNERS file in the repo?
Hello, I had in the past allow the syntax that you are proposing here but remove it because how scope for secret work. The scope when defined on secret return the score with the path that the score is simply the common distance between path and scope. So if you want to define a secret for a all storage account with the container@sa you cannot anymore.
From my point a view I do not think that a good idea but maybe I a particular user that use too many storage account.
A solution for this can also be to change the way secrets score is compute and allow wildcard.
Regards Quentin
@quentingodeau I see what you mean. But given how important support for this syntax is, would it be ok to accept this change and document the limitations with regards to the scopes? This change should not affect existing functionality, only add some more functionality (which is slightly limited with regards to the scopes, but better than nothing, IMHO).
Yes all abfss path are in fact mostly manage this way. I was just to point out the limitations of the syntax. I really think that we should adapt the way score are compute in the secret management. I have not enough time at the moment but if it is not done before I will try to take a look. Also at the end I think we should only keep one syntax (but that only my opinion) Regards
@gdubya
I'm happy to try adding that, but I would like to do it as a separate PR
OK. I understand and agree to this.
Also at the end I think we should only keep one syntax (but that only my opinion)
If we should keep only one syntax, please consider keeping the Hadoop compatible syntax.
For historical reasons, each ABFS filesystem implementation supports a different set of ABFS URL syntax.
Apache Hadoop's hadoop-azure
package link
Python adlfs
for fsspec
link
Rust object_store::azure
link
Apache Arrow link
DuckDB azure
extension link
This inconsistency of supported ABFS URL syntax is a source of headaches, particularly when we combine different implementations of the ABFS file system (for example, extracting data from a massive table using Spark, then interactively analysing it with Polars, etc.).
Fortunately, the ABFS URL syntax defined by Hadoop's hadoop-azure
(the original ABFS implementation) can be a safe choice when we combine different ABFS implementations as this syntax is widely supported by almost all ABFS implementations, except the DuckDB Azure extension.
To maximize interoperability with other frameworks/libraries, the support of the Hadoop-compatible syntax should be a high priority.
@sugibuchi Thank you for the summary of the various options. It is quite a mess! I agree with the conclusion that the ABFS syntax should be the main priority.
Hopefully we can merge this PR now, and then any decisions made about futher changes can be handled in a different one?
all green 🟢
@sugibuchi your suggestion is now implemented
thanks @gdubya!
While playing with the uc_catalog and delta extensions I encountered an error while trying to read from an ADLS gen2 account using the abfss syntax in this format:
abfss://container@storage.endpoint/path
.This PR corrects the URL parser to account for this additional variation.
Linked issues: