Alluxio / alluxio

Alluxio, data orchestration for analytics and machine learning in the cloud
https://www.alluxio.io
Apache License 2.0

Azure Blob/ADLS Gen 2 and abfs - add instructions on using SAS token #17307

Open juwalter opened 1 year ago

juwalter commented 1 year ago

Page

https://docs.alluxio.io/os/user/stable/en/ufs/Azure-Blob-Store.html
https://docs.alluxio.io/os/user/stable/en/ufs/Azure-Data-Lake-Gen2.html

maybe also: https://docs.alluxio.io/os/user/stable/en/ufs/Azure-Data-Lake.html

Summary

The pages above describe how to configure Azure Blob Storage and Data Lake Gen2 as a "Storage Integration". They include instructions for authenticating with "Shared Key", "OAuth 2.0 Client Credentials", and "Azure Managed Identities". However, "SAS token" is missing from both, although in theory this should be possible according to:

I have tried like so:

bin/alluxio fs mount \
  --option fs.azure.account.auth.type.<storage-account>.dfs.core.windows.net=SAS \
  --option fs.azure.sas.token.provider.type.<storage-account>.dfs.core.windows.net=org.apache.hadoop.fs.azurebfs.sas.FixedSASTokenProvider \
  --option fs.azure.sas.fixed.token.<storage-account>.dfs.core.windows.net="my sas token" \
  /mnt/abfs abfs://<my container>@<storage-account>.dfs.core.windows.net/<my path>/

and also with blob.core.windows.net instead of dfs.core.windows.net, and wasb instead of abfs:

bin/alluxio fs mount \
  --option fs.azure.account.auth.type.<storage-account>.blob.core.windows.net=SAS \
  --option fs.azure.sas.token.provider.type.<storage-account>.blob.core.windows.net=org.apache.hadoop.fs.azurebfs.sas.FixedSASTokenProvider \
  --option fs.azure.sas.fixed.token.<storage-account>.blob.core.windows.net="my sas token" \
  /mnt/abfs wasb://<my container>@<storage-account>.blob.core.windows.net/<my path>/
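
The two attempts above differ only in the endpoint suffix and the URI scheme, so the three option keys can be generated rather than typed by hand. A minimal sketch, assuming a bash shell; the function name sas_mount_options and the placeholder account, container, and path values are hypothetical, and the option keys are simply the ones used in the commands above. It only builds the flags and does not by itself make Alluxio accept SAS authentication.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Alluxio): print the three SAS-related
# --option flags for one storage endpoint, one argument per line.
sas_mount_options() {
  local account="$1"   # storage account name
  local suffix="$2"    # dfs.core.windows.net or blob.core.windows.net
  local token="$3"     # SAS token string (must not contain newlines)
  local host="${account}.${suffix}"
  printf '%s\n' \
    "--option" "fs.azure.account.auth.type.${host}=SAS" \
    "--option" "fs.azure.sas.token.provider.type.${host}=org.apache.hadoop.fs.azurebfs.sas.FixedSASTokenProvider" \
    "--option" "fs.azure.sas.fixed.token.${host}=${token}"
}

# Example use (placeholder names, shown as comments so nothing is mounted here):
#   mapfile -t opts < <(sas_mount_options myaccount dfs.core.windows.net "$SAS_TOKEN")
#   bin/alluxio fs mount "${opts[@]}" /mnt/abfs \
#     "abfs://mycontainer@myaccount.dfs.core.windows.net/mypath/"
```

Printing one argument per line and reading them back with mapfile keeps the token as a single argument even though it contains characters such as & and =.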

The blob.core.windows.net/wasb variant results in this error message after running into a timeout:

no route to host http://169.254.169.254/metadata/... 

This looks like it is falling back to MSI (managed identity) authentication, since 169.254.169.254 only responds from inside a VM running in Azure.
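
One way to check that reading is to probe the metadata endpoint from the host where the mount command runs. A minimal sketch, assuming bash and curl are available; the function name imds_reachable is hypothetical, and the default URL is the Azure Instance Metadata Service instance endpoint, which expects a "Metadata: true" header. If the probe reports "unreachable", a "no route to host" error from an MSI fallback would be expected on that machine.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether a metadata endpoint answers from this host.
# Prints "reachable" or "unreachable"; the URL can be overridden as the first argument.
imds_reachable() {
  local url="${1:-http://169.254.169.254/metadata/instance?api-version=2021-02-01}"
  # --fail treats HTTP errors as failures; --max-time keeps the probe short
  # instead of waiting for a long connection timeout.
  if curl --silent --fail --max-time 3 -H "Metadata: true" "$url" >/dev/null 2>&1; then
    echo "reachable"
  else
    echo "unreachable"
  fi
}

# Example use:
#   imds_reachable
```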

The dfs.core.windows.net/abfs variant results in a nondescript, general error:

Failed get FileSystem for abfs:

I wonder if this can be done?

Jackson-Wang-7 commented 1 year ago

I think we may not support this right now. Currently we only support ClientCredsTokenProvider and MsiTokenProvider for abfs.

juwalter commented 1 year ago

@Jackson-Wang-7 thank you for the quick feedback

Jackson-Wang-7 commented 1 year ago

@juwalter If you are interested, you can contribute relevant code implementations. It doesn't seem too difficult.

juwalter commented 1 year ago

@Jackson-Wang-7 - yes, we are looking into it!

LuQQiu commented 1 year ago

juwalter, this question is better suited to the alluxio.io/slack #troubleshooting channel. Greg (email: greg.palmer@alluxio.com, @gregpalmr) has more information on Azure SAS tokens; feel free to contact him directly.

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in two weeks if no further activity occurs. Thank you for your contributions.

iRevive commented 1 year ago

/assign @iRevive

juwalter commented 1 year ago

/assign @juwalter

juwalter commented 1 year ago

I have created a PR for this at https://github.com/Alluxio/alluxio/pull/17583