dotnet / spark

.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.
https://dot.net/spark
MIT License

Support ADL Gen 1? #534

Closed: chtsou closed this issue 4 years ago

chtsou commented 4 years ago

I'm trying to read data from ADL Gen 1. I tried setting the configuration in SparkConf:

sparkConf.Set("fs.adl.oauth2.access.token.provider.type", "ClientCredential");
sparkConf.Set("fs.adl.account.{my_ADLS_account}.oauth2.refresh.url", "https://login.microsoftonline.com/{my_directory_id}/oauth2/token");
sparkConf.Set("fs.adl.account.{my_ADLS_account}.oauth2.client.id", "{my_client_id}");
sparkConf.Set("fs.adl.account.{my_ADLS_account}.oauth2.credential", "{mysecret}");

It didn't work. Do you know how to access ADL Gen 1, or does .NET for Apache Spark not support this?
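For context, Spark copies a property into the Hadoop configuration when its key starts with `spark.hadoop.` (the prefix is stripped on the way). The thread does not establish whether that prefix was needed on this cluster, and the failure was eventually traced to ACLs, so the following is only a minimal sketch of the prefixed variant of the settings above. The `AdlGen1Conf` helper, its `Build` method, every argument value, and the file path are hypothetical placeholders. The `fs.adl.*` key names are taken from the question as written.

```csharp
using System.Collections.Generic;
using Microsoft.Spark;
using Microsoft.Spark.Sql;

public static class AdlGen1Conf
{
    // Builds the ADL Gen 1 OAuth settings from the question, with each key
    // prefixed by "spark.hadoop." so that Spark forwards it to Hadoop.
    public static Dictionary<string, string> Build(
        string account, string tenantId, string clientId, string secret)
    {
        const string p = "spark.hadoop.";
        return new Dictionary<string, string>
        {
            [p + "fs.adl.oauth2.access.token.provider.type"] = "ClientCredential",
            [p + $"fs.adl.account.{account}.oauth2.refresh.url"] =
                $"https://login.microsoftonline.com/{tenantId}/oauth2/token",
            [p + $"fs.adl.account.{account}.oauth2.client.id"] = clientId,
            [p + $"fs.adl.account.{account}.oauth2.credential"] = secret,
        };
    }
}

public static class Program
{
    public static void Main()
    {
        // All four arguments are placeholders, not real values.
        var conf = new SparkConf();
        foreach (var kv in AdlGen1Conf.Build(
            "myaccount", "my-directory-id", "my-client-id", "my-secret"))
        {
            conf.Set(kv.Key, kv.Value);
        }

        SparkSession spark = SparkSession.Builder().Config(conf).GetOrCreate();

        // Hypothetical path: point this at a file the service principal can read.
        DataFrame df = spark.Read()
            .Option("sep", "\t")
            .Csv("adl://myaccount.azuredatalakestore.net/path/to/file.tsv");
        df.Show();
    }
}
```

Keeping the key construction in one helper makes it easy to print the resulting keys and compare them with what the cluster logs report. In a real job the secret would come from a secure store rather than a string literal.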

Niharikadutta commented 4 years ago

Can you paste the exact error you got?

chtsou commented 4 years ago

Sure, sorry for that. Here is the error I got:

20/06/04 02:03:03 WARN [task-result-getter-0] scheduler.TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, CO4AAP9C222E737, executor 1): org.apache.hadoop.security.AccessControlException: OPEN failed with error 0x83090aa2 (Forbidden. ACL verification failed. Either the resource does not exist or the user is not authorized to perform the requested operation.). [9c8a0a46-6335-4680-9d27-f52f56b65ac9] failed with error 0x83090aa2 (Forbidden. ACL verification failed. Either the resource does not exist or the user is not authorized to perform the requested operation.). [9c8a0a46-6335-4680-9d27-f52f56b65ac9][2020-06-04T02:03:03.8818296-07:00] [ServerRequestId:9c8a0a46-6335-4680-9d27-f52f56b65ac9]

Niharikadutta commented 4 years ago

This looks like an access issue. Are the credentials and URL correct? Can you try accessing the account with these credentials outside of Spark? Also make sure the application has the correct read/write permissions on the account. Please refer to this guide for ways to authenticate your application.

chtsou commented 4 years ago

I tried with Hadoop on my local machine, setting the same credentials in core-site.xml. Here are the command and the result:

D:\hadoop-3.2.1>bin\hdfs dfs -ls adl://bingads-algo-adinsights-c08.azuredatalakestore.net/local/OSPublish/chtsou_test/cooltest.tsv
2020-06-05 09:47:52,232 INFO adl.AdlFileSystem: No valid ADL SDK timeout configured: using SDK default.
-rwxrwx---+ 1 43f107ad-d770-48c5-b670-8bcd8d7b898a 43f107ad-d770-48c5-b670-8bcd8d7b898a 26894 2020-06-04 09:47 adl://bingads-algo-adinsights-c08.azuredatalakestore.net/local/OSPublish/chtsou_test/cooltest.tsv

I suppose this indicates that the credentials can access ADLS successfully?

Niharikadutta commented 4 years ago

How are you running this Spark application? Can you refer to https://github.com/dotnet/spark/issues/337 to see if that helps?

chtsou commented 4 years ago

I run the Spark application via a Livy endpoint, against an internal cluster. I found some warnings prior to the error:

20/06/04 02:02:42 WARN [nioEventLoopGroup-2-2] alias.APSecretStoreProvider: Cannot get credential for fs.adl.oauth2.client.id, D:\data\ASG_MTP\APSecretStoreCredentials/fs.adl.oauth2.client.id.encr file does not exist.
20/06/04 02:02:42 WARN [nioEventLoopGroup-2-2] alias.APSecretStoreProvider: Cannot get credential for dfs.adls.oauth2.client.id, D:\data\ASG_MTP\APSecretStoreCredentials/dfs.adls.oauth2.client.id.encr file does not exist.
20/06/04 02:02:42 WARN [nioEventLoopGroup-2-2] alias.APSecretStoreProvider: Cannot get credential for fs.adl.oauth2.refresh.url, D:\data\ASG_MTP\APSecretStoreCredentials/fs.adl.oauth2.refresh.url.encr file does not exist.
20/06/04 02:02:42 WARN [nioEventLoopGroup-2-2] alias.APSecretStoreProvider: Cannot get credential for dfs.adls.oauth2.refresh.url, D:\data\ASG_MTP\APSecretStoreCredentials/dfs.adls.oauth2.refresh.url.encr file does not exist.
20/06/04 02:02:42 WARN [nioEventLoopGroup-2-2] alias.APSecretStoreProvider: Cannot get credential for fs.adl.oauth2.credential, D:\data\ASG_MTP\APSecretStoreCredentials/fs.adl.oauth2.credential.encr file does not exist.
20/06/04 02:02:42 WARN [nioEventLoopGroup-2-2] alias.APSecretStoreProvider: Cannot get credential for dfs.adls.oauth2.credential, D:\data\ASG_MTP\APSecretStoreCredentials/dfs.adls.oauth2.credential.encr file does not exist.

Do these warnings indicate that the program failed to find the secrets through the credential providers? Would that failure prevent the program from using the settings in SparkConf?

chtsou commented 4 years ago

Sorry, the issue was caused by incorrect ACL settings. I tested the permissions with AdlsClient in C# and found that the SPI could access the folder, but not the files inside it. I then contacted the ADLS account owner, who fixed the ACLs. Everything works now. Many thanks for pointing that out!
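A check along these lines can be sketched with the `Microsoft.Azure.DataLake.Store` and `Microsoft.Rest.ClientRuntime.Azure.Authentication` NuGet packages. This is not the exact code used above, only an illustration under stated assumptions: the tenant, client ID, secret, account name, and both paths are hypothetical placeholders, and the service principal is assumed to be what "SPI" refers to. It lists the folder and then tries to read one byte from a file, so a folder that lists correctly but a file that fails to open points at file-level ACLs.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Azure.DataLake.Store;
using Microsoft.Rest.Azure.Authentication;

public static class AclCheck
{
    public static async Task Main()
    {
        // Placeholders: replace with the service principal and account to test.
        string tenantId = "my-directory-id";
        string clientId = "my-client-id";
        string secret = "my-secret";
        string accountFqdn = "myaccount.azuredatalakestore.net";
        string folder = "/path/to/folder";
        string file = "/path/to/folder/file.tsv";

        // Authenticate as the service principal (client-credential flow).
        var creds = await ApplicationTokenProvider.LoginSilentAsync(
            tenantId, clientId, secret);
        AdlsClient client = AdlsClient.CreateClient(accountFqdn, creds);

        // Step 1: can the principal list the folder?
        try
        {
            foreach (DirectoryEntry entry in client.EnumerateDirectory(folder))
            {
                Console.WriteLine($"{entry.Permission} {entry.FullName}");
            }
            Console.WriteLine("Folder listing: OK");
        }
        catch (AdlsException ex)
        {
            Console.WriteLine($"Folder listing failed: {ex.Message}");
        }

        // Step 2: can the principal open and read a file inside it?
        try
        {
            using (var stream = client.GetReadStream(file))
            {
                stream.ReadByte();
            }
            Console.WriteLine("File read: OK");
        }
        catch (AdlsException ex)
        {
            Console.WriteLine($"File read failed: {ex.Message}");
        }
    }
}
```

Running the two steps separately matters because, in ADLS Gen 1, permissions granted on a folder do not automatically apply to files that already exist inside it.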

Niharikadutta commented 4 years ago

Glad to hear it worked. Thanks, @chtsou!