Closed djouallah closed 1 month ago
It works for me using the credential chain / CLI. Maybe it's something with the token auth?
EDIT: I'm also testing against a normal ADLS account, not Fabric. I'll test against Fabric later this evening.
I agree with @djouallah I cannot get it work with with delta_scan
, but seems like parquet_scan
works:
Simple example of trying to use the new token-based auth with Azure extension. This works exactly like this with a CONNECTION STRING, but using the new token-based auth I get an error, which I'm not sure how to figure out if it is a security policy on my side or something not working with the latest extension. I'm using Duckdb 1.1.0
credential = AzureMLOnBehalfOfCredential()
os.environ['AZURE_ACCESS_TOKEN'] = credential.get_token("https://storage.azure.com/").token
conn = duckdb.connect(':memory:')
conn.install_extension("delta")
conn.load_extension("delta")
conn.install_extension("azure")
conn.load_extension("azure")
conn.execute("""SET azure_transport_option_type = curl""")
conn.execute(f"""CREATE SECRET secret1 (
TYPE AZURE,
PROVIDER ACCESS_TOKEN,
ACCESS_TOKEN '{os.getenv('AZURE_ACCESS_TOKEN')}',
ACCOUNT_NAME '{os.getenv('AZURE_STORAGE_ACCOUNT')}'
);""")
full_path=f'abfss://{CONTAINER_NAME}/{delta_table_path}'
query = f"SELECT * FROM delta_scan('{full_path}') LIMIT 3"
conn.execute(query).fetchdf()
Error I get:
Traceback (most recent call last) Cell In [43], line 37 35 full_path=f'abfss://{CONTAINER_NAME}/{delta_table_path}' 36 query = f"SELECT * FROM delta_scan('{full_path}') LIMIT 3" ---> 37 conn.execute(query).fetchdf() IOException: IO Error: Hit DeltaKernel FFI error (from: While trying to read from delta table: 'abfss://xxx'): Hit error: 8 (ObjectStoreError) with message (Error interacting with object store: Generic MicrosoftAzure error: Client error with status 401 Unauthorized: <?xml version="1.0" encoding="utf-8"?><Error><Code>NoAuthenticationInformation</Code><Message>Server failed to authenticate the request. Please refer to the information in the www-authenticate header.
@djouallah, I'm also just checking: Does using the parquet_scan
still provide the correct output, even though it is a delta_table
? This can then at least be a workaround (I know you will probably lose some delta table functionality, but as long as it reads the latest version of the data, it is potentially a workaround for now).
@Tikavdm no, you should never ever use parquet scan to read a delta table, you will get wrong results if the table had delete or compact, optimize operation, I used that example only to show the issue with authentication
Confirmed i get the same problem using an ACCESS_TOKEN against my storage account, so it sounds like a problem with that feature.
Oh, that's right. This code needs to be updated to handle access tokens since that is a new feature for the azure extension.
still same issue, using deltalake extension 0.2
IOException: IO Error: Hit DeltaKernel FFI error (from: While trying to read from delta table: 'abfss://sqlengines@onelake.dfs.fabric.microsoft.com/ETL.Lakehouse/Tables/T2/chdb/'): Hit error: 8 (ObjectStoreError) with message (Error interacting with object store: The operation lacked valid authentication credentials for path ETL.Lakehouse/Tables/T2/chdb/_delta_log/_last_checkpoint: Client error with status 401 Unauthorized: <?xml version="1.0" encoding="utf-8"?><Error><Code>InvalidAuthenticationInfo</Code><Message>Server failed to authenticate the request. Please refer to the information in the www-authenticate header.
RequestId:d2946d3c-601e-001f-7194-0407ae000000
Time:2024-09-11T21:48:26.8575298Z</Message><AuthenticationErrorDetail>Issuer validation failed. Issuer did not match.</AuthenticationErrorDetail></Error>)
@djouallah have you tried the nightly?
@gdubya no but i am using the latest delta extension which 0.2, it should have the fix right
Ah, yes, you're right (according to https://github.com/duckdb/duckdb_delta/releases/tag/v0.2.0). Weird. I'll test it again.
Is this a related issue #91? And if there is some workaround that applies to #91 it would be great
@hirsimaki-markus no, that's a different issue. This one is specifically related to access tokens, but you're using a service principal.
For the record mine works now!! Thanks @gdubya . So token-based authentication works with the latest duckdb 1.1.0 and the updated extension of delta.
@djouallah It looks like is now a Fabric issue, so you can reopen #43
I tested by copying a Delta table from Fabric to a regular ADLS account. ADLS works, Fabric does not.
azure secret works fine when using parquet but are not passed when using delta_scan
this works
this does not works ( although it give wrong results as it is not using the delta log to scan only relevant parquet files)