delta-io / delta-rs

A native Rust library for Delta Lake, with bindings into Python
https://delta-io.github.io/delta-rs/
Apache License 2.0

CaUsedAsEndEntity error in Microsoft Fabric #2449

Closed martroben closed 3 months ago

martroben commented 7 months ago

Environment

Delta-rs version: Python deltalake-0.17.1
Cloud provider: Microsoft (North Europe)
Environment: Microsoft Fabric Notebook
OS: Fabric VM (CBL-Mariner Linux)


Bug

What happened: While trying to create a Delta Table from a path in a Fabric Notebook, I'm getting the following error: OSError: Generic MicrosoftAzure error: Error after 10 retries in 1.992440924s, max_retries:10, retry_timeout:180s, source:error sending request for url (https://onelake.blob.fabric.microsoft.com/<workspace id>/<lakehouse id>/Tables/some_table/_delta_log/_last_checkpoint): error trying to connect: invalid peer certificate: Other(CaUsedAsEndEntity)

What you expected to happen: To get a deltalake.DeltaTable instance without any errors.

How to reproduce it: Run the following code in a Fabric Notebook:

import deltalake
import trident_token_library_wrapper

workspace_id = "xxxxxxxx-xxxxx-xxxxx-xxxxx-xxxxxxxxxxxxxx"
lakehouse_id = "yyyyyyyy-yyyyy-yyyyy-yyyyy-yyyyyyyyyyyyy"
path = f"abfss://{workspace_id}@onelake.dfs.fabric.microsoft.com/{lakehouse_id}/Tables/some_table"

storage_options = {}
storage_options["bearer_token"] = trident_token_library_wrapper.PyTridentTokenLibrary.get_access_token("storage")
storage_options["use_fabric_endpoint"] = "true"
# storage_options["allow_invalid_certificates"] = "true"

deltalake.DeltaTable(
    table_uri=path,
    storage_options=storage_options)

More details: storage_options["allow_invalid_certificates"] = "true" can be used as a quickfix.
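
A minimal sketch of how that quick fix could be kept as an explicit opt-in instead of a permanently enabled line. The helper name build_storage_options and its flag parameter are hypothetical and not part of deltalake; only the three option keys come from the snippet above. Note that enabling the flag turns off TLS certificate validation for the connection, so it is a workaround rather than a fix.

```python
def build_storage_options(bearer_token, allow_invalid_certificates=False):
    """Build OneLake storage options (hypothetical helper, not a deltalake API)."""
    options = {
        "bearer_token": bearer_token,
        "use_fabric_endpoint": "true",
    }
    if allow_invalid_certificates:
        # Workaround only: this skips certificate validation entirely.
        options["allow_invalid_certificates"] = "true"
    return options
```

The resulting dict would be passed as storage_options to deltalake.DeltaTable, as in the reproduction code.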

Here are the certificate details fetched by openssl s_client -showcerts -connect onelake.blob.fabric.microsoft.com:443 in the Fabric Notebook:

CONNECTED(00000003)
depth=0 C = US, ST = Washington, L = Redmond, O = MicrosoftData, OU = SparkDepartment, emailAddress = me@microsoft.com, CN = microsoft.com
verify return:1
---
Certificate chain
 0 s:C = US, ST = Washington, L = Redmond, O = MicrosoftData, OU = SparkDepartment, emailAddress = me@microsoft.com, CN = microsoft.com
   i:C = US, ST = Washington, L = Redmond, O = MicrosoftData, OU = SparkDepartment, emailAddress = me@microsoft.com, CN = microsoft.com
-----BEGIN CERTIFICATE-----
MIIFKTCCBBGgAwIBAgIUSdOq2Tj7VfjrzloBGTEED3YeNGMwDQYJKoZIhvcNAQEL
BQAwgZ8xCzAJBgNVBAYTAlVTMRMwEQYDVQQIDApXYXNoaW5ndG9uMRAwDgYDVQQH
DAdSZWRtb25kMRYwFAYDVQQKDA1NaWNyb3NvZnREYXRhMRgwFgYDVQQLDA9TcGFy
a0RlcGFydG1lbnQxHzAdBgkqhkiG9w0BCQEWEG1lQG1pY3Jvc29mdC5jb20xFjAU
BgNVBAMMDW1pY3Jvc29mdC5jb20wHhcNMjQwNDI0MDk0NzAyWhcNMjUwNDI0MDk0
NzAyWjCBnzELMAkGA1UEBhMCVVMxEzARBgNVBAgMCldhc2hpbmd0b24xEDAOBgNV
BAcMB1JlZG1vbmQxFjAUBgNVBAoMDU1pY3Jvc29mdERhdGExGDAWBgNVBAsMD1Nw
YXJrRGVwYXJ0bWVudDEfMB0GCSqGSIb3DQEJARYQbWVAbWljcm9zb2Z0LmNvbTEW
MBQGA1UEAwwNbWljcm9zb2Z0LmNvbTCCASIwDQYJKoZIhvcNAQEBBQADggEPADCC
AQoCggEBAL+l4Lto000/J9DEfqsuLZT48qh2K8gwQLOJvGu01LP+MqNm8QlT8K4r
hP6nShOoMTfMAEISbU9s+kN2/IjIl2fLGyHK+tB+NgMCo0mfdNyYmN/3oWfc4I1r
0sE+MfdhuC9VeayCyWTRR/O36PaggvmrAL45QQjqAUBgs0yZBnNtIRLy4QNm4ymS
yUvBzhJAyBmxuW1uuDo9SgoRk3EetxaUkObOT3fRyqoTKTU06Kpee8IK5CH4mhmr
ny/yVLHuaup13ZwQdmPJXZou2wIxa5fYqjeG46dVRT07IECl6KD/zoK+M227F0Ij
KQB2q5NlhgnkTxPpP0dJ54ophXkp6isCAwEAAaOCAVkwggFVMAwGA1UdEwQFMAMB
Af8wggFDBgNVHREEggE6MIIBNoIJbG9jYWxob3N0gh4qLnBiaWRlZGljYXRlZC53
aW5kb3dzLWludC5uZXSCIiouZGZzLnBiaWRlZGljYXRlZC53aW5kb3dzLWludC5u
ZXSCIyouYmxvYi5wYmlkZWRpY2F0ZWQud2luZG93cy1pbnQubmV0ghoqLmRmcy5m
YWJyaWMubWljcm9zb2Z0LmNvbYIbKi5ibG9iLmZhYnJpYy5taWNyb3NvZnQuY29t
gh4qLm9uZWxha2UuZmFicmljLm1pY3Jvc29mdC5jb22CGioucGJpZGVkaWNhdGVk
LndpbmRvd3MubmV0gh4qLmRmcy5wYmlkZWRpY2F0ZWQud2luZG93cy5uZXSCHyou
YmxvYi5wYmlkZWRpY2F0ZWQud2luZG93cy5uZXSHBH8AAAGHBH8AAAIwDQYJKoZI
hvcNAQELBQADggEBAEXF4WXBik4rb+xLj312GSu6oIgOPGLqOGnCseR6NU9DHaJo
MVG7Y4IEFwZI5VzPqS4sWoreNzhLwF2KbGXtZnWbs1LAAwLaOLQJx3uxRqFqH5BM
638GcXZ8Qc9Np82DQnw76lUah5BP/EkG6hgTcxeOF6m1yGaDJiwda43s+Y7CXmkD
XKSYxxqnvxGXlPnROyROnvIaRwd4l6UUYZmAEVaUjwuMdARJOhtn1vMLhNI0poS0
np39sqlWT/94vsdmWAF8/oPtyrocdKJha77vuLuRb1am1Wh6PwSp5I0HVmIsVeBk
Uah9Jj7LLdIySb8R00AIpdyp+7pj4Boz6VzctKs=
-----END CERTIFICATE-----
---
Server certificate
subject=C = US, ST = Washington, L = Redmond, O = MicrosoftData, OU = SparkDepartment, emailAddress = me@microsoft.com, CN = microsoft.com

issuer=C = US, ST = Washington, L = Redmond, O = MicrosoftData, OU = SparkDepartment, emailAddress = me@microsoft.com, CN = microsoft.com

---
No client certificate CA names sent
Peer signing digest: SHA256
Peer signature type: RSA-PSS
Server Temp Key: X25519, 253 bits
---
SSL handshake has read 1881 bytes and written 407 bytes
Verification: OK
---
New, TLSv1.3, Cipher is TLS_AES_256_GCM_SHA384
Server public key is 2048 bit
Secure Renegotiation IS NOT supported
No ALPN negotiated
Early data was not sent
Verify return code: 0 (ok)
---

It doesn't seem to be a Certificate Authority certificate. More like a self-signed certificate, so I don't know why the error is CaUsedAsEndEntity.

Interestingly, the same openssl operation used to give a self signed certificate error (see this deltalake issue for details), but it seems that something has changed in the openssl setup of the underlying Fabric VMs.

If anyone has any ideas for how to start solving this new issue (other than using the "allow_invalid_certificates"-hammer in perpetuity), I would be most thankful.

martroben commented 6 months ago

Upon further investigation, the correct command to check whether the OneLake certificate is a Certificate Authority or an End Entity certificate is this:

openssl s_client -connect onelake.blob.fabric.microsoft.com:443 -showcerts | openssl x509 -text | grep "Basic Constraints" -A 1

This returns CA:TRUE, signifying that the certificate used is in fact a Certificate Authority cert, as the error suggests.
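
For anyone who wants to script this check, here is a rough sketch that reads the Basic Constraints value out of the text printed by openssl x509 -text. The function name basic_constraints_ca is made up for illustration, and the sketch assumes the usual layout where the CA:TRUE or CA:FALSE line directly follows the "X509v3 Basic Constraints" header.

```python
import re
from typing import Optional


def basic_constraints_ca(x509_text: str) -> Optional[bool]:
    """Return True/False for CA:TRUE/CA:FALSE, or None if the extension is absent.

    Expects the human-readable output of `openssl x509 -text`.
    """
    match = re.search(
        r"X509v3 Basic Constraints:[^\n]*\n\s*CA:(TRUE|FALSE)", x509_text
    )
    if match is None:
        return None
    return match.group(1) == "TRUE"
```

A result of True corresponds to the CA:TRUE output described above, which is the condition the CaUsedAsEndEntity error complains about.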

I'm not expecting Microsoft to alter their certificates to make it easier for people to use Polars in Fabric, so some workaround would still be appreciated.

Can anyone tell me which underlying module or crate raises the CaUsedAsEndEntity error? Maybe some setting can be passed to skip the CA vs. EE check. (Somehow the Spark-based delta.tables module doesn't seem to be bothered by a CA cert being used as an EE cert.)

ion-elgreco commented 6 months ago

@martroben, all storage options are passed to the object_store crate.

martroben commented 6 months ago

Posted an issue/question to object_store repo: https://github.com/apache/arrow-rs/issues/5696

hnasrullakhan commented 6 months ago

It looks like the object_store version bump caused this: https://github.com/delta-io/delta-rs/pull/2311. An older version of the deltalake library works fine:

deltalake==0.16.2

ion-elgreco commented 6 months ago

@hnasrullakhan please make an issue in arrow-rs repo then, there were zero code changes on our side.

hnasrullakhan commented 6 months ago

There was a bump of the object_store version to 0.9.1: https://github.com/delta-io/delta-rs/pulls?page=2&q=is%3Apr+label%3Abinding%2Fpython+is%3Aclosed

ion-elgreco commented 6 months ago

@hnasrullakhan that's correct, but what I am saying is that this didn't require any changes on our side other than bumping the version. So you should open an issue upstream.

martroben commented 6 months ago

Upon further testing, deltalake==0.16.1 works fine, but starting from 0.16.2, I'm getting the error (also tested the latest: 0.17.3).

I think @hnasrullakhan also confirms that - their earlier claim about 0.16.2 working fine was a typo.

I don't see any changes in the object_store version between 0.16.1 and 0.16.2 (granted, I don't speak fluent Rust).

Does anyone have any idea what else could have introduced this error between these two versions?

ion-elgreco commented 5 months ago

@martroben can you check against v0.18?

martroben commented 5 months ago

@ion-elgreco, I sure can, but on Monday, when I'm back in office.

hnasrullakhan commented 5 months ago

What changed in v0.18, @ion-elgreco?

hnasrullakhan commented 5 months ago

Could still repro with v0.18

martroben commented 5 months ago

@ion-elgreco, I confirm @hnasrullakhan's position: the same issue still occurs, even with deltalake==0.18.0: error trying to connect: invalid peer certificate: Other(CaUsedAsEndEntity)


As an aside - I'm trying to push a case with MS support in parallel. Their initial position was that since there is no problem with the older versions of deltalake, it's a 3rd party problem.

I suggested that the root cause is still their improper use of certificates - the 3rd parties might have just tightened the rules about what they find acceptable to work with. Not sure if I'll win this argument though, being a mere mortal.

What can men do against such reckless hate? - Théoden, son of Thengel

martroben commented 5 months ago

Apparently Polars is not the only downstream library where delta lake interactions broke around deltalake v0.17. The linked issue does not seem to be related to Fabric certificates however.

Nevertheless, for anyone looking, Daft might be a viable alternative to Polars soon - especially if they implement deletes and merges for delta lake.

ion-elgreco commented 5 months ago

> Apparently Polars is not the only downstream library where delta lake interactions broke around deltalake v0.17. The linked issue does not seem to be related to Fabric certificates however.
>
> Nevertheless, for anyone looking, Daft might be a viable alternative to Polars soon - especially if they implement deletes and merges for delta lake.

That issue is not really related, Daft is using our internal methods for their writer. When we make changes in our internal methods this is not marked as a breaking change :)

martroben commented 5 months ago

Thank you for the context @ion-elgreco. In that case it is indeed somewhat unfair for them to cite breaking changes in deltalake, when the issue is at least partly caused by their own misjudgment of what is exposed and what is not.

I'm still trying to understand, though, what exact change between v0.16.1 and v0.16.2 altered the behaviour of SSL connections.

martroben commented 4 months ago

Apparently the problem is no longer present in v0.18.1.

Not sure what caused the fix between 0.18.0 and 0.18.1. If I had to guess, it might be the object_store bump from 0.9 to 0.10, in which object_store updated its reqwest dependency. I guess we'll never know, but I'm nonetheless happy.
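
As a rough sketch, the version range reported in this thread can be encoded so that a notebook only reaches for the certificate workaround when it is on an affected release. The names parse_version and reported_as_affected are hypothetical, and the boundaries are assumptions drawn only from the reports above (0.16.1 worked; 0.16.2, 0.17.1, 0.17.3 and 0.18.0 failed; 0.18.1 worked), so releases in between are inferred rather than tested.

```python
# Boundaries taken from the reports in this thread, not from any changelog.
FIRST_AFFECTED = (0, 16, 2)
FIRST_FIXED = (0, 18, 1)


def parse_version(version: str) -> tuple:
    """Parse a plain 'major.minor.patch' string into a tuple of ints."""
    return tuple(int(part) for part in version.split(".")[:3])


def reported_as_affected(version: str) -> bool:
    """True if this deltalake version falls in the range reported as failing."""
    return FIRST_AFFECTED <= parse_version(version) < FIRST_FIXED
```

This only handles plain numeric version strings; pre-release suffixes would need a real version parser.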

Microsoft is still using a self-signed CA certificate as an EE certificate in OneLake connections from Fabric. However, I had a call with their support, and the product team has apparently promised to do something about the certificate. Not sure what, though. Hopefully it will not break whatever caused the fix.

Josh-Hiz commented 4 months ago

Additionally, I get the following similar error on 0.18.1: OSError: Generic MicrosoftAzure error: Error after 1 retries in 7824.991305s, max_retries:10, retry_timeout:180s, source:error sending request for url ...

This happens when performing a write on a very large table. Should I open a new issue for this? @ion-elgreco

djouallah commented 3 months ago

Microsoft deployed a new update to the notebook environment which should fix this issue. Could you please give it another try? (It may take some time to reach your particular region.)