Closed: djouallah closed this issue 8 months ago.
I'm able to replicate this under both WSL and Ubuntu, with the following:
INSTALL azure;
LOAD azure;
SET azure_storage_connection_string='DefaultEndpointsProtocol=https;AccountName=azuresdkdocs;AccountKey=redacted;EndpointSuffix=core.windows.net';
SELECT count(*) FROM 'azure://development/testing_of_duckdb/file.snappy.parquet';
Partial workaround for Linux users running into this: installing curl seems to fix the issue, at least on Ubuntu for me. The underlying problem is that libcurl is statically linked into the extension, and the CA certificate bundle it expects is either missing or located at a different path on the host. Installing curl may resolve this in some environments, but a more thorough solution is required.
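To see whether a machine is affected, it can help to check which of the usual CA bundle locations actually exist on it. Below is a minimal diagnostic sketch. The candidate list is an assumption borrowed from the paths Go's `crypto/x509` package searches on Linux, not the path compiled into the DuckDB azure extension or libcurl, and the helper name `existing_ca_bundles` is hypothetical.

```python
import os
from typing import Iterable, List

# Common CA bundle locations on Linux distributions. ASSUMPTION: this list
# mirrors the one Go's crypto/x509 searches; it is not taken from DuckDB.
CANDIDATE_BUNDLES = [
    "/etc/ssl/certs/ca-certificates.crt",                 # Debian/Ubuntu/Gentoo
    "/etc/pki/tls/certs/ca-bundle.crt",                   # Fedora/RHEL 6
    "/etc/ssl/ca-bundle.pem",                             # openSUSE
    "/etc/pki/tls/cacert.pem",                            # OpenELEC
    "/etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem",  # CentOS/RHEL 7+
    "/etc/ssl/cert.pem",                                  # Alpine
]


def existing_ca_bundles(candidates: Iterable[str] = CANDIDATE_BUNDLES) -> List[str]:
    """Return the candidate paths that exist as regular files (symlinks are followed)."""
    return [path for path in candidates if os.path.isfile(path)]


if __name__ == "__main__":
    found = existing_ca_bundles()
    if found:
        for path in found:
            print("found:", path)
    else:
        print("no CA bundle found in the usual locations")
```

On a stock Ubuntu host this would typically list only the Debian-style path, which is consistent with the RHEL-build theory discussed further down the thread.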
The ArcticDB folks seem to be running into the same issue: https://github.com/man-group/ArcticDB/issues/514. There's a PR up as we speak. They have already done the work of getting a PR into the Azure SDK for setting the CA path; it will be available through vcpkg on the 10th of November. We should be able to build on their work by updating the Azure SDK then and exposing the path through DuckDB.
This one seems to be affecting my workflows as well, see discussion at https://github.com/microsoft/PlanetaryComputer/discussions/278#discussioncomment-7229515
Would love to be able to use this azure extension, it'll make working with a few GeoParquet datasets a lot easier. Thanks for all your great work on this so far!
I'm using the python:3.10.13-bullseye image and got the same issue. I already had the latest version of libcurl4-openssl-dev; I also tried installing libcurl4-gnutls-dev, but still got:
Error: Invalid Error: Fail to get a new connection for: https://stsynussp.blob.core.windows.net/. Problem with the SSL CA cert (path? access rights?)
Thanks for reporting, @cholmes and @deanm0000. This definitely needs fixing.
The same happens to me on Ubuntu; there was no problem on Windows.
Are the binaries built on an RHEL distribution? See https://github.com/man-group/ArcticDB/issues/514 (https://github.com/Azure/azure-sdk-for-cpp/issues/4738)
The following "fixes" the error for me on Ubuntu 22.04, but I don't know if there are security implications:
mkdir -p /etc/pki/tls/certs
ln -s /etc/ssl/certs/ca-certificates.crt /etc/pki/tls/certs/ca-bundle.crt
This worked for me :) Thanks @kmatt
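The mkdir/ln workaround above can also be scripted so that it is safe to rerun. This is a hypothetical helper (`link_rhel_bundle` is not part of DuckDB), assuming the Debian/Ubuntu bundle path as the source and the RHEL-style path as the location the extension looks for. The `root` parameter exists only so the logic can be exercised in a scratch directory.

```python
import os

# Relative paths from the shell workaround: the Debian/Ubuntu bundle is
# linked to the RHEL-style location the extension appears to expect.
DEBIAN_BUNDLE = "etc/ssl/certs/ca-certificates.crt"
RHEL_BUNDLE = "etc/pki/tls/certs/ca-bundle.crt"


def link_rhel_bundle(root: str = "/") -> bool:
    """Create root/RHEL_BUNDLE as a symlink to root/DEBIAN_BUNDLE.

    Returns True if a link was created, False if nothing was done because
    the target already exists or the source bundle is missing.
    """
    source = os.path.join(root, DEBIAN_BUNDLE)
    target = os.path.join(root, RHEL_BUNDLE)
    if os.path.lexists(target):
        # Never overwrite an existing bundle or link.
        return False
    if not os.path.isfile(source):
        return False
    os.makedirs(os.path.dirname(target), exist_ok=True)  # mkdir -p
    os.symlink(source, target)                            # ln -s
    return True
```

Calling `link_rhel_bundle()` against the real `/` needs root privileges, exactly like the shell version, and it leaves the same open question about security implications.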
The Azure SDK PR mentioned above appears to have been released: https://github.com/Azure/azure-sdk-for-cpp/pull/4982
@daviewales Thanks for the ping, latest vcpkg release now has this version of azure sdk as well, I will try to find some time to look into this issue in the near future!
Hello, just for info: I also ran into this annoying issue, so I made PR#35 to make possible:
Hope it helps :)
Is this fix released in the latest Azure extension for DuckDB v0.10.0?
@brianwyka not yet!
However, once this job has succeeded, you can use the nightly build of the azure extension, which does contain these fixes:
force install azure from 'http://nightly-extensions.duckdb.org';
The Azure extension page of the documentation is not up to date yet, so here you go:
Name | Description | Type | Default |
---|---|---|---|
azure_transport_option_type | Underlying adapter to use in the Azure SDK. Valid values are: default or curl. | VARCHAR | default |
Setting azure_transport_option_type explicitly to curl has the following effects:
- On Linux, this may solve the certificate issue (Error: Invalid Error: Fail to get a new connection for: https://<storage account name>.blob.core.windows.net/. Problem with the SSL CA cert (path? access rights?)), because when this option is set, the extension tries to find the certificate bundle in various paths. Curl does not do that by default, and its default path might be wrong due to static linking (see issue).
- On Windows, this replaces the default adapter (WinHTTP), allowing you to use all curl capabilities (for example, SOCKS proxies).
- On all operating systems, it honors the following environment variables:
  - CURL_CA_INFO: Path to a PEM-encoded file containing the certificate authorities sent to libcurl. Note that this option is known to only work on Linux and might throw if set on other platforms.
  - CURL_CA_PATH: Path to a directory which holds PEM-encoded files containing the certificate authorities sent to libcurl.
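From Python, the curl transport and its CA variables could be wired up roughly as follows. This is a sketch with a hypothetical helper, `curl_transport_setup`; it assumes the extension reads `CURL_CA_INFO`/`CURL_CA_PATH` from the process environment when it opens its first connection, so they must be set before the first `azure://` query.

```python
import os
from typing import List, Optional


def curl_transport_setup(ca_info: Optional[str] = None,
                         ca_path: Optional[str] = None) -> List[str]:
    """Export the CA variables for the curl transport and return the SQL to run.

    ASSUMPTION: the extension reads these variables from the process
    environment when it creates its connection, so call this first.
    """
    if ca_info is not None:
        if not os.path.isfile(ca_info):
            raise FileNotFoundError(ca_info)
        os.environ["CURL_CA_INFO"] = ca_info  # documented as Linux-only
    if ca_path is not None:
        if not os.path.isdir(ca_path):
            raise NotADirectoryError(ca_path)
        os.environ["CURL_CA_PATH"] = ca_path
    # Statements taken from this thread; run them in order via duckdb.sql().
    return [
        "INSTALL azure;",
        "LOAD azure;",
        "SET azure_transport_option_type = 'curl';",
    ]
```

For example, on Debian/Ubuntu one might call `curl_transport_setup(ca_info="/etc/ssl/certs/ca-certificates.crt")` and pass each returned statement to `duckdb.sql(...)` before querying `azure://...` paths. Validating the paths up front turns a confusing SSL error into an explicit missing-file error.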
Thanks for the great work!
I am not sure this fixed it for me. I ran duckdb.sql("force install azure from 'http://nightly-extensions.duckdb.org';"), but I still had to create the cert symlink for it to work on WSL Ubuntu 22.04.
Hi @luuk-codebeez
Just to be sure, did you set the variable?
SET azure_transport_option_type = 'curl';
Whoops, that worked with the nightly build.
@luuk-codebeez Great to hear!
I just deployed the azure nightly binaries, so from now on, force install azure will get you the updated extension with this feature in it.
Also, this has now been added to the docs https://duckdb.org/docs/extensions/azure.html.
Thanks a lot for the effort here @quentingodeau!
Would it make sense to make SET azure_transport_option_type = 'curl'; the default when running on Linux?
I thought about that but didn't do it. Here is my opinion/experience on this: when you change the default behavior of something, it's really hard to roll that change back.
Once you change the default, you have to keep it. So if tomorrow the Azure SDK evolves to handle our use case, for example by adding an AZURE_SSL_BUNDLE_PATH environment variable, then all users who are used to the SDK will expect that variable to be handled exactly as described in the Azure docs. I would then have to support that while also keeping the other parameters I had already allowed. I agree it may not look like a big deal on its own, but more and more small things like this pile up and make the code increasingly complex for newcomers who do not have the full history.
But once again it's only an opinion ^^
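Leaving the extension default alone still lets users opt in per platform on their side. Below is one hypothetical way to do that from Python; the helper names are illustrative and not part of DuckDB. It assumes the returned statement is passed to `duckdb.sql(...)` after `LOAD azure;`.

```python
import sys
from typing import Optional


def transport_for_platform(platform: Optional[str] = None) -> str:
    """Pick a value for azure_transport_option_type.

    Opts into 'curl' on Linux only, where the CA bundle issue shows up, and
    keeps the extension's 'default' adapter (WinHTTP on Windows) elsewhere.
    """
    platform = sys.platform if platform is None else platform
    return "curl" if platform.startswith("linux") else "default"


def transport_setting_sql(platform: Optional[str] = None) -> str:
    """Return the SET statement for the transport chosen for this platform."""
    return f"SET azure_transport_option_type = '{transport_for_platform(platform)}';"
```

Because the choice lives in user code, it can be dropped without touching the extension if the Azure SDK later gains its own way to configure the CA bundle.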
It works fine on Windows, but when running from a notebook on Linux, I get these errors