exasol / cloud-storage-extension

Exasol Cloud Storage Extension for accessing formatted data (Avro, ORC and Parquet) on public cloud storage systems
MIT License

Compatibility with Azure OneLake (Fabric) #309

Open ThomasBestfleisch opened 2 months ago

ThomasBestfleisch commented 2 months ago

Are the cloud storage extensions compatible with Azure OneLake? According to https://learn.microsoft.com/en-us/fabric/onelake/onelake-api-parity the API is compatible with ADLS Gen2. It would be good if we could verify that and also include it in the documentation.

pj-spoelders commented 1 month ago

The original estimate was around 1 day, assuming it was just a matter of changing/relaxing the regex. One day was spent getting the project to build (DateTimeConverterTest is brittle, and packaging and running the tests produces enormous output, ~2,000,000 lines), adding a test and testing out the regex theory. Unfortunately, the change seems to be more involved than that, and I will need to debug to see how it works. ML has offered to help out if needed.

pj-spoelders commented 1 month ago

I've added the regex, then built and hosted a .jar for the client to test.
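For illustration, relaxing a host check of this kind might look like the sketch below. The pattern and the `parse_abfs_uri` helper are hypothetical and are not the extension's actual regex; the sketch only assumes the publicly documented URI shapes, `abfs[s]://<container>@<account>.dfs.core.windows.net/<path>` for ADLS Gen2 and `abfs[s]://<workspace>@onelake.dfs.fabric.microsoft.com/<path>` for OneLake.

```python
import re

# Hypothetical pattern, NOT the extension's actual regex. It accepts the
# documented ADLS Gen2 host and additionally the OneLake host.
ABFS_URI = re.compile(
    r"^abfss?://(?P<container>[^@/]+)@(?P<host>"
    r"[^./@]+\.dfs\.core\.windows\.net"       # ADLS Gen2: <account>.dfs.core.windows.net
    r"|onelake\.dfs\.fabric\.microsoft\.com"  # OneLake endpoint
    r")/(?P<path>.*)$"
)


def parse_abfs_uri(uri):
    """Split an abfs/abfss URI into container, host and path, or raise ValueError."""
    match = ABFS_URI.match(uri)
    if match is None:
        raise ValueError("Unsupported ABFS URI: " + uri)
    return match.groupdict()
```

A URI that passes this check only clears path parsing; it says nothing about whether authentication against that host will succeed.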

pj-spoelders commented 1 month ago

The URI format IS different from the one in the documentation, it seems. Need to add this additional format to the experimental version. EDIT: it wasn't. I tried the exact URI in local unit tests and it didn't trigger the same exception.

pj-spoelders commented 1 month ago

Will need to discuss with the project maintainers whether this is possible with the current implementation.

pj-spoelders commented 2 weeks ago

Updated epic in Jira, added time spent.

pj-spoelders commented 2 weeks ago

Current state of affairs:

We currently only support shared key authentication for ADLS Gen2 in cloud-storage-extension. Shared key authentication may work for OneLake, but from what I gather the key is not available in the UI and the documentation does not mention it.

The next step, if we take this route, would be to add and test authentication using Entra ID (Azure AD), since this seems to be what OneLake supports. Even then this is still a what-if: none of this is documented for OneLake, only for ADLS Gen2 (in the Hadoop docs), so I cannot guarantee it will work. I found no mention of it anywhere on the web, and it's not officially supported. There are other things to keep in mind as well, for example how long the OAuth tokens are valid, since they would need to be updated in the connection object, which is impractical.
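As a rough sketch of what the Entra ID route would involve on the Hadoop side, the helper below collects the OAuth client-credentials properties that the Hadoop ABFS documentation describes for ADLS Gen2. The function name and its parameters are hypothetical, the token endpoint shape is an assumption, and nothing here has been verified against OneLake.

```python
def abfs_oauth_properties(tenant_id, client_id, client_secret):
    """Build Hadoop ABFS OAuth (client credentials) settings as a plain dict.

    Property names follow the Hadoop ABFS docs for ADLS Gen2; whether
    OneLake accepts them is untested.
    """
    # Assumed Entra ID token endpoint shape for the given tenant.
    endpoint = "https://login.microsoftonline.com/" + tenant_id + "/oauth2/token"
    return {
        "fs.azure.account.auth.type": "OAuth",
        "fs.azure.account.oauth.provider.type":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        "fs.azure.account.oauth2.client.endpoint": endpoint,
        "fs.azure.account.oauth2.client.id": client_id,
        "fs.azure.account.oauth2.client.secret": client_secret,
    }
```

If this flow turned out to work, the connection object would presumably hold a client id and secret rather than a short-lived token, which might sidestep the token-lifetime concern; that remains an assumption until tested.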

What I can offer the customer right now: one possible workaround is to export to S3, GCP, ABS or ADLS Gen2, since these are actually supported by cloud-storage-extension right now. They can then add a 'shortcut' from OneLake to these storage containers (also officially supported).

pj-spoelders commented 1 week ago

Meeting scheduled to talk with the customer.

ckunki commented 1 week ago

Progress on hold until July 16th