adidas / lakehouse-engine

The Lakehouse Engine is a configuration driven Spark framework, written in Python, serving as a scalable and distributed engine for several lakehouse algorithms, data flows and utilities for Data Products.
https://adidas.github.io/lakehouse-engine-docs/
Apache License 2.0
198 stars, 36 forks

[FEATURE] Support for Azure data lake storage #3

Closed: cece95 closed this issue 7 months ago

cece95 commented 7 months ago

Good morning,

You mentioned in the README that most of the code was developed with AWS in mind. How complex would it be to add support for reading from and writing to Azure Data Lake instead of S3?

jmcorreia commented 7 months ago

Hi @cece95, thank you for the interesting question! The answer is: it depends :). The framework should support anything that Spark supports. Spark supports reading from and writing to Azure, so if you provide a path pointing to Azure, the Lakehouse Engine should be able to read from and write to it. Extra development would only be needed if you require a feature that depends on:

Anyway, please feel free to try it out and reach out with more details on your use case if you run into any blockers :).
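
For anyone trying this out, here is a rough sketch of what "a path pointing to Azure" could look like from plain Spark. It is not taken from the Lakehouse Engine code base: the helper names (`abfss_path`, `account_key_conf`, `copy_parquet`) and the container/account names are hypothetical, and it assumes ADLS Gen2 accessed through the Hadoop ABFS connector with account-key authentication (other auth methods use different settings). The cluster also needs the `hadoop-azure` connector available.

```python
def abfss_path(container: str, account: str, relative_path: str = "") -> str:
    """Build an ADLS Gen2 URI in the form used by the Hadoop ABFS connector.

    Hypothetical helper, for illustration only.
    """
    base = f"abfss://{container}@{account}.dfs.core.windows.net"
    if not relative_path:
        return base
    return f"{base}/{relative_path.lstrip('/')}"


def account_key_conf(account: str, key: str) -> dict:
    """Return the Spark/Hadoop setting for account-key authentication.

    Assumption: account-key auth is used; OAuth or SAS need other settings.
    """
    return {f"fs.azure.account.key.{account}.dfs.core.windows.net": key}


def copy_parquet(spark, source: str, target: str, conf: dict) -> None:
    """Read Parquet from `source` and write it to `target` with plain Spark.

    `spark` is an existing SparkSession; `conf` is applied to the session
    before reading, so the ABFS connector can authenticate.
    """
    for name, value in conf.items():
        spark.conf.set(name, value)
    df = spark.read.format("parquet").load(source)
    df.write.mode("overwrite").format("parquet").save(target)


if __name__ == "__main__":
    # Placeholder names; replace with your own container and storage account.
    print(abfss_path("raw", "examplestorage", "sales/orders"))
```

If plain Spark can read and write these `abfss://` locations on your cluster, the same locations should in principle be usable wherever the engine configuration expects a path, subject to the caveats mentioned above.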

cece95 commented 7 months ago

Thank you for your fast reply, @jmcorreia. I'll have a play around and let you know!

jmcorreia commented 7 months ago

Nice :) Feel free to re-open the issue if you find any blocker.