[FEATURE] Support for Azure data lake storage

cece95 commented 7 months ago

Good morning,

You mentioned in the readme that most of the code was developed with AWS in mind, how complex would it be to add support for reading/writing to Azure Data Lake instead of S3?

jmcorreia commented 7 months ago

Hi @cece95, thank you for the interesting question! The answer is: it depends :). The framework should support anything that Spark supports. Spark supports reading from and writing to Azure, so if you provide a path pointing to Azure, our Lakehouse Engine framework should be able to read from and write to it. The framework would only need extra development if you use a feature that depends on:

The File Manager (lakehouse_engine/core/file_manager.py), which currently supports S3;
The utils classes in lakehouse_engine/utils/storage/*, which currently support S3 and the local file system. These utils are typically used when, for example, you want the acon to point to a schema file sitting on S3 instead of providing the schema directly.
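
To make the distinction above concrete, here is a minimal sketch (not part of the Lakehouse Engine itself) that builds an ADLS Gen2 path and flags whether a given path would fall outside what the storage utils currently cover. The helper names `abfss_uri` and `needs_extra_development` are hypothetical, the example container and account names are made up, and the set of schemes treated as "supported" is an assumption based only on the "S3 and local file system" statement above.

```python
from urllib.parse import urlparse

# Assumption: URI schemes that count as "S3 or local file system".
# The empty scheme covers plain local paths such as /tmp/schema.json.
UTILS_SUPPORTED_SCHEMES = {"s3", "s3a", "file", ""}


def abfss_uri(container: str, account: str, path: str) -> str:
    """Build an ADLS Gen2 URI in the standard abfss:// form understood by Spark."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path.lstrip('/')}"


def needs_extra_development(path: str) -> bool:
    """Return True if a path used by the file manager or storage utils
    (for example, a schema file location) is neither on S3 nor local."""
    return urlparse(path).scheme.lower() not in UTILS_SUPPORTED_SCHEMES


if __name__ == "__main__":
    # Hypothetical container, storage account and folder names.
    uri = abfss_uri("data", "myaccount", "/bronze/orders")
    print(uri)
    print(needs_extra_development(uri))
    print(needs_extra_development("s3://bucket/schemas/orders.json"))
```

A path built this way can be used as a regular Spark location, for example `spark.read.format("delta").load(uri)`, provided the cluster has the Azure Hadoop connector and credentials configured. Only paths for which `needs_extra_development` returns `True` would hit the S3-specific parts of the framework mentioned above.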

Anyway, please feel free to try it out and reach out with more details on your use case if you find any blocker :).

cece95 commented 7 months ago

Thank you for your fast reply, @jmcorreia! I'll have a play around and let you know!

jmcorreia commented 7 months ago

Nice :) Feel free to re-open the issue if you find any blocker.

adidas / lakehouse-engine, issue #3