Azure / azure-kusto-python

Kusto client libraries for Python
MIT License

Ingestion at a folder level #498

Closed sa1sen closed 12 months ago

sa1sen commented 1 year ago

I have been going through the Python SDK and Ingest command. It looks like you can only ingest at a file level.

I have a parent folder in ADLS Gen 2 containing a variable number of files that all follow the same schema, e.g.:

ADLS Gen2
---container-curated
-------parent1
------------file1.parquet
------------file2.parquet
------------file3.parquet
-------parent2
------------file4.parquet
------------file5.parquet
------------file6.parquet

At the moment I need to use Azure Storage Account SDK to iterate through the folder and trigger ingest commands. For me the most important thing is to be able to monitor success and failure at parent Folder level. I do not have any mappings stored between parent folder and the individual files. Is there any other way to monitor progress at parent folder level?
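To make the folder-level monitoring question concrete, here is a minimal sketch of the workaround described above: group the listed file paths by their parent folder, ingest each file, and roll the per-file outcomes up to one summary per parent. It uses only the standard library. `ingest_one` is a hypothetical callable standing in for whatever triggers a single-file ingestion (for example a wrapper around the Python SDK), and what counts as "success" depends entirely on that callable; with queued ingestion, a successful call may only mean the file was queued.

```python
from collections import Counter, defaultdict
from posixpath import dirname
from typing import Callable, Dict, Iterable, List


def group_by_parent(paths: Iterable[str]) -> Dict[str, List[str]]:
    """Group file paths by their immediate parent folder."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for path in paths:
        groups[dirname(path)].append(path)
    return dict(groups)


def ingest_folders(
    paths: Iterable[str],
    ingest_one: Callable[[str], bool],  # hypothetical: returns True on success
) -> Dict[str, Counter]:
    """Ingest every file and count successes and failures per parent folder."""
    summary: Dict[str, Counter] = {}
    for parent, files in group_by_parent(paths).items():
        counts: Counter = Counter()
        for path in files:
            try:
                ok = ingest_one(path)
            except Exception:
                # Treat any exception from the ingestion call as a failure.
                ok = False
            counts["succeeded" if ok else "failed"] += 1
        summary[parent] = counts
    return summary


if __name__ == "__main__":
    listed = [
        "parent1/file1.parquet",
        "parent1/file2.parquet",
        "parent2/file4.parquet",
    ]
    # Stand-in ingestion function that fails for one file.
    result = ingest_folders(listed, lambda p: not p.endswith("file2.parquet"))
    for parent, counts in result.items():
        print(parent, dict(counts))
```

The mapping between parent folder and files is built on the fly from the listing, so nothing has to be stored in advance; a folder can be considered done when its "failed" count is zero.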

I have also tried replicating the above problem in ADX itself using the .ingest into command (Ingest from storage)

Again, this documentation is written mostly at the file level. However, for the 'SourceDataLocator' argument of this command, the documentation says you can specify a storage connection string in the following form (Storage Connection):

https://StorageAccountName.dfs.core.windows.net/Filesystem[/PathToDirectoryOrFile]

But I am unable to execute this .ingest into command when it points to a directory/folder. At the file level it works.
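Since the command works at the file level, one way to approximate folder-level ingestion is to generate one .ingest into command per file under the directory URL. The sketch below only builds the command strings; it does not run them. The function name, the table name, and the `auth_suffix` parameter are illustrative assumptions, the table name is assumed to be a simple identifier that needs no quoting, and any credentials the storage connection string needs are passed through unchanged rather than invented here.

```python
from typing import Iterable, List


def build_ingest_commands(
    table: str,
    directory_url: str,
    file_names: Iterable[str],
    data_format: str = "parquet",
    auth_suffix: str = "",  # assumption: appended verbatim to each file URL
) -> List[str]:
    """Build one '.ingest into table' command string per file in a directory."""
    base = directory_url.rstrip("/")
    return [
        f".ingest into table {table} ('{base}/{name}{auth_suffix}') "
        f"with (format='{data_format}')"
        for name in file_names
    ]


if __name__ == "__main__":
    commands = build_ingest_commands(
        "MyTable",  # hypothetical table name
        "https://StorageAccountName.dfs.core.windows.net/Filesystem/parent1",
        ["file1.parquet", "file2.parquet"],
    )
    for command in commands:
        print(command)
```

The file names still have to come from a directory listing, so this does not remove the need for the storage SDK; it only shows how the per-file commands relate to the directory URL.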

Any suggestions/help?

AsafMah commented 12 months ago

We don't support this in the Python SDK; however, we have other tools that might fit your case.

https://dataexplorer.azure.com/oneclick/ingest?sourceType=adls+gen2+container&ingestionType=historical

This wizard can bulk-import files from storage (up to 10,000 files) through a friendly interface.

We also have the LightIngest tool - https://github.com/Azure/Kusto-Lightingest

It can be used to ingest from folders and from storage. The Data Explorer web app can also generate a LightIngest command line for you (pick historical data).

For now I will close the issue, as it doesn't relate to the Python SDK specifically.