Closed jairamjidgekar closed 1 year ago
Thanks for the feedback, we’ll investigate asap.
Hi @jairamjidgekar Jairam, thanks for reaching out. We generally use GitHub issues to report or discuss bugs and specific questions about the SDK, so I'm not sure how much I will be able to help with a high-level design question such as this.
In general, the approach sounds fine, but I am not at all familiar with Apache NiFi so I can't really provide any information about that. This SDK provides a few different APIs that allow you to upload data either all at once or in pieces and Azure Storage should support your larger files without issue. All data uploaded is just binary so Azure Storage will handle your data zipped or unzipped, however you choose to upload it.
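To illustrate the "zipped or unzipped" point, here is a rough sketch of uploading the unzipped contents of an archive without extracting it to disk first. It assumes `container_client` is an azure-storage-blob `ContainerClient` (or anything with the same `upload_blob` method); the function name `upload_zip_members` and the `prefix` parameter are made up for this example.

```python
import zipfile


def upload_zip_members(container_client, zip_path, prefix=""):
    """Upload each file inside a zip archive as its own blob; return the blob names."""
    uploaded = []
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # directory entries carry no data
            blob_name = prefix + info.filename
            # archive.open() decompresses lazily as the stream is read,
            # so a very large member is never held in memory all at once.
            with archive.open(info) as member:
                container_client.upload_blob(
                    blob_name,
                    member,
                    length=info.file_size,  # uncompressed size of this member
                    overwrite=True,
                )
            uploaded.append(blob_name)
    return uploaded
```

If you would rather keep the data zipped, uploading the archive itself as a single blob works the same way as any other file.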
Here is a sample on how to use the SDK to upload a file to a block blob. The upload_blob method will automatically split up a large upload into smaller pieces (4 MiB by default) to optimize performance and make sure the network can handle the upload.
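For reference, a minimal sketch of that upload might look like the following. The helper names `make_blob_client` and `upload_file` are only for illustration, the connection string, container, and blob name are placeholders you would supply, and the size settings shown are just the values I understand to be the defaults, so treat them as assumptions.

```python
import os


def make_blob_client(conn_str, container, blob_name):
    """Build a BlobClient; the size settings below are illustrative."""
    # Imported lazily so this module still loads where the SDK is not installed.
    from azure.storage.blob import BlobClient

    return BlobClient.from_connection_string(
        conn_str,
        container_name=container,
        blob_name=blob_name,
        max_block_size=4 * 1024 * 1024,        # size of each uploaded piece
        max_single_put_size=64 * 1024 * 1024,  # larger uploads are split up
    )


def upload_file(blob_client, path, max_concurrency=4):
    """Stream a local file to a block blob and return its size in bytes."""
    size = os.path.getsize(path)
    with open(path, "rb") as data:
        # Pass the open file rather than data.read() so the SDK can read
        # and upload the pieces itself instead of loading the whole file.
        blob_client.upload_blob(
            data,
            overwrite=True,
            max_concurrency=max_concurrency,  # pieces uploaded in parallel
        )
    return size
```

Something like `upload_file(make_blob_client(conn_str, "mycontainer", "big.zip"), "big.zip")` would then upload the file, with `conn_str` coming from your storage account.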
Another option that may or may not be useful in your case is copying a file from another web location directly into Azure Storage. Here is a sample of that. There are some nuances around authentication here if your file is not public, but it may still be possible if the file is accessible via OAuth.
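As a rough sketch of the copy approach: `start_copy_from_url` asks the storage service to fetch the file itself, and the copy status can then be polled from the blob properties. This assumes `blob_client` is a `BlobClient` for the destination blob and that `source_url` is readable by the storage service (for example, a public URL). The function name `copy_from_url` and the polling and timeout values are made up for this example.

```python
import time


def copy_from_url(blob_client, source_url, poll_seconds=5.0, timeout_seconds=3600.0):
    """Start a server-side copy and wait for it; return the final copy status."""
    blob_client.start_copy_from_url(source_url)
    deadline = time.monotonic() + timeout_seconds
    while True:
        # The copy runs inside the storage service, so nothing is
        # downloaded to the machine running this script.
        status = blob_client.get_blob_properties().copy.status
        if status != "pending":
            return status  # e.g. "success", "failed", or "aborted"
        if time.monotonic() >= deadline:
            raise TimeoutError("copy still pending after %s seconds" % timeout_seconds)
        time.sleep(poll_seconds)
```

The trade-off is that the file lands in storage exactly as it is served, so a zip file stays zipped.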
Hopefully that helps some. Thanks.
Thank you Jacob @jalauzon-msft .
I tried uploading the files from my local machine to Azure using Python. It was able to upload huge files (>2 GB) without any issues. upload_blob did the trick. I will work on it further and let you know if there are any roadblocks.
Thanks again, Jairam P.
Hi,
I have a use case where I would like to upload files from the web (scraping .zip files from a website). These zip files are huge (>2 GB), and after unzipping, the file size increases drastically (>40 GB in some cases).
I would like to leverage Azure Blob Storage for this using Python and the Azure SDK.
My thought process is to scrape the content with Python, process the files using Apache NiFi, and load them into Azure.
Please do let me know if this approach is feasible and can be accomplished.
Thanks