catalin-micu opened 6 months ago
Hi @catalin-micu, filtering blobs using tags or access tier is currently not supported. You can use one of the below ways for filtering blobs during copy: the `--include-after` or `--include-before` flags, `--include-path`, `--exclude-path`, and `--exclude-blob-type`.
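For example, a time-based filter with `--include-before` (which takes an ISO 8601 UTC timestamp) could look like the following sketch; the account, container, and cutoff date are placeholders, and the command is only assembled and printed here, not executed:

```python
# Sketch: assemble an azcopy copy command that transfers only blobs last
# modified before a cutoff date. All names below are hypothetical.
src = "https://srcaccount.blob.core.windows.net/container"
dst = "https://dstaccount.blob.core.windows.net/container"
cutoff = "2020-01-01T00:00:00Z"  # ISO 8601 UTC timestamp

cmd = (
    f'azcopy copy "{src}" "{dst}" '
    f"--recursive --include-before={cutoff}"
)
print(cmd)
```

Running the printed command (with real SAS-authenticated URLs) would copy only the blobs whose last-modified time precedes the cutoff.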
All my data is Block Blob at the moment. Is there a way to change that?
Do you mean changing the blob type from Block Blob to Append Blob or Page Blob? If yes, then there is no direct way to do that. Can you use last modified time for filtering the blobs during copy?
Yes, I meant changing the blob type, I understand it's not possible. I can't use the last modified timestamp either, because there is no pattern for uploading this data that I'm trying to filter. The situation is this: over the course of years, from time to time, wrong data was uploaded. Now I need to move the whole content of the storage account, preferably filtering out this wrong data. The only thing I can determine about the wrong data is the directory name. All directory names (for both good data and bad data) are UUIDs, so I can't use any pattern filtering there. We are talking hundreds to thousands of directories, so adding each name I'm trying to filter to the AzCopy command is also not an option.
Is there anything else worth trying? I was leaning towards filtering based on blob tags or blob attributes, but it does not seem possible.
What blob attribute do you want to use for filtering the wrong data (other than tags or access tier)?
I don't have any in mind; basically anything that I can set to a specific value for all the wrong data and then pass to azcopy as a filter, be it a blob property, directory property, anything.
Alright, I see a `feature-request` label was added. To summarize: I would most like to filter by access tier.
Blob Inventory (https://learn.microsoft.com/azure/storage/blobs/blob-inventory) captures metadata/attributes on objects, like access tier. You could use a Blob Inventory report as an input to AzCopy with the `--list-of-files` param (https://github.com/Azure/azure-storage-azcopy/wiki/Listing-specific-files-to-transfer).
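As a sketch of that approach — assuming a CSV-format inventory report with `Name` and `AccessTier` columns (the actual column set depends on the inventory rule's schema) — the report could be turned into a `--list-of-files` input like this:

```python
import csv
import io

# Hypothetical inventory rows; in practice, read the CSV report that
# Blob Inventory writes to its destination container.
inventory_csv = io.StringIO(
    "Name,AccessTier\n"
    "good/one.bin,Hot\n"
    "bad/two.bin,Cool\n"
    "good/three.bin,Hot\n"
)

# Keep only Hot-tier blobs; --list-of-files expects one relative
# path per line.
keep = [
    row["Name"]
    for row in csv.DictReader(inventory_csv)
    if row["AccessTier"] == "Hot"
]

with open("list-of-files.txt", "w") as f:
    f.write("\n".join(keep) + "\n")
```

The resulting file would then be passed as `azcopy copy ... --list-of-files=list-of-files.txt`.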
Interesting solution, but sadly it won't work because of performance issues. The resulting list of files would have millions of entries, every time, for each of the multiple transfer jobs I will do (200+).
AzCopy 10.23
Linux OS
`azcopy copy "source_storage_account_container" "destination_storage_account_container" --recursive`
Problem: Copying entire storage containers and using azcopy to filter some blobs
There is an unpredictable amount of data, scattered throughout the container, that we want to filter out. We are talking about petabytes worth of data in total. We can identify all the data that needs to be filtered. Due to internal policies, we cannot alter the data (cannot rename/add a prefix or anything of the sort, therefore cannot use `--exclude-pattern` or `--exclude-regex`), nor can we archive it. These two options are out of the question. What I want to do is filter data in a storage-account-to-storage-account transfer, through `azcopy copy`, based on either a tag, or access tier (everything is currently Hot tier, but unwanted data can be moved to Cool or Cold), or any other blob attribute that can be assigned to the data, without changing names, directory structure, or archiving. Can this be done?