Open davidmatson opened 6 years ago
It's also worth considering the TryAddTransfer mechanism overall - what is it intended to achieve? If it's trying to avoid handling cases where two transfers are writing to the same file at the same time, it likely doesn't accomplish that purpose - if one uses a file path and the other uses FileStream, I believe the StreamLocation ToString will currently treat them as different.
Would it be better to remove TryAddTransfer & this exception entirely?
@davidmatson
The code change you mentioned does be able to do this. I've alread put this in our backlog. We'll discuss about this to these day to decide whether we'd remove this limiation.
@EmmaZhu any update when it will be fixed or modified?
Any update on this bug? We're having to write an expensive workaround decorating the entire stream with an override of ToString just to avoid hitting this error condition in production.
@davidmatson , sorry for the late reply. We are discussing about this, only worried about breaking change and other potential issues. Does you scenario need to download to multiple streams at the same time, or you'd want to download to a stream when there's another job started for a while downloading the same blob to another stream?
If your scenario is like the former one, you can implement the destination stream to write to multiple inner stream at the same. If it's the latter one, it'd really need code change in DMLib, we'll evaluate the potential risk and will let you know our decision ASAP
Thanks Emma
We need to be able to handle both scenarios - we have a process that's downloading multiple blobs concurrently, and in some cases the source blob might be the same. Our overall expectation is that TransferManager will have the same behavior regardless of what other concurrent operations are occurring in the same process. Thanks for taking a look.
@EmmaZhu any updates regarding the issue?
@evgenyvinnik @davidmatson,
We just released a new version (0.9.1) which allows adding transferring from the same source to multiple Stream instances. Please be noted, to download one Blob/Azure File to multiple Stream instances, DMLib will try to download content of Blob/Azure File multiple times, instead of download once and write the same content to multiple Stream instances.
Which service(blob, file) does this issue concern?
Blob
Which version of the SDK was used?
Microsoft.Azure.Storage.DataMovement version 0.8.0
On which platform were you using? (.Net Framework version or .Net Core version, and OS version)
.NET Framework v4.7.1
How can the problem be reproduced? It'd be better if the code caused the problem can be shared.
It's a concurrency problem; it can repro if the concurrency case is hit.
What problem was encountered?
Microsoft.WindowsAzure.Storage.DataMovement.TransferException: A transfer operation with the same source and destination already exists.
Have you found a mitigation/solution?
No
The problem appears to be here: https://github.com/Azure/azure-storage-net-data-movement/blob/7c8258a8ffbef96b10ebdabd14f5d129048ce1de/lib/TransferManager.cs#L1277 Maintains state across concurrent transfers, using some kind of key to distinguish them. https://github.com/Azure/azure-storage-net-data-movement/blob/7c8258a8ffbef96b10ebdabd14f5d129048ce1de/lib/TransferJobs/TransferLocation.cs#L62 Uses the TransferLocation ToString to tell whether two sources or destinations are the same. https://github.com/Azure/azure-storage-net-data-movement/blob/7c8258a8ffbef96b10ebdabd14f5d129048ce1de/lib/TransferJobs/TransferLocation.cs#L62 For streams uses Stream.ToString (!) to determine if two streams are the same. https://github.com/Azure/azure-storage-net-data-movement/blob/c133c98dc7cb211abb171f99d153d93141f907d0/lib/TransferJobs/StreamLocation.cs#L108 This just calls object.ToString, which returns the type name ("System.IO.MemoryStream"), and is insufficient to determine instance identity. One option worth considering instead is to use object.ReferenceEquals rather than .ToString to tell if two stream instances are the same.