Azure / azure-storage-java

Microsoft Azure Storage Library for Java
https://docs.microsoft.com/en-us/java/api/overview/azure/storage
MIT License
Remove String duplicates generated by container.listBlobs() by internalizing them #518

Open andreaturli opened 4 years ago

andreaturli commented 4 years ago

Which service(blob, file, queue, table) does this issue concern?

Blob service

Which version of the SDK was used?

v8.4.0

What problem was encountered?

Calling container.listBlobs() creates one copy of the identical String "account.blob.core.windows.net" for every ListBlobItem in the iterable. The string is used to build the URI in each CloudBlockBlob, so the number of duplicates grows in proportion to the number of blobs listed.

Below is a screenshot captured with a Java profiling tool:

[Screenshot: objects paths]

Have you found a mitigation/solution?

I believe interning that collection of identical strings would ensure that all strings with the same contents share the same memory, which would reduce the memory required to run listBlobs.
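To illustrate the idea, here is a minimal sketch of how interning collapses equal strings into one shared instance. It does not use the SDK: `InternDemo`, `buildHosts`, and `countDistinctInstances` are hypothetical names made up for this example, and the `StringBuilder` call only stands in for whatever code produces a fresh host string per list entry.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

class InternDemo {
    // Hypothetical stand-in for per-entry parsing: each call yields a new String object.
    static List<String> buildHosts(String account, int count, boolean intern) {
        List<String> hosts = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            String host = new StringBuilder(account)
                    .append(".blob.core.windows.net")
                    .toString();
            // String.intern() returns the canonical instance for this content.
            hosts.add(intern ? host.intern() : host);
        }
        return hosts;
    }

    // Counts distinct objects by reference identity, not by equals().
    static int countDistinctInstances(List<String> values) {
        Set<String> identities = Collections.newSetFromMap(new IdentityHashMap<>());
        identities.addAll(values);
        return identities.size();
    }

    public static void main(String[] args) {
        System.out.println(countDistinctInstances(buildHosts("account", 1000, false)));
        System.out.println(countDistinctInstances(buildHosts("account", 1000, true)));
    }
}
```

Without interning, every entry holds its own copy; with interning, all entries reference a single instance. The intern pool is shared across the whole JVM, so a cache scoped to a single listing call might be a reasonable alternative if that is a concern.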

rickle-msft commented 4 years ago

Hi, @andreaturli. Thank you for making this suggestion. This seems like it would be a good performance improvement in some scenarios. We will investigate how best to offer this option and work to support it in a future release.

jhalterman commented 4 years ago

FWIW, it's not just Strings we've seen large numbers of duplicates for. As the screenshot above shows, there are other types of duplicate allocations.

rickle-msft commented 4 years ago

Thank you for highlighting that. I'm not sure that reducing the allocations of CloudBlockBlob and StorageUri is possible. When listing, a CloudBlockBlob is created for each list entry, and because each entry points to a distinct blob, it requires a distinct object. Each CloudBlockBlob in turn requires its own instance of StorageUri. The way the SDK is built, it's impossible to share instances of these types when they point to distinct blobs on the service. If you have thoughts on ways we might be more memory efficient, we are more than happy to discuss them.

jhalterman commented 4 years ago

I think interning the Strings and not duplicating empty HashMaps would be a nice improvement.
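One way to avoid duplicating empty maps is to point every entry at a single shared empty map and allocate a real HashMap only on the first write. The sketch below is an assumption about how that could look, not the SDK's code: `LazyMetadataEntry`, `putMetadata`, and `metadata` are hypothetical names.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class LazyMetadataEntry {
    // One immutable empty map shared by every entry that has no metadata.
    private static final Map<String, String> EMPTY = Collections.emptyMap();

    private Map<String, String> metadata = EMPTY;

    // Allocates a private HashMap only when the first key is written.
    void putMetadata(String key, String value) {
        if (metadata == EMPTY) {
            metadata = new HashMap<>();
        }
        metadata.put(key, value);
    }

    // Returns the shared empty map until something has been stored.
    Map<String, String> metadata() {
        return metadata;
    }

    public static void main(String[] args) {
        LazyMetadataEntry a = new LazyMetadataEntry();
        LazyMetadataEntry b = new LazyMetadataEntry();
        System.out.println(a.metadata() == b.metadata()); // both still share EMPTY
        a.putMetadata("owner", "alice");
        System.out.println(a.metadata() == b.metadata()); // a now has its own map
    }
}
```

The trade-off is that callers can no longer mutate the returned map directly while it is still the shared empty one, so this would only fit an API where writes go through a setter.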