Azure / azure-storage-java

Microsoft Azure Storage Library for Java
https://docs.microsoft.com/en-us/java/api/overview/azure/storage
MIT License
189 stars 164 forks source link

This API is useless (v10) #432

Open bechhansen opened 5 years ago

bechhansen commented 5 years ago

This week I had to move an Azure Cloud storage Java application from using file storage to blob storage. I need to do this only because I need to stream video from the storage.

The Azure features I use are pretty simple. Basically I generate shared access signature for existing files in the storage based on a URL, and I also need to list all files in the parent folder of a given file.

I decided to use the new Blob Storage SDK v10 for Java as version 8 is marked as Legacy.

I used two days trying to figure out the new API. I never succeeded in either generating access signature nor listing blobs of a folder/container. The API is an unintuitive mess. Documentation is rubbish and out of sync with code, and the example is useless.

I ended giving up, and went back to use the old version 8 API. After an hour my change from files to blobs was done.

I really urge you to stop developing on the version 10 API and Kill it. KILL IT WITH FIRE! I've been doing Java professional, every day, for 18 years, but never seen such a poor API.

The old version 8 API was pretty good, intuitive and easy to work with. I see absolutely no reason for not evolving this API in the direction you need.

rickle-msft commented 5 years ago

Hi, @bechhansen. Thank you for your feedback. I am sorry that you have had such a bad experience with the new version of the sdk. Can you please help us improve by offering some details about what specifically were the pain points? Where exactly did the API become hard to intuit? Which documentation is out of date and in what way are the samples useless or what would be more helpful to see? Was any of your difficulty with Rx or was it all specifically with the new API surface?

Did you see this short explanation of the new design, which has some information on SAS?

We are again sorry that you had this discouraging experience and hope to improve our story.

bechhansen commented 5 years ago

Hi @rickle-msft

Thank you for your answer.

When using the v8 API all I need to do is pretty much:

    URI azureBlob = new URI("https://storagesample.blob.core.windows.net");
    CloudBlockBlob blob = new CloudBlockBlob(azureBlob);

    if (blob.exists()) { 
        String signature = blob.generateSharedAccessSignature(policy, groupPolicyIdentifier);
        System.out.print("Signature for " + blob.getUri().toASCIIString() + " is: " + signature);

        for (ListBlobItem item : blob.getParent().listBlobs()) {
        System.out.print("Blob in container: " + item.getUri().toASCIIString());
    }
    }

Very intuitive and easy to figure out, just from instantiating a CloudBlockBlob. I do not need to scrutinize any documentation or examine any examples to do this.

For the v10 API however I have no idea about how to get started without looking at the documentation. It is not at all intuitive.

I can start by doing something like:

    URI azureBlob = new URI("https://storagesample.blob.core.windows.net/somefile.txt");
    BlockBlobURL blockBlobURL = new BlockBlobURL(azureBlob, pipeline);

But then what? blockBlobURL don't help me to either get the parent container for listing siblings nor generating the signature for the URL.

So you give me a link to the documentation on the Wiki. The problem is that the page you are linking to is not part of the bulleted list of the Wiki. It's not part of the main pages of the Wiki TOC. I can now see a small link to it on the right of the page, but I never found that page until you showed it to me. If the API was intuitive I wouldn't need it at all.

So after scrutinizing the newly discovered documentation I now find out that I need to use the class BlobSASSignatureValues to generate the SAS. How would I know that without scrutinizing the documentation? Why is this simply not a method on e.g. BlockBlobURL like for the v8? The naming of BlobSASSignatureValues is really bad. It tells me absolutely nothing about what its purpose is. I would think it is simply a POJO class for holding some values. The name indicate it is a class of values, not a class of functionality.

So now I'm ready to start using the 'secret' BlobSASSignatureValues class in my code - guess what - the class does not even exist in the API !!! Where is it?

Let's have a look at the examples. I have found a sample at https://github.com/Azure/azure-storage-java. It don't help me at all. It don't help me creating a blob object from an URL. It don't help me generating a SAS and it don't help me listing the blobs of a container.

I also found this one (only one?) example: https://github.com/Azure-Samples/storage-blobs-java-v10-quickstart/blob/master/src/main/java/quickstart/Quickstart.java So this example actually do something about listing blobs of a container. I still have no idea about getting the parent container of a blob though. The example is immensely complicated. The example has a static listAllBlobs method as the API will return up to max 10 blobs. Really, only 10? So everyone who need to list content of a container will need to implement this iterative step themselves? Why does the API not hide this limit for the user? I don't care how you will find all blobs of a container for me, but I will NEVER be happy by getting only the first 10. What is a BlobFlatSegment? I have absolutely no clue and the Javadoc is no help.

So still not happy at all with this API. I wasted valuable time trying to use it. Peer

rickle-msft commented 5 years ago

Thank you, Peer, for all that information. We have already scheduled a meeting for this week with the rest of our team to address feedback like this, so we are doing our best to take this input seriously, and it would be very helpful if you are willing to continue discussing with us, though I understand if you are frustrated enough with your experience that you do not wish to engage with it any more.

I think adding a method to get the parent container and adding sas generation methods to the URL types could be reasonable additions. It is possible to do both of those things now, but they do require a couple steps, and I think it would be best to leave this as a request for convenience methods to provide that functionality. If there are other gaps in functionality like this that you noticed, I'd be happy to take a longer list of feature requests.

The wiki has been updated to reflect that BlobSASSignatureValues was renamed to ServiceSASSignatureValues so that it works with containers as well. I apologize again for your frustration in finding that type.

For samples, the first link you shared is just the link to the main page of the repo, so I'm not exactly sure what you are referencing, but perhaps these samples would help for constructing a blob sas and listing (the helper method used is defined here).

We also do have a sample on how to abstract away the limit for max results and get an apparently seamless list of blobs that I hope will be helpful. The reason it is not offered through the listing api by default is that we guarantee a contract that one api call on a url object will make exactly one rest call. This is to offer transparency to the user about exactly what is happening, when they are using network resources, incurring io latency, and how much they can expect to be charged for a certain operation based on transaction costs. Please let me know if you disagree with that philosophy at all.

The quickstart you mention is more a template than anything. It is possible to get a listing of up to 5000 results in one call if no max results parameter is set, so I hope that addresses your concern about limited listing. A BlobFlatSegment is distinct from a BlobHierarchicalSegment in that the first is the result of a flat listing and the second the result of a hierarchical listing. More specifically, listing with a delimiter returns results in a way that mimics a directory structure, so it is "hierarchical" where as listing with no delimiter returns simply a flat list of blobs. Please let me know if that distinction is still unclear.

Again, I apologize for your frustrating experience with the v10 SDK and really do appreciate you taking the time to engage and share some feedback. If there is anything you need in the way of support for v8 in the mean time, please let us know.

teeohhem commented 5 years ago

We've also been using this (a team of advanced java developers) and have been struggling with getting it to work correctly. Even when it does work, it doesn't do so consistently. We moved to V10 due to concerns that V8 would be EOL and no longer supported in the near future. That being said, this API is unusable and the work to translate our existing code to this has been frustrating and is much more pain than it's worth.

rickle-msft commented 5 years ago

Hi, @teeohhem. Thank you for chiming in. The more we can hear about issues like this, the better we can respond and improve, so we appreciate you taking the time to voice your difficulties.

Can you please elaborate on where you have had difficulty getting it to work? And what are the inconsistencies in correctness you mention? Any specifics you can provide or your pain or frustration would be very helpful.

v8 is not EOL at the moment and has not even been technically deprecated; we are currently supporting both libraries in parallel and have not announced a change in that plan. You are correct that v10 is our long term Java focus, but when we do announce a departure from v8, we will give 6-12 months notice before we stop support for it. Therefore, if you are more comfortable with v8, I do not see any immediate risk in using that one.

bechhansen commented 5 years ago

When using a SDK like this there are a few requirements that need to be meet.

In my opinion the V10 SDK fails on all requirements.

I really understand you concern about transparency to the user about exactly what is happening, resource usage and latence. I don’t think the SDK should ensure that one api call on a url object will make exactly one rest call. Providing this information to the user of the SDK can very easily be achieved by exposing methods for metrics, statistics and resource usage. It could also easily be implemented as an observer pattern to provide real time information to the user. The existing V8 SDK could be evolved to expose this type of information.

rickle-msft commented 5 years ago

To ensure I understand your difficulties correctly, would it be accurate to say (responding to each bullet respectively):

We definitely appreciate your comments on the value of a 1:1 relationship between api calls and rest calls. That is something we considered to be very valuable, but if you as a highly experienced java developer think that this contract is not the right way to offer that insight/control, then we will strongly take that into consideration.

Having noted your recommendations on methods for getting information on resource consupmtion, etc. or using an observer pattern, do you think the problem could be mitigated in the v10 sdk by offering a more comprehensive object model or more compound operations?

Currently the TransferManager is where a few basic compound operations live, mostly for large data transfers. It is an admittedly restricted set of operations, but if we expanded the number of these kinds of methods to include, for instance, the full container enumeration which hides the iterative calls as you mentioned earlier, would that ameliorate the difficulties in v10 at all? (These wouldn't all live in the same place necessarily, I'm just saying generally if we found a good place to expose them and did indeed provide them.)

Another of the key ideas of the v10 design is that the URL objects are immutable and stateless. This was intended to offer a lot of security to the customer, as we found in v8 that customers were often bitten by trying to share Cloud objects (i.e. CloudBlob) between threads even though they are not thread safe. It was also difficult to make them thread safe as accessing them atomically would then in many cases require holding a lock, which is not safe for an object making network calls. There was further confusion (as alluded to earlier) about what information was cached locally and what required a network call to retrieve/send. Hence, we stripped them down to essentially all be little more than an set of methods to be crystal clear that the data exists and persists in the cloud entirely. It sounds again like this is not a desirable tradeoff in your mind and that you prefer the stateful object model of v8. Is that a fair read of your comments? Do you see any value in keeping these URL objects available to customers who maybe prefer the transparency and simplicity but also offering within the same artifiact types more akin to CloudBlockBlob that behave similarly to the v8 models?

Thank you again for your time and feedback.

mattvryan commented 5 years ago

In reading @bechhansen's post it sounds very similar to my experience, not only in the problems experienced but the frustration.. I am working on updating the Azure Blob Storage plugin for Apache Jackrabbit Oak, which is an Apache open-source content repository which implements the JCR specification. This plugin adds the ability for Oak to store binary data in Azure. I'm evaluating whether I need to move away from current use of the v8 SDK and use V10 instead, or if I should just use the REST endpoints. I probably have to move to the new v10 SDK because I'm hitting a bug in the v8 SDK that doesn't appear to be a bug in v10. But moving to v10 is looking like it will be a significant effort, and I am giving serious consideration to just calling the REST endpoints directly instead of using the SDK because that looks like it will be easier and will take less time. Because what I am working on is open source (link), maybe it will be helpful to use as an example of what such a migration (from a v8-based implementation to a v10-based implementation) is like with the current API?

choronz commented 5 years ago

v10 SDK is just a wrapper around the new azure rest v2 client-runtime library. You can write your own lite-weight wrapper around it using the HttpPipeline builder pattern, as I personally find it too verbose and non-intuitive too.

rickle-msft commented 5 years ago

@choronz Thank you for your feedback. Is there anything you would like to add to this thread about what in particular you found non-intuitive and needlessly verbose?

vamsikt commented 5 years ago

@rickle-msft migrating from azure v8 to V10 is very painful. For example, Alpakka S3 extension library is very straight forward and intuitive. They have both Java/Scala API's.

rickle-msft commented 5 years ago

@vamsikt Thank you for chiming in. Could I please trouble you for some details on what is painful? Some difficulties I could imagine: Is it the lack of guidance on migration that could be mitigated by some side by side comparisons of common scenarios? Or are there key features missing from v10 that prove cumbersome to upgrade from v8 that we could add support for? Is the change in object model simply too high a barrier to entry for upgrading? Is the object model too confusing to be used effectively? Or any other points of pain you can think of.

It is not an exaggeration to say that we are meeting every day to come up with the best possible story to address this feedback. We are hearing loud and clear that the upgrade we have proposed in v10/11 has been problematic for quite a few customers, and we are investing a significant amount of time and resources to address those concerns. Any and all feedback you guys have now, no matter how harsh, will help ensure that the solution we come up with in the near future actually targets the difficulties faced by you, our customers.

mattvryan commented 5 years ago

For one example of what it takes to move from V8 to V10, you can see this pull request. This represents a work-in-progress trying to migrate the Apache Jackrabbit Oak AzureDataStore component from V8 to V10, and as you can see it is a substantial effort. And more changes will be required as it is still not done and not passing all the previous unit tests.

You don't have to look at the whole pull request. You can see most of the changes I have to make in the "AzureBlobStorageBackend.java" class. Github currently reports over 900 changes in that file alone, all based on just trying to migrate to the new SDK. However, don't forget the brand-new "AzureBlobURLInputStream" class I had to create just for this, in order to bridge async downloads of the SDK to they sync download model of the JCR specification without reading huge binaries into RAM or onto disk first.

In addition there are some things from the V8 SDK that I still haven't figured out how to do with the new SDK. Some that come immediately to mind are:

rickle-msft commented 5 years ago

Hi, @mattvryan. Thank you for following up again. This is super helpful. Being completely honest, I'm not sure we can do much about the magnitude of the break. They are fundamentally different object models and paradigms that are designed to be used differently. We aren't going to be able to turn the upgrade into a find/replace.

That said, we can definitely add some more meat to the SDK to make sure it is fully featured and doesn't feel quite so anemic for convenience and safety. Responding in order.

mattvryan commented 5 years ago

Hi @rickle-msft,

I don't see why you couldn't copy the source itself in v10. This should be fully supported. If it's not clear that this is supported, then that indicates discoverability/intelligibility issues that we need to address. What leads you to believe that you cannot do this any more?

When I tried to do this, the SDK threw an exception with a 404 HTTP status code. In my code I had just barely verified that in fact the blob did exist.

rickle-msft commented 5 years ago

Odds are the blob did exist, but you didn't have access to it (there are security reasons about why this might be and why it comes back as a 404, but I won't get into that here). If you add a SAS token or give public access to the container, you should be able to copy a blob to itself. I just verified it myself.

mattvryan commented 4 years ago

Hi @rickle-msft,

I've retried this with the v11 SDK. The blob does exist - I verified that it exists both before and after I try the copy (both in code and manually via looking at the Azure portal). I am using the exact same containerURL with the same credentials both to create the original blob and to try to do the copy. I am still getting a 404 error. I cannot just modify the container to give public access. The code in question is used to manage very large and very sensitive customer information that must be stored in secure containers. And I don't see why I need to generate a SAS to copy the blob to itself - I am using the exact same credentials and the exact same containerURL - and by this I mean the exact same objects in both calls. I'm not even logging in a second time - it isn't a new session with the same credentials, it is the same SharedKeyCredentials instance.

You can see the code in question for yourself: https://github.com/mattvryan/jackrabbit-oak/blob/4cd665c2cf299db03e3b24c2716b28f741acd899/oak-blob-cloud-azure/src/main/java/org/apache/jackrabbit/oak/blob/cloud/azure/blobstorage/AzureBlobStoreBackend.java#L323 You can even clone my branch and run the tests and watch it fail. https://github.com/mattvryan/jackrabbit-oak/tree/OAK-8105

I appreciate you trying to help out here. And I'm trying to be patient - but I don't know if you can imagine just how frustrating this is, spending weeks of time trying to get this component that was already working fine to work with the new SDK.

rickle-msft commented 4 years ago

I don't know that I can say I grasp your level of frustration, but I understand that it has been frustrating, and I do apologize. I hope my previous reply did not come across as dismissive as my highest priority is ensuring the satisfaction of our customers. I also understand that there are environments in which a container cannot simply be made public for reasons you mentioned. I assumed you were working in a test enviornment, which is why I proposed that possibility as a first step.

Given the chronic nature of this issue and that it seems to have diverged from the main purpose of this thread, could you please email me directly and we can follow up with your issue that way (frley@microsoft.com)? I will take a look at the links you posted and see if I can sniff out what is going on.

rickle-msft commented 4 years ago

For everyone else here (@bechhansen @vamsikt @choronz @teeohhem @mattvryan), I would like to make you aware that we have read this feedback carefully and have taken the last few months to produce a preview of an updated version of this SDK. Here is a document that gives a more in depth explanation of everything involved, but a quick summary is that it will have a much cleaner interface both in the way of type names and discoverability, a number of convenience apis left out of v10/11 have been added back, sync and async operations are supported equally, issues causing crashes for long running applications have been addressed, etc. Before GA, we will also be generating samples that have side by side implementations using v8 and v12 for many scenarios, which will hopefully improve the upgrade story. We will be happy to add to that list of samples as you guys need.

We are currently recommending that customers either move back to v8 if they are comfortable there as v8 will be developed, supported, and maintained for a while longer. Alternatively, we suggest that you check out the preview of v12 if you are interested in preview versions. This will ultimately be the path forward in the long term. We are also shipping another version of this preview next week.

rickle-msft commented 4 years ago

Hi, all. I just wanted to give an update that we have released a GA version. As I mentioned, we spent a lot of time trying to incorporate the feedback here. One thing I will give a heads up on, though, is we still need to add the Sas generation methods to the clients instead of leaving them in separate types. That unfortunately didn't make it into GA, but we are planning to add it very soon.