SoftInstigate / restheart

Rapid API Development with MongoDB
https://restheart.org
GNU Affero General Public License v3.0
807 stars 171 forks source link

Implement asset update #303

Closed hannomalie closed 6 years ago

hannomalie commented 6 years ago

Hey guys,

thank you for your nice project :)

we have a lot of trouble because of the missing update-file functionality. If an asset already exists, putting the asset again results in 501 NOT IMPLEMENTED. We have two scenarios where this is a problem:

Current Behavior

  1. In the past, we had a single threaded producer where we implemented a GET and conditional DELETE before each PUT. Our setup changed and we now have to handle multiple producers in the same application, which means our solution is no longer valid. We added synchronization on the "asset url" here, so this is pretty much solved. When we have multiple applications however, this method doesn't work. We'd like to have an eTag-based solution here that would work for multiple applications, but having to use DELETE in between makes this impossible.

  2. We have to handle really large files (for example up to 700MB). We experience socket read timeout exceptions in the client here. We have a retry mechanism for every request, that performs the same request again. Despite the exception, the asset seems to be created... so GET for the asset and its binary url returns 202. We do a DELETE as usual. No other client does anything. The asset url as well as the binary url returns 404. The next PUT however, results in 501.

  3. The second scenario could as well occur with multiple applications.

Expected Behavior

  1. This scenario should be solved, when eTag header with a PUT request is used here. If the eTag doesn't match, the request should fail. Otherwise, the resource can internally be recreated with the latest data.
  2. For deleted assets (assets that return 404 on asset and binary url), PUT should not produce 501 responses.
  3. Multiple applications should be able to use eTag based PUT as well, because while uploading, another application could have started uploading/uploaded the same asset.

Context

We know that the missing put semantic is because of GridFS lacks this feature. However, we think restheart could implement PUT as an internal delete and create somehow. Our expectation is, that we can use put for files in the way it can be used for other documents.

Environment

We're using Linux and Windows, Restheart 3.3.1 (but already tested newer versions), MongoDB version 3.4.10

Steps to Reproduce

Our project setup is too complex to use it here, but my guess is that the following steps would suffice:

  1. Put an asset
  2. Put it again
  3. Result is 501

  4. Put a large asset (for example 700MB)
  5. Delete it and put it again immediately
  6. Results in 501

Possible Implementation

Here https://github.com/SoftInstigate/restheart/blob/0d87f8bd9b4b40167795fa6201f417561c61d49f/src/main/java/org/restheart/handlers/files/PutFileHandler.java should be a different logic in the exception handler in line ~100. This could be a possible solution: https://stackoverflow.com/questions/41223691/how-to-overwrite-image-in-mongodb-gridfs

Please tell me what you think about this and if you need any further information :) It would be interesting to know if you already investigated in this direction and if you prefer implementing it by yourselves or if you appreciate a pull request for this.

Regards, Hannes

mkjsix commented 6 years ago

Thank you for your report. We are aware of those limitations and maybe we could start removing some of them. To be honest, most of our team is on vacation this week, so we’ll have a look as soon as possible 😁

mkjsix commented 6 years ago

We had a chat about this. The original idea of the whole RESTHeart product has always been to stay as close as possibile to MongoDB functionalities. Now, as you wrote, GridFS does not implement any update semantic because this would involve long-term, non-atomic operations, which are hard to manage correctly:

In GridFS you are not removing/deleting a single document but actually a bunch of documents (files are split into chunks and each chunk is a separate document). That means replacing a file is simply not possible in an atomic manner.

GridFS is kind of a hackish feature. It is often better to just use a separate fileserver with a real filesystem to store the file content and only store the metadata in MongoDB.

Rif: https://stackoverflow.com/a/30074812/615095

We tend to agree with the above point of view. Adding file management transactionality in RESTHeart which alters GridFS semantics could represent for most users a departure from the idea that RESTHEart should copy MongoDB functionalities as much as possibile and could create an additional layer of complexity.

For example, now MongoDB 4.0 is adding ACID transactions, so in the future we will evaluate how to handle those from RESTHeart, but this will happen strictly within the boundaries of what the official Java driver permits.

We'd be more keen to evaluate the possibility to implement a module to interact with a real filesystem instead, for example by adding a configuration which points RESTHeart to a specific mounted path and use it for such purposes. It shouldn't be too difficult to use plain Java APIs to handle that. This way one could decide if needs GridFS (files handled by MongoDB, but with the mentioned limitations) or any "real" external fileserver mounted as a local filesystem. That is just an example, at the moment we don't have existing use cases for that.

hannomalie commented 6 years ago

Hi :)

Thank you for your effort, perfectly valid points that we can understand completely. On the other hand, we see restheart not as a mongodb facade - but also as an http facade, or as you say for yourself a restful web api. It hides the data storage from our users, or at least provides web operations for it. Don't you think that many of your customers think of it the same way? We're happy that documents and assets are treated nearly the same overall... that's why we (and our customers) expect similar/same behavior and support for HTTP verbs here. Implementing http verbs should not be limited by the driver or the database - and in fact, it isn't. The database or the driver just doesn't allow the nicest implementation, but deleting and recreating an asset should be just fine, because it is an implementation detail.

To implement our (i think very common) use case, we would have to build up a facade around restheart that handles assets. This is very complex and a lot of effort. I created a pull request that shows how an implementation could roughly look like: https://github.com/SoftInstigate/restheart/pull/308 . First tests with heavy load have been successful. I hope that you guys reconsider implementing this feature :)

mkjsix commented 6 years ago

Hi @hannespernpeintner

We're of course open to discuss this. I added a comment to your PR. It's an incremental feature, so it should not create any regression. It would help a lot if you could also contribute some integration tests. You can have a look at how we write ours.

hannomalie commented 6 years ago

Thanks! I'll do that as soon as I can :)

lenalebt commented 6 years ago

The respective PR has been merged, can be closed from my point of view.

mkjsix commented 6 years ago

Hi @hannespernpeintner, @lenalebt I'm closing this as the requirement has been implemented and will be released in version 3.5.

Thank you both for your contribution!

Feel free to re-open in case.