cs3org / wopiserver

A vendor-neutral application gateway compatible with the WOPI specifications.
Apache License 2.0
52 stars 27 forks source link

fix: stream file content from the incoming request over to the storage interfaces #138

Closed DeepDiver1975 closed 8 months ago

DeepDiver1975 commented 9 months ago

refs #134

@glpatcern please have a look - thx

glpatcern commented 9 months ago

Thanks Thomas, appreciated. However, as it is we can't merge it: the (expected) change in the writefile() signature must be reflected in all storage interfaces, in particular in the xroot one that we still use in production. And in addition, we'd need Reva to support a stream by chunking it in multiple HTTP PUT requests, right?

But before going further, I challenged in #134 the usefulness of such patch. Unless proven wrong, we won't gain anything from it.

DeepDiver1975 commented 9 months ago

However, as it is we can't merge it: the (expected) change in the writefile() signature must be reflected in all storage interfaces,

This will for sure follow as soon as we agree on an approach ;-)

This code change will help without any other change - simply because it eliminates to copy the data from the incoming request body to RAM before writing it into the outgoing request.

flask.request.get_data is reading all bytes from the underlying stream to the variable rv - this is where the data is copied unnecessarily.

https://github.com/pallets/werkzeug/blob/eafbed0ce2a6bdf60e62de82bf4a8365188ac334/src/werkzeug/wrappers/request.py#L420

Just to make sure we are on the same page - writing a file is a POST and as per definition the full file will be transmitted to the server before (in contrary to PUT). Whenever I am referring to " streaming" in this context the meaning is "stream data in the code".

Hope this helps to clarify. THX

glpatcern commented 9 months ago

This will for sure follow as soon as we agree on an approach ;-)

OK sure - see below!

Just to make sure we are on the same page - writing a file is a POST and as per definition the full file will be transmitted to the server before (in contrary to PUT). Whenever I am referring to " streaming" in this context the meaning is "stream data in the code".

Right, we are on the same page then, the request.get_data() buffer contains the full file by the time we start processing the POST request. Now, whether the request.stream file-like object avoids a memory copy, I don't actually know - I assumed that the buffer (my content parameter) would be passed as an implicit reference to the underlying call, not as a copy.

So I'd say before agreeing on an approach and defining the signature for writefile(), could this hypothesis be tested? This is a question to @wkloucek ;-) could you compare the memory consumption on large files with and without this PR?

glpatcern commented 8 months ago

Superseded by #141