developmentforpeople / dfp_external_storage

Simplest cloud file management for Frappe / ERPNext. S3 compatible external bucket can be assigned per Frappe folder, allowing you to fine-tune the location of your Frappe / ERPNext "File"s: within local filesystem or to exteral S3 bucket.
MIT License
28 stars 14 forks source link

streamed downloads #13

Closed khoran closed 8 months ago

khoran commented 8 months ago

I need to be able to download files larger than memory (potentially). I just realized recently that this is not currently happening. While looking at the file function that does the downloading, I noticed your comment "# TODO: For videos enable direct stream from S3". So it seems you've thought about this too.

I'm willing to implement something, as I need this feature right now. But I thought I'd ask you first if you had some ideas/preferences on how to implement this? Or should I just come up with something on my own?

Do you think it would be possible and/or a good idea, to download directly from S3? Either via a redirect response, or just have a function that returns an S3 URL that could be used however?

developmentforpeople commented 8 months ago

Hi @khoran I believe the way to go here is the presigned url, because in my use cases, all files must be safely private, and should not be downloadable for anyone, so it comes to my mind two possible options (not sure if more options out there!!!):

I believe the presigned url is the way to go :) Hope this helps Kevin!

khoran commented 8 months ago

Thanks for the info. I initially also liked the idea of streaming directly from S3, but then realized that it will not always be the case (and is not for me), that the S3 server is publicly accessible. We have an on-premise S3 server which I setup mainly because ERPNext would only talk to S3 servers for storing backups offsite. But it is not publicly accessible as our ERPNext instance is.

So I think a more general solution is to stream the file through Frappe. Fortunately, this isn't too hard to do, and with a 10MB buffer I can achieve 100MB/s locally.

The main change is this (in the file function):

from werkzeug.wsgi import wrap_file

...

#filecontent = doc.dfp_external_storage_download_file()
proxy = doc.dfp_external_storage_file_proxy()
filecontent= wrap_file(frappe.local.request.environ,proxy,10000000)

It turns out that the Response object can already take an iteratable of bytes to create a streamed response, and the werkzeug library has a helper function, wrap_file to turn a file-like object into a byte iterable. Then after the Response is created, we set the file size:

resp = Response(**response_values)
if proxy:
    resp.content_length = proxy.object_size
return resp

If you're good with this direction, the only remaining question I have is what to do about the caching? Is there still some value in caching something (like the doc), or just disable the caching?

developmentforpeople commented 8 months ago

Hi @khoran maybe in your setup you do not need to cache any file... I wrote the constant DFP_EXTERNAL_STORAGE_CACHE_PUBLIC_FILES_SMALLER_THAN_X_MB you can set to 0, but that setup should be done maybe within site config or bucket setup... not sure. And maybe add another one to allow your new implementation, so it would be a good idea being able to setup the cache and your new stream option per bucket. We need to update the Doctype to add these new two fields... some any other ideas?

khoran commented 8 months ago

What if we say for files less than DFP_EXTERNAL_STORAGE_CACHE_PUBLIC_FILES_SMALLER_THAN_X_MB, we cache them and not stream them, so basically everything stays as-is for those files. Then anything larger, we just always stream them and don't cache anything? Is there any downside to streaming files that would make a user want to disable it? I'm just wondering if we really need a user setting for that. Up to you of course though, I can implement it either way.

developmentforpeople commented 8 months ago

Yeah, you are right, maybe it is a good idea just streaming... right! go that way, sounds great! :) Thanks Kevin!

khoran commented 8 months ago

Ok, I'll send you a PR in a bit.