Closed khoran closed 8 months ago
Hi @khoran I believe the way to go here is the presigned url, because in my use cases, all files must be safely private, and should not be downloadable for anyone, so it comes to my mind two possible options (not sure if more options out there!!!):
I believe the presigned url is the way to go :) Hope this helps Kevin!
Thanks for the info. I initially also liked the idea of streaming directly from S3, but then realized that it will not always be the case (and is not for me), that the S3 server is publicly accessible. We have an on-premise S3 server which I setup mainly because ERPNext would only talk to S3 servers for storing backups offsite. But it is not publicly accessible as our ERPNext instance is.
So I think a more general solution is to stream the file through Frappe. Fortunately, this isn't too hard to do, and with a 10MB buffer I can achieve 100MB/s locally.
The main change is this (in the file
function):
from werkzeug.wsgi import wrap_file
...
#filecontent = doc.dfp_external_storage_download_file()
proxy = doc.dfp_external_storage_file_proxy()
filecontent= wrap_file(frappe.local.request.environ,proxy,10000000)
It turns out that the Response
object can already take an iteratable of bytes to create a streamed response, and the werkzeug library has a helper function, wrap_file
to turn a file-like object into a byte iterable. Then after the Response
is created, we set the file size:
resp = Response(**response_values)
if proxy:
resp.content_length = proxy.object_size
return resp
If you're good with this direction, the only remaining question I have is what to do about the caching? Is there still some value in caching something (like the doc), or just disable the caching?
Hi @khoran maybe in your setup you do not need to cache any file... I wrote the constant DFP_EXTERNAL_STORAGE_CACHE_PUBLIC_FILES_SMALLER_THAN_X_MB you can set to 0, but that setup should be done maybe within site config or bucket setup... not sure. And maybe add another one to allow your new implementation, so it would be a good idea being able to setup the cache and your new stream option per bucket. We need to update the Doctype to add these new two fields... some any other ideas?
What if we say for files less than DFP_EXTERNAL_STORAGE_CACHE_PUBLIC_FILES_SMALLER_THAN_X_MB, we cache them and not stream them, so basically everything stays as-is for those files. Then anything larger, we just always stream them and don't cache anything? Is there any downside to streaming files that would make a user want to disable it? I'm just wondering if we really need a user setting for that. Up to you of course though, I can implement it either way.
Yeah, you are right, maybe it is a good idea just streaming... right! go that way, sounds great! :) Thanks Kevin!
Ok, I'll send you a PR in a bit.
I need to be able to download files larger than memory (potentially). I just realized recently that this is not currently happening. While looking at the
file
function that does the downloading, I noticed your comment "# TODO: For videos enable direct stream from S3". So it seems you've thought about this too.I'm willing to implement something, as I need this feature right now. But I thought I'd ask you first if you had some ideas/preferences on how to implement this? Or should I just come up with something on my own?
Do you think it would be possible and/or a good idea, to download directly from S3? Either via a redirect response, or just have a function that returns an S3 URL that could be used however?