cookiecutter / cookiecutter-django

Cookiecutter Django is a framework for jumpstarting production-ready Django projects quickly.
https://cookiecutter-django.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Access-controlled service for large static files #2099

Open senderle opened 5 years ago

senderle commented 5 years ago

Description

I'm proposing that a feature be added to serve large static files to authenticated users.

It might not be obvious why this is a problem. Here are some of the possible solution paths, and why they are blocked:

1) Can't we use a static file service like whitenoise?

2) Can't Django just serve the files through a FileResponse object?

3) Isn't there some kind of funky thing you can do with headers?

4) Could you use AWS somehow?

  • Maybe. I haven't looked into this option carefully. But it seems like it would be very complicated to get right.

How should it be implemented? I don't know. This is where I am stuck, and I would welcome discussion. I posted a question on Stack Overflow and got crickets; if you see a way around this that doesn't require a pull request, please feel free to answer there.

Rationale

In a sense, this is not a "feature" but a fix. The change from Caddy to Traefik arguably broke functionality that was working pretty well before.

What it really means for me, concretely, is this: now that I want to do something similar with a new app, I can't use cookiecutter-django without a fairly elaborate and awkward reconfiguration -- something like standing up an nginx container between the django service and the traefik service. If that's the only option, my instinct is to not use cookiecutter-django at all. I probably don't need all the things, and the configuration work will wind up being about the same either way. And maybe that's fine; this could just be a "It might not be what you want" situation.

But I'm proposing the alternative narrative that this would actually fix something that worked before and now is broken. I don't honestly imagine that there are that many people doing what I'm doing, and so I can't argue that you will lose a bunch of users over this. It's just kind of annoying that it used to be easy, and now is hard.

Use case(s) / visualization(s)

Here's my use case: I am developing new apps for researchers at the University of Pennsylvania doing large-scale statistical text analysis in multiple different departments. I need to be able to automatically distribute copyright-protected data to authorized users in bulk, without risking leaking the data.

browniebroke commented 5 years ago

I need to be able to automatically distribute copyright-protected data to authorized users in bulk, without risking leaking the data.

Are you positive these should be distributed as static files rather than media files? It sounds like this data would be uploaded by your application users to a FileField or ImageField rather than tracked in version control like your code base. django-storages provides some options to restrict access.

senderle commented 5 years ago

The data is generated by a crawling process and aggregated into large zip files that the user then downloads. There's no uploading involved. (It's also not tracked in version control.)

But if there are ways to restrict access to the files using some other mechanism that I haven't mentioned above, I'm all ears! It just has to be able to efficiently handle multi-gigabyte files.

browniebroke commented 5 years ago

The data is generated by a crawling process and aggregated into large zip files that the user then downloads. There's no uploading involved.

OK, so when I have to do that type of thing, there is an "upload" involved at some point for me: not from a user, but from the crawling process. Here is how I usually handle this (assuming a Docker-based config):

Each time a user wants to download a file, your application exposes the LargeZipFile.zip.url on some page, which will have query parameters giving access for a short amount of time.
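To illustrate the idea of a URL whose query parameters grant access for a short time, here is a minimal sketch using only the Python standard library. It is not the signing scheme S3 or django-storages actually uses; the secret, the parameter names `expires` and `signature`, and the helpers `sign_url` and `verify_url` are all hypothetical and exist only for illustration.

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical signing secret; a real deployment would load this from config.
SECRET_KEY = b"change-me"


def sign_url(path, ttl_seconds=300, now=None):
    """Append an expiry timestamp and an HMAC signature to `path`."""
    expires = int(now if now is not None else time.time()) + ttl_seconds
    message = f"{path}:{expires}".encode()
    signature = hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()
    query = urlencode({"expires": expires, "signature": signature})
    return f"{path}?{query}"


def verify_url(url, now=None):
    """Return True only if the signature matches and the URL has not expired."""
    parts = urlsplit(url)
    params = parse_qs(parts.query)
    try:
        expires = int(params["expires"][0])
        signature = params["signature"][0]
    except (KeyError, ValueError):
        return False  # missing or malformed parameters
    current = now if now is not None else time.time()
    if current > expires:
        return False  # the access window has closed
    message = f"{parts.path}:{expires}".encode()
    expected = hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking signature bytes via timing.
    return hmac.compare_digest(signature, expected)
```

For example, `sign_url("/media/LargeZipFile.zip")` would give a link that `verify_url` accepts for five minutes and rejects afterwards, or if the path is altered. With a storage service, the service itself does the verification, so the large file never passes through Django.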

That being said, I don't know how your server is deployed at the University of Pennsylvania; it might be on a dedicated, non-cloud server. I don't know what your storage options are, but if AWS is not suitable, DigitalOcean might be: it has an S3-compatible API, which is supported by django-storages.
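As a rough sketch of what that could look like in a Django settings module, assuming django-storages with its S3 backend: the bucket name, environment variable names, and the 300-second expiry below are placeholder assumptions, and newer Django versions configure the backend through the `STORAGES` setting instead of `DEFAULT_FILE_STORAGE`.

```python
import os

# Route FileField storage through the django-storages S3 backend.
DEFAULT_FILE_STORAGE = "storages.backends.s3boto3.S3Boto3Storage"

# Placeholder bucket name; the environment variable name is an assumption.
AWS_STORAGE_BUCKET_NAME = os.environ.get(
    "DJANGO_AWS_STORAGE_BUCKET_NAME", "example-bucket"
)

# Keep objects private so they are reachable only through signed URLs.
AWS_DEFAULT_ACL = "private"

# Have `.url` return a signed URL that stops working after 300 seconds.
AWS_QUERYSTRING_AUTH = True
AWS_QUERYSTRING_EXPIRE = 300

# For an S3-compatible provider, point the backend at that provider's
# endpoint; leave unset (None) to use AWS itself.
AWS_S3_ENDPOINT_URL = os.environ.get("AWS_S3_ENDPOINT_URL")
```

The credentials themselves (access key and secret) are left out here and would normally come from the environment.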

It just has to be able to efficiently handle multi-gigabyte files.

A word of warning that it could generate some significant costs from Amazon.