Open amitupreti opened 1 year ago
To the contrary, we publish a lot of software which is executable.
There is no reason you should be less frightened by an executable Python script than by a DOS batch file.
We do not want to publish compiled code because it cannot be inspected or modified and it bit-rots. That's a matter of policy, though. It's also quite impossible to reliably recognize compiled code in all the many forms it may take.
Editors are responsible for carefully handling untrusted files. That includes not downloading random untrusted .exe files and running them. It also includes not downloading random untrusted .py files and running them.
The platform cannot prevent editors from doing dumb things, and if we try to pretend it can, we create a dangerous false sense of security.
Yeah, this was raised by a security audit and I'm not sure how best to address it. Allowing executable files to be uploaded carries risk (obviously), but it seems to be a part of the game when running a publishing platform. Their immediate suggestion was to mitigate it by blocking certain file types, which doesn't fit well with the use of the platform.
Is it possible to reduce the risk in other ways? My thoughts are:
The days of people downloading exe files and having them run automatically are long past. As far as I'm aware, every Windows browser sets the appropriate magic flag to ensure that Windows will warn you before letting you execute the file. This feature is so old that it's hard to find search results about it - I think the feature is called "zone identifiers" and might have been added as recently as Windows XP SP2 (2004).
Running clamav isn't necessarily a bad idea. It doesn't really mean anything, though, in terms of protection from malicious uploaders, and I'd also be a bit concerned about the time it would take to scan a multi-TB database.
That said, I've long wanted a framework for automatically running checks on file content. Checking known file formats, detecting obvious PHI, checking that admins didn't break the permissions... the list goes on. Virus scanning could be part of that too.
What I would imagine would be to have additional states for unpublished projects, e.g.:
There would be some automated process (using background_tasks
, maybe) that would run the checks and advance the project to the next stage, or else report errors and kick it back to the previous stage.
Problem: Our current setup allows authors to upload any types of files in the project, and these files can be downloaded without any warnings by other users like reviewers/editors/authors which can be dangerous in case the project author has submitted some kind of executable file that can make changes to system.
Solution/Discussion: So it seems we might have to think of a way to modify data upload process to reduce risks, or at least show warnings to users when they download certain types of files.
Some points from discussion with Tom
Implementation/Ideas: 1, We could update the view
generate_signed_url
and create a separate function sayanalyze_file
, that does all kinds of checks on files. So in future we could just keep adding checks/logic on the separate function, which is used by the upload viewgenerate_signed_url
and probably use django messages to throw errorsFor showing erros maybe we can have a static JSON files with error codes, and use that with the output of
analyze_file
to throw appropriate errors to the uploader