Open recursivetree opened 2 months ago
Thanks! Some comments on reasonable_filename
:
..
from filenames?unique_suffix
to the filename mess up the extension?unique_suffix
.I'm cool with the filetype dependency, but maybe instead, wumpus_in_the_middle should be recording the content_type HTTP response header so that we don't have to guess. I haven't seen the re-encoding issue, but I assume that it happens when the content_type header of the image response doesn't match the image's file extension?
Why strip .. from filenames?
That is just my personal opinion, but file..png
looks really ugly. Also ..
on it's own refers to the parent directory if a file somehow ends up only consisting of just ..
-> unlikely, but potentially a security issue.
Wouldn't adding a unique_suffix to the filename mess up the extension?
That is true. Maybe it would be better to add it at the start
I think collisions are still technically possible if one filename happens to end in a hexadecimal number that will later get used as a unique_suffix
that is totally true, even when it doesn't end in a hexadecimal number. Currently we only add it when the filename on its own is too short or we cut it short. I think for short filenames it is obvious why adding entropy is a good idea. For too long filenames, my though was that maybe they have a very long prefix and we cut away the uniqueness.
Alternatively, we can always add the unique suffix.
I'm cool with the filetype dependency, but maybe instead, wumpus_in_the_middle should be recording the content_type HTTP response header so that we don't have to guess. I haven't seen the re-encoding issue, but I assume that it happens when the content_type header of the image response doesn't match the image's file extension?
This is worth considering, but how will we deal with old recordings where we don't have a mime type? As an archiving project, we probablly shouldn't just break compatibility with old recordings.
How I found the reencoding issue is that some programs couldn't open my .png file while others could. After some analysis, I noticed it is actually a webp image. I then reproduced it by sending a png in a channel and downloading it. However, I've also noticed it doesn't happen for every image.
This PR improves how the html exporter handles images.
OSError: [Errno 63] File name too long
by sanitizing file names derived from attachment idsreasonable_filename
function