Roachbones / discordless

Record Discord traffic via mitmproxy and export chatlogs to JSON or HTML.
GNU General Public License v3.0
71 stars 5 forks source link

Improve image handling in html exporter #18

Open recursivetree opened 2 months ago

recursivetree commented 2 months ago

This PR improves how the html exporter handles images.

Roachbones commented 2 months ago

Thanks! Some comments on reasonable_filename:

I'm cool with the filetype dependency, but maybe instead, wumpus_in_the_middle should be recording the content_type HTTP response header so that we don't have to guess. I haven't seen the re-encoding issue, but I assume that it happens when the content_type header of the image response doesn't match the image's file extension?

recursivetree commented 2 months ago

Why strip .. from filenames?

That is just my personal opinion, but file..png looks really ugly. Also .. on it's own refers to the parent directory if a file somehow ends up only consisting of just .. -> unlikely, but potentially a security issue.

Wouldn't adding a unique_suffix to the filename mess up the extension?

That is true. Maybe it would be better to add it at the start

I think collisions are still technically possible if one filename happens to end in a hexadecimal number that will later get used as a unique_suffix

that is totally true, even when it doesn't end in a hexadecimal number. Currently we only add it when the filename on its own is too short or we cut it short. I think for short filenames it is obvious why adding entropy is a good idea. For too long filenames, my though was that maybe they have a very long prefix and we cut away the uniqueness.

Alternatively, we can always add the unique suffix.

I'm cool with the filetype dependency, but maybe instead, wumpus_in_the_middle should be recording the content_type HTTP response header so that we don't have to guess. I haven't seen the re-encoding issue, but I assume that it happens when the content_type header of the image response doesn't match the image's file extension?

This is worth considering, but how will we deal with old recordings where we don't have a mime type? As an archiving project, we probablly shouldn't just break compatibility with old recordings.

How I found the reencoding issue is that some programs couldn't open my .png file while others could. After some analysis, I noticed it is actually a webp image. I then reproduced it by sending a png in a channel and downloading it. However, I've also noticed it doesn't happen for every image.