bakape / thumbnailer

Go media thumbnailer
MIT License
153 stars 36 forks source link

Improve MP4 mime detection #20

Closed Kagami closed 6 years ago

Kagami commented 6 years ago

Not sure this is the best fix, I don't know about MP4 container much.

I commited problem file so you can decide by yourself.

bakape commented 6 years ago

I'll need to look into what excatly this fixes. MP4 is a very dumb and obfuscated format.

Kagami commented 6 years ago

what excatly this fixes

Basically:

$ head -c 48 testdata/no_sound.mp4 | hd
00000000  00 00 00 20 66 74 79 70  69 73 6f 6d 00 00 02 00  |... ftypisom....|
00000010  69 73 6f 6d 69 73 6f 32  61 76 63 31 6d 70 34 31  |isomiso2avc1mp41|
00000020  00 00 00 08 66 72 65 65  00 01 29 2f 6d 64 61 74  |....free..)/mdat|
00000030
$ head -c 48 testdata/rare_brand.mp4 | hd
00000000  00 00 00 24 66 74 79 70  69 73 6f 35 00 00 00 01  |...$ftypiso5....|
00000010  61 76 63 31 69 73 6f 35  64 73 6d 73 6d 73 69 78  |avc1iso5dsmsmsix|
00000020  64 61 73 68 00 00 00 08  66 72 65 65 00 00 00 3a  |dash....free...:|
00000030

Brand in attached file doesn't contain "mp4" bytes. Though ffmpeg detects it just fine.

bakape commented 6 years ago

I see. Looks like this is another dumb mp4 offshoot format https://en.wikipedia.org/wiki/Dynamic_Adaptive_Streaming_over_HTTP . Probably would be more reliable to pass to ffmpeg (as a last resort) for detection, instead of special-casing detection of this and any other "kind of MP4 but not really" container formats. Thoughts?

Kagami commented 6 years ago

I'm not a fan of ffmpeg detect because of issue described in #7. I think it would be better if we do unreilable mime checks but pass exact container/decoder to ffmpeg. That way if we chose the wrong format ffmpeg will fail anyway.

Another option is to limit allowed formats in ffmpeg. I think it should be possible with register*-like functions? Seems like currently all possible formats are accepted while thumbnailer supports only most basic ones. Also we won't need mime sniff in that case: we can just pass everything to ffmpeg. Maybe still useful in order to fail fast on bad files passed by users though.

One more option is to run thumbnailer process separately from application inside e.g. container with disabled networking and sandboxed filesystem. This is good as last resort defense, but in general thumbnailer shouldn't do network requests. There was similar security vulnerability with ImageMagick recently btw. It accepted too many obscrube formats and some of them allowed code execution with specially constructed files. ffmpeg's issue is very similar: it tries to parse tons of possible formats resulting in potentially bad decisions from security point of view.

bakape commented 6 years ago

Maybe still useful in order to fail fast on bad files passed by users though.

That would complicate the ability of library users detect formats without performing thumbnailing.

Fair points otherwise.

Kagami commented 6 years ago

What option (relaxed mimes + pass exact formats/decoders to ffmpeg vs detect everything with ffmpeg) do you like more? I will try to implement it if I have time.

bakape commented 6 years ago

Detecting everything with ffmpeg is not an option, as that would break the extension APIs we have now: https://godoc.org/github.com/bakape/thumbnailer#RegisterMatcher https://godoc.org/github.com/bakape/thumbnailer#RegisterProcessor

Kagami commented 6 years ago

Matcher takes up to the first 512 bytes of a file and returns the MIME type

Hm, that won't work probably, yeah. Though that mime check is unreilable anyway, you need to process file fully in order to be sure that it's really valid.

bakape commented 6 years ago

you need to process file fully in order to be sure that it's really valid.

Which is quite expensive, if you just want to detect the MIME type.

On 10 April 2018 at 22:52, Kagami Hiiragi notifications@github.com wrote:

Matcher takes up to the first 512 bytes of a file and returns the MIME type Hm, that won't work probably, yeah. Though that mime check is unreilable anyway, you need to process file fully in order to be sure that it's really valid.

— You are receiving this because you modified the open/close state. Reply to this email directly, view it on GitHub https://github.com/bakape/thumbnailer/pull/20#issuecomment-380225932, or mute the thread https://github.com/notifications/unsubscribe-auth/AHfPsF2Vdi8RHo6r4TNNxGgnQBLaf0pdks5tnQ2TgaJpZM4TLhBJ .