gabriel-vasile / mimetype

A fast Golang library for media type and file extension detection, based on magic numbers
https://pkg.go.dev/github.com/gabriel-vasile/mimetype#pkg-overview
MIT License

Tar format not detected. #464

Closed: dstruck closed this issue 8 months ago

dstruck commented 9 months ago

Concerned file: https://github.com/danielmiessler/SecLists/raw/master/Payloads/Zip-Bombs/r.tar.gz

Expected MIME type: application/x-tar

Returned MIME type: application/octet-stream

Version of the library: v1.4.3

The outer layer is correctly detected as application/gzip. After unpacking the outer layer (gunzip r.tar.gz), r.tar is detected as application/octet-stream, even though the GNU tar utility on Ubuntu can extract it (tar xvf r.tar).

The file utility on Ubuntu detects r.tar as r.tar: tar archive.

Tika also detects the file as a Tar archive. If I understand it correctly, Tika uses "org.apache.commons.compress.archivers.tar.TarArchiveInputStream" to decide whether a file is a Tar archive. Commons Compress documents the different possible Tar headers here: compress/blob/master/src/main/java/org/apache/commons/compress/archivers/tar/TarArchiveEntry.java

Another project that detects the file as a Tar archive is https://github.com/trailofbits/polyfile, which describes itself as:

A pure Python cleanroom implementation of libmagic, with instrumented parsing from Kaitai struct and an interactive hex viewer

One can find their Kaitai definition for the Tar format here: https://github.com/trailofbits/polyfile/blob/master/polyfile/magic_defs/archive
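
To make the header discussion concrete, here is a minimal Go sketch (standard library only) of two common ways to recognize a Tar header: looking for the "ustar" magic at offset 257, and recomputing the header checksum stored at offset 148. Archives in the older V7 layout carry no magic string, so a magic-only check cannot recognize them; whether that applies to r.tar is not stated in this thread. The function names (hasUstarMagic, hasValidChecksum, stripMagic, sampleTar) are hypothetical, and this is not taken from mimetype, Tika, or polyfile.

```go
package main

import (
	"archive/tar"
	"bytes"
	"fmt"
	"strconv"
	"strings"
)

const blockSize = 512 // a tar header occupies one 512-byte block

// hasUstarMagic reports whether the POSIX "ustar" magic sits at offset 257.
func hasUstarMagic(b []byte) bool {
	return len(b) >= 262 && string(b[257:262]) == "ustar"
}

// headerSums adds up the first header block, counting the 8-byte checksum
// field (offsets 148..155) as ASCII spaces. Some old writers summed signed
// bytes, so both totals are returned.
func headerSums(b []byte) (unsigned, signed int64) {
	for i, c := range b[:blockSize] {
		if i >= 148 && i < 156 {
			c = ' '
		}
		unsigned += int64(c)
		signed += int64(int8(c))
	}
	return unsigned, signed
}

// hasValidChecksum parses the stored octal checksum and compares it with
// the recomputed one. This also works for headers without a magic string.
func hasValidChecksum(b []byte) bool {
	if len(b) < blockSize {
		return false
	}
	field := strings.Trim(string(b[148:156]), " \x00")
	stored, err := strconv.ParseInt(field, 8, 64)
	if err != nil {
		return false
	}
	unsigned, signed := headerSums(b)
	return stored == unsigned || stored == signed
}

// stripMagic builds a V7-like header for demonstration only: it zeroes the
// magic and version fields (offsets 257..264) and rewrites the checksum.
func stripMagic(b []byte) []byte {
	out := append([]byte(nil), b...)
	for i := 257; i < 265; i++ {
		out[i] = 0
	}
	sum, _ := headerSums(out)
	copy(out[148:156], fmt.Sprintf("%06o\x00 ", sum))
	return out
}

// sampleTar writes a one-file archive in memory with archive/tar.
func sampleTar() []byte {
	var buf bytes.Buffer
	tw := tar.NewWriter(&buf)
	body := []byte("hello")
	hdr := &tar.Header{Name: "hello.txt", Mode: 0o644, Size: int64(len(body))}
	if err := tw.WriteHeader(hdr); err != nil {
		panic(err)
	}
	if _, err := tw.Write(body); err != nil {
		panic(err)
	}
	if err := tw.Close(); err != nil {
		panic(err)
	}
	return buf.Bytes()
}

func main() {
	ustar := sampleTar()
	v7 := stripMagic(ustar)
	fmt.Println("ustar:   magic =", hasUstarMagic(ustar), "checksum =", hasValidChecksum(ustar))
	fmt.Println("v7-like: magic =", hasUstarMagic(v7), "checksum =", hasValidChecksum(v7))
}
```

The checksum test is stricter than a bare magic comparison and does not depend on which header variant wrote the archive, which is why it is a common fallback when no magic is present.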

gabriel-vasile commented 8 months ago

Thank you for the detailed issue. Fixed.