gabriel-vasile / mimetype

A fast Golang library for media type and file extension detection, based on magic numbers
https://pkg.go.dev/github.com/gabriel-vasile/mimetype#pkg-overview
MIT License

No matching lowercase `f4v ` type #562

Closed: luhuaei closed this issue 1 month ago

luhuaei commented 1 month ago

Attach the file for which the detection is inaccurate

https://github.com/user-attachments/assets/f23b3ba0-5bba-4076-894f-1c522bee1b09

Expected MIME type: video/mp4

Returned MIME type: application/octet-stream

Version of the library you are using: v1.4.3

Output of go version: go version go1.22.4 linux/amd64

Additional context: I tried to work around this problem with mimetype.Extend, as shown in the code below.

import (
    "bytes"

    "github.com/gabriel-vasile/mimetype"
)

// Register the lowercase "f4v " brand as video/mp4.
func init() {
    mimetype.Extend(ftyp([]byte("f4v ")), "video/mp4", ".mp4")
}

// ftyp creates a Detector which returns true if any of the FTYP signatures
// matches the raw input.
func ftyp(sigs ...[]byte) func([]byte, uint32) bool {
    return func(raw []byte, limit uint32) bool {
        // The box type sits at bytes 4..8 and the major brand at bytes 8..12.
        if len(raw) < 12 {
            return false
        }
        if !bytes.Equal(raw[4:8], []byte("ftyp")) {
            return false
        }
        // Compare the major brand against every accepted signature.
        for _, s := range sigs {
            if bytes.Equal(raw[8:12], s) {
                return true
            }
        }
        return false
    }
}

gabriel-vasile commented 1 month ago

@luhuaei do you have any idea how this file was created? What program was used?

The Linux `file` utility does not recognize it:

➜  file --mime 352909577-f23b3ba0-5bba-4076-894f-1c522bee1b09.mp4 
352909577-f23b3ba0-5bba-4076-894f-1c522bee1b09.mp4: application/octet-stream; charset=binary

Also, `f4v ` is not registered with the MP4 registration authority.

luhuaei commented 1 month ago

> @luhuaei do you have any idea how this file was created? What program was used?
>
> Linux file utility does not recognize it:
>
> ➜  file --mime 352909577-f23b3ba0-5bba-4076-894f-1c522bee1b09.mp4
> 352909577-f23b3ba0-5bba-4076-894f-1c522bee1b09.mp4: application/octet-stream; charset=binary
>
> And f4v is not registered with mp4 authority.

This video was sent to me as user feedback. Its metadata can be viewed with exiftool, which correctly identifies the MIME type of this video.

This is the exiftool output:

ExifTool Version Number         : 12.92
File Name                       : lowercase_f4v_mime_magic_cause_mime_detect_incorrectly.mp4
Directory                       : .
File Size                       : 3.1 MB
File Modification Date/Time     : 2024:07:29 10:07:01+08:00
File Access Date/Time           : 2024:08:01 16:40:15+08:00
File Inode Change Date/Time     : 2024:07:29 12:12:02+08:00
File Permissions                : -rw-r--r--
File Type                       : MP4
File Type Extension             : mp4
MIME Type                       : video/mp4
Major Brand                     : Unknown (f4v )
Minor Version                   : 0.0.0
Compatible Brands               : isom, mp42, m4v
Movie Header Version            : 0
Time Scale                      : 90000
Duration                        : 0:04:15
Preferred Rate                  : 1
Preferred Volume                : 100.00%
Preview Time                    : 0 s
Preview Duration                : 0 s
Poster Time                     : 0 s
Selection Time                  : 0 s
Selection Duration              : 0 s
Current Time                    : 0 s
Next Track ID                   : 4
Track Header Version            : 0
Track Create Date               : 2015:05:08 15:23:52
Track Modify Date               : 2015:05:08 15:23:52
Track ID                        : 1
Track Duration                  : 0:04:15
Track Layer                     : 0
Track Volume                    : 0.00%
Image Width                     : 1280
Image Height                    : 720
Graphics Mode                   : srcCopy
Op Color                        : 0 0 0
Compressor ID                   : avc1
Source Image Width              : 1280
Source Image Height             : 720
X Resolution                    : 72
Y Resolution                    : 72
Compressor Name                 : AVC Coding
Bit Depth                       : 24
Balance                         : 0
Audio Format                    : mp4a
Audio Channels                  : 2
Audio Bits Per Sample           : 16
Matrix Structure                : 1 0 0 0 1 0 0 0 1
Media Header Version            : 0
Media Create Date               : 2015:05:08 15:23:52
Media Modify Date               : 2015:05:08 15:23:52
Media Time Scale                : 90000
Media Duration                  : 0:04:15
Media Language Code             : eng
Handler Type                    : Data
Handler Description             : Timed Metadata Handler
Other Format                    : amf0
Warning                         : [minor] The ExtractEmbedded option may find more tags in the media data
Start Timecode                  : 00:00:19:10
XMP Toolkit                     : Adobe XMP Core 5.5-c014 79.151805, 2013/04/09-12:08:21
Create Date                     : 2015:05:08 23:23:52+08:00
Modify Date                     : 2015:05:09 23:29:27+08:00
Creator Tool                    : Adobe Premiere Pro CC (Windows)
Metadata Date                   : 2015:05:09 23:29:27+08:00
Video Frame Rate                : 25.000000
Video Field Order               : Progressive
Video Pixel Aspect Ratio        : 1
Audio Sample Rate               : 44100
Audio Sample Type               : Compressed
Audio Channel Type              : Stereo
Start Time Scale                : 25
Start Time Sample Size          : 1
Original Document ID            : xmp.did:974803bd-c72c-7a41-a4d7-37aeb15dc9a8
Instance ID                     : xmp.iid:3529a6be-afe1-8d47-a2c5-03fce304b743
Document ID                     : xmp.did:296a5944-e495-ef4e-bf7f-d0df856708e2
Format                          : F4V
Duration Value                  : 22917600
Duration Scale                  : 1.11111111111111e-05
Project Ref Type                : Movie
Video Frame Size W              : 1280
Video Frame Size H              : 720
Video Frame Size Unit           : pixel
Start Timecode Time Format      : 25 fps
Start Timecode Time Value       : 00:00:19:10
Alt Timecode Time Value         : 00:00:19:10
Alt Timecode Time Format        : 25 fps
Windows Atom Extension          : .prproj
Windows Atom Invocation Flags   : /L
Windows Atom Unc Project Path   : \\?\E:\2015.4科大实验中学\2015.5.9科大实验最后出片[第07稿].prproj
Mac Atom Application Code       : 1347449455
Mac Atom Invocation Apple Event : 1129468018
Ingredients Instance ID         : xmp.iid:2be632c4-8664-3649-8794-8c8665ac22b2, xmp.iid:35d6fcc6-f3fb-5b47-bca9-99b9c726f0fb, xmp.iid:35d6fcc6-f3fb-5b47-bca9-99b9c726f0fb, xmp.iid:431240e6-ad04-804a-846d-a3df744a16b3, xmp.iid:507e4d52-7af1-db41-aac5-60594b9bc3b4, xmp.iid:507e4d52-7af1-db41-aac5-60594b9bc3b4, xmp.iid:507e4d52-7af1-db41-aac5-60594b9bc3b4, xmp.iid:507e4d52-7af1-db41-aac5-60594b9bc3b4, xmp.iid:507e4d52-7af1-db41-aac5-60594b9bc3b4, xmp.iid:507e4d52-7af1-db41-aac5-60594b9bc3b4, xmp.iid:523f4ac2-dd00-cb41-b3bb-b7894fe01716, xmp.iid:8069fdab-72d0-3f4a-98a2-1e1a53272bb0, xmp.iid:9fb8e603-ad03-984e-ac60-213f42882058, xmp.iid:a5cef023-85ae-104a-b954-641b6126c51c, xmp.iid:d0d8afec-660f-524e-8309-81ae59a8a8f8, xmp.iid:f2a8c72d-03c6-134a-91eb-bbe2679221eb, xmp.iid:f2a8c72d-03c6-134a-91eb-bbe2679221eb, xmp.iid:f9d1bbc6-bac7-7149-a3b8-e681af92ee3c, xmp.iid:f9d1bbc6-bac7-7149-a3b8-e681af92ee3c, xmp.iid:f9d1bbc6-bac7-7149-a3b8-e681af92ee3c, xmp.iid:f9d1bbc6-bac7-7149-a3b8-e681af92ee3c, xmp.iid:f9d1bbc6-bac7-7149-a3b8-e681af92ee3c, xmp.iid:f9d1bbc6-bac7-7149-a3b8-e681af92ee3c, xmp.iid:f9d1bbc6-bac7-7149-a3b8-e681af92ee3c, xmp.iid:f9d1bbc6-bac7-7149-a3b8-e681af92ee3c, xmp.iid:f9d1bbc6-bac7-7149-a3b8-e681af92ee3c, xmp.iid:f9d1bbc6-bac7-7149-a3b8-e681af92ee3c, xmp.iid:f9d1bbc6-bac7-7149-a3b8-e681af92ee3c, xmp.iid:f9d1bbc6-bac7-7149-a3b8-e681af92ee3c, xmp.iid:f9d1bbc6-bac7-7149-a3b8-e681af92ee3c, xmp.iid:f9d1bbc6-bac7-7149-a3b8-e681af92ee3c, xmp.iid:f9d1bbc6-bac7-7149-a3b8-e681af92ee3c
Ingredients Document ID         : 238c38b2-ac35-2244-822f-d05e0000003b, xmp.did:3ec9e927-ca4c-e84e-a8fa-4d05a78336e9, xmp.did:3ec9e927-ca4c-e84e-a8fa-4d05a78336e9, xmp.did:53deae14-d959-8b41-aee8-5912f5d1a60e, xmp.did:2e89e4a9-6138-554f-8c38-d0ae941eb047, xmp.did:2e89e4a9-6138-554f-8c38-d0ae941eb047, xmp.did:2e89e4a9-6138-554f-8c38-d0ae941eb047, xmp.did:2e89e4a9-6138-554f-8c38-d0ae941eb047, xmp.did:2e89e4a9-6138-554f-8c38-d0ae941eb047, xmp.did:2e89e4a9-6138-554f-8c38-d0ae941eb047, xmp.did:9364169f-a951-ba49-adc2-bd6a8d3249a1, xmp.did:a80048fd-fd1c-154b-a047-de91c709d285, 4842f7c0-44fe-8c0a-6451-01fa0000003b, xmp.did:fa89822d-453e-b44d-bb82-294be1f186a0, d1f1a7d7-c6a3-2555-baa6-88d40000003b, xmp.did:df70d9b1-779b-4747-8d39-41a0c0758e93, xmp.did:df70d9b1-779b-4747-8d39-41a0c0758e93, xmp.did:0c91bc9b-2a56-a648-a9d1-cb4fa7c19d39, xmp.did:0c91bc9b-2a56-a648-a9d1-cb4fa7c19d39, xmp.did:0c91bc9b-2a56-a648-a9d1-cb4fa7c19d39, xmp.did:0c91bc9b-2a56-a648-a9d1-cb4fa7c19d39, xmp.did:0c91bc9b-2a56-a648-a9d1-cb4fa7c19d39, xmp.did:0c91bc9b-2a56-a648-a9d1-cb4fa7c19d39, xmp.did:0c91bc9b-2a56-a648-a9d1-cb4fa7c19d39, xmp.did:0c91bc9b-2a56-a648-a9d1-cb4fa7c19d39, xmp.did:0c91bc9b-2a56-a648-a9d1-cb4fa7c19d39, xmp.did:0c91bc9b-2a56-a648-a9d1-cb4fa7c19d39, xmp.did:0c91bc9b-2a56-a648-a9d1-cb4fa7c19d39, xmp.did:0c91bc9b-2a56-a648-a9d1-cb4fa7c19d39, xmp.did:0c91bc9b-2a56-a648-a9d1-cb4fa7c19d39, xmp.did:0c91bc9b-2a56-a648-a9d1-cb4fa7c19d39, xmp.did:0c91bc9b-2a56-a648-a9d1-cb4fa7c19d39
Ingredients From Part           : time:26051880960000f254016000000d1524096000000f254016000000, time:968248028160000f254016000000d3485099520000f254016000000, time:968248028160000f254016000000d3261565440000f254016000000, time:0, time:8646704640000f254016000000d863654400000f254016000000, time:9601804800000f254016000000d680762880000f254016000000, time:10790599680000f254016000000d325140480000f254016000000, time:11136061440000f254016000000d436907520000f254016000000, time:12202928640000f254016000000d426746880000f254016000000, time:4907589120000f254016000000d1940682240000f254016000000, time:0, time:0, time:10221603840000f254016000000d2103252480000f254016000000, time:0, time:50518702080000f254016000000d10211443200000f254016000000, time:915372057600000f254016000000d52865809920000f254016000000, time:968237867520000f254016000000d5070159360000f254016000000, time:9073451520000f254016000000d7640801280000f254016000000, time:16714252800000f254016000000d5496906240000f254016000000, time:22211159040000f254016000000d3820400640000f254016000000, time:30156779520000f254016000000d2641766400000f254016000000, time:26031559680000f254016000000d4125219840000f254016000000, time:32798545920000f254016000000d17588067840000f254016000000, time:365783040000f254016000000d5029516800000f254016000000, time:60736241664000f254016000000d512096255999f254016000000, time:61660859904000f254016000000d448084224000f254016000000, time:62549915903999f254016000000d426746880000f254016000000, time:63353622527999f254016000000d405409536000f254016000000, time:64143104255999f254016000000d376959744000f254016000000, time:50386613760000f254016000000d10343531520000f254016000000, time:64968148223997f254016000000d853493760000f254016000000, time:5395299840000f254016000000d3678151680000f254016000000
Ingredients To Part             : time:33276096000000f254016000000d1524096000000f254016000000, time:57793720320000f254016000000d3485099520000f254016000000, time:61278819840000f254016000000d3261565440000f254016000000, time:55964805120000f254016000000d1127831040000f254016000000, time:30542883840000f254016000000d863654400000f254016000000, time:31406538240000f254016000000d680762880000f254016000000, time:32087301120000f254016000000d325140480000f254016000000, time:32412441600000f254016000000d436907520000f254016000000, time:32849349120000f254016000000d426746880000f254016000000, time:35369187840000f254016000000d1940682240000f254016000000, time:54948741120000f254016000000d1016064000000f254016000000, time:54948741120000f254016000000d3098995200000f254016000000, time:14783731200000f254016000000d2103252480000f254016000000, time:57092636160000f254016000000d955100160000f254016000000, time:58098539520000f254016000000d10211443200000f254016000000, time:4927910400000f254016000000d52865809920000f254016000000, time:64540385280000f254016000000d5070159360000f254016000000, time:13635578880000f254016000000d7640801280000f254016000000, time:21276380160000f254016000000d5496906240000f254016000000, time:26773286400000f254016000000d3820400640000f254016000000, time:30593687040000f254016000000d2641766400000f254016000000, time:33235453440000f254016000000d4125219840000f254016000000, time:37360673280000f254016000000d17588067840000f254016000000, time:4927910400000f254016000000d5029516800000f254016000000, time:54948741120000f254016000000d731566080000f254016000000, time:55680307200000f254016000000d640120320000f254016000000, time:56320427520000f254016000000d609638400000f254016000000, time:56930065920000f254016000000d579156480000f254016000000, time:57509222400000f254016000000d538513920000f254016000000, time:58047736320000f254016000000d10343531520000f254016000000, time:68391267840000f254016000000d1219276800000f254016000000, time:9957427200000f254016000000d3678151680000f254016000000
Ingredients File Path           : 保安采访.wav, 梦追人-童声.wav, 梦追人-童声.wav, 02N58PICT8M_1024.jpg, 150423_06.WAV, 150423_06.WAV, 150423_06.WAV, 150423_06.WAV, 150423_06.WAV, 150423_06.WAV, 88s58PICMnt_1024.jpg, 科大实验logo (2).png, 主任采访.wav, chaokujianbianyuhuancaibizhi_469082_10.jpg, 校长采访.wav, 2015.5.8梦追人-童声.wav, 2015.5.8梦追人-童声.wav, 15.4.27 HD 1080P.mov, 15.4.27 HD 1080P.mov, 15.4.27 HD 1080P.mov, 15.4.27 HD 1080P.mov, 15.4.27 HD 1080P.mov, 15.4.27 HD 1080P.mov, 15.4.27 HD 1080P.mov, 15.4.27 HD 1080P.mov, 15.4.27 HD 1080P.mov, 15.4.27 HD 1080P.mov, 15.4.27 HD 1080P.mov, 15.4.27 HD 1080P.mov, 15.4.27 HD 1080P.mov, 15.4.27 HD 1080P.mov, 15.4.27 HD 1080P.mov
Ingredients Mask Markers        : None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None
History Action                  : created, saved
History Instance ID             : xmp.iid:974803bd-c72c-7a41-a4d7-37aeb15dc9a8, xmp.iid:3529a6be-afe1-8d47-a2c5-03fce304b743
History When                    : 2015:05:09 23:23:52+08:00, 2015:05:09 23:29:27+08:00
History Software Agent          : Adobe Premiere Pro CC (Windows), Adobe Premiere Pro CC (Windows)
History Changed                 : /
Pantry Coding History           : A=PCM,F=44100,W=16,M=stereo,T=ZOOM Handy Recorder H4n
Pantry Identifier               : 060A2B340101010501010D431300000049A09549106705CE08004602029A9C65
Pantry Video Compressor         : AVC_3840_2160_H422IP@L51
Pantry Audio Compressor         : LPCM24
Pantry Camera Model             : Sony PXW-FS7
Pantry Model                    : PXW-FS7
Pantry Make                     : Sony
Pantry Serial Number            : 0040037
Pantry Creator Tool             : Adobe Adobe Media Encoder CC 2014 (Macintosh)
Pantry Format                   : Waveform Audio
Pantry History Parameters       : unknown modifications
Pantry Ingredients Instance ID  : xmp.iid:e2bf0276-16ae-e84f-9426-a557699a7455
Pantry Ingredients Document ID  : xmp.did:2d1db7d3-5fbc-8a4c-9ad7-b50ff86f8f83
Pantry Ingredients From Part    : time:52733721600000f254016000000d3495260160000f254016000000
Pantry Ingredients To Part      : time:58819944960000f254016000000d3495260160000f254016000000
Pantry Ingredients File Path    : Kokia - 梦追人.mp3
Pantry Ingredients Mask Markers : None
Pantry Derived From Instance ID : xmp.iid:aa0401d5-3a33-446a-8c6c-869dad4a0734
Pantry Derived From Document ID : xmp.did:aa0401d5-3a33-446a-8c6c-869dad4a0734
Pantry Derived From Original Document ID: xmp.did:aa0401d5-3a33-446a-8c6c-869dad4a0734
Pantry Windows Atom Extension   : .prproj
Pantry Windows Atom Invocation Flags: /L
Pantry Mac Atom Application Code: 1347449455
Pantry Mac Atom Invocation Apple Event: 1129468018
Pantry Mac Atom Posix Project Path: 2015.4.20.科大实验附中.剪辑第四稿[001]副本_副本.prproj
Pantry Project Ref Type         : Movie
Pantry Start Timecode Time Format: 25 fps
Pantry Start Timecode Time Value: 00:03:18:13
Pantry Artist                   : KOKIA
Pantry Album                    : ô×·ÈË¡¡ÖÁѧð^¸ßµÈѧУУ¸è
Pantry Track Number             : 1
Pantry Genre                    : pop
Pantry Part Of Compilation      : false
Pantry Title                    : ô×·ÈË
Pantry Audio Sample Rate        : 44100
Pantry Audio Sample Type        : 16-bit integer
Pantry Audio Channel Type       : Stereo
Pantry Originator               : Logic Pro X
Pantry Origination Date         : 2015-04-25
Pantry Origination Time         : 00:22:00
Pantry Time Reference           : 158918760
Pantry Version                  : 1
Pantry Umid                     : 0000000074E05FD1E04D006550298D5FFF7F0000E0D9E30400600000000000000000000046C42E01010000002004D274FF7F000080218D5FFF7F00004DB6278C
Pantry Create Date              : 2015:04:27 14:15:07Z
Pantry Modify Date              : 2015:04:27 19:40:14Z
Pantry Metadata Date            : 2015:04:28 03:40:15+08:00
Pantry Alt Tape Name            : A006_C038
Pantry Start Time Scale         : 25
Pantry Start Time Sample Size   : 1
Pantry Video Pixel Aspect Ratio : 1
Pantry Video Field Order        : Progressive
Pantry Video Alpha Mode         : None
Pantry Video Frame Rate         : 25.000000
Pantry Instance ID              : xmp.iid:f9d1bbc6-bac7-7149-a3b8-e681af92ee3c
Pantry Document ID              : xmp.did:0c91bc9b-2a56-a648-a9d1-cb4fa7c19d39
Pantry Original Document ID     : xmp.did:0c91bc9b-2a56-a648-a9d1-cb4fa7c19d39
Pantry Alt Timecode Time Value  : 00:00:01:11
Pantry Alt Timecode Time Format : 25 fps
Pantry Video Frame Size W       : 1920
Pantry Video Frame Size H       : 1080
Pantry Video Frame Size Unit    : pixel
Pantry Duration Value           : 6473
Pantry Duration Scale           : 0.04
Pantry History Action           : saved
Pantry History Instance ID      : xmp.iid:f9d1bbc6-bac7-7149-a3b8-e681af92ee3c
Pantry History When             : 2015:04:28 03:40:15+08:00
Pantry History Software Agent   : Adobe Premiere Pro CC (Windows)
Pantry History Changed          : /metadata
Media Data Size                 : 145000262
Media Data Offset               : 331708
Image Size                      : 1280x720
Megapixels                      : 0.922
Avg Bitrate                     : 4.56 Mbps
Rotation                        : 0

gabriel-vasile commented 1 month ago

I see... exiftool defaults to video/mp4 for an unknown ftyp brand. https://github.com/exiftool/exiftool/blob/6700bc3f95ccf8ad36dcf791c196fc6090b7db4d/lib/Image/ExifTool/QuickTime.pm#L101-L126

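From my tests:

  • firefox refuses to play the file (it says unsupported file format, but other mp4 files play fine)
  • chrome plays only the first 4 seconds
  • vlc also plays only the first 4 seconds, so there is something weird with the file.

I ran detection with https://github.com/digital-preservation/droid and https://github.com/google/magika and both returned video/mp4, so it seems that most programs default to video/mp4 for this kind of file. It might be good for mimetype to do the same.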

luhuaei commented 1 month ago

> I see... exiftool defaults to video/mp4 for unknown ftyp. https://github.com/exiftool/exiftool/blob/6700bc3f95ccf8ad36dcf791c196fc6090b7db4d/lib/Image/ExifTool/QuickTime.pm#L101-L126
>
> From my tests:
>
>   • firefox refuses to play the file (says unsupported file format but other mp4 files play fine)
>   • chrome plays but only the first 4 seconds
>   • vlc also plays the first 4 seconds, so there is something weird with the file.
>
> I run detection with https://github.com/digital-preservation/droid and https://github.com/google/magika and they both returned video/mp4, so it seems that most programs are defaulting on video/mp4 for this kind of files. Might be good for mimetype to do the same..

The clip is only 4 seconds long because the original video file is quite large, so I trimmed it before uploading it to GitHub.

Could it be that other tools also fall back on the file extension when determining the MIME type?