SNDHDR Parity update (and HD/CD/DVD Image files)
.aif/.aiff/.aiffc/.8svx:
These are IFF-based files; all start with 0x464f524d / FORM.
AIFF files are a hodgepodge of formats and specs all thrown under the same label. Different compression styles, or similar compression styles with the wrong FourCC, can render a file unplayable in certain software.
I've updated/tidied the database to recognise the additional AIFF or AIFC header at byte 8. With possible enhancements under V2 we could perform further matches to detail the compression used and possibly even bitrates etc.
.aif: I've removed the trailing 00 so it can use the multi-part section for better confidence.
.aiff: @cdgriffith had already added AIFF and 8SVX at byte 8 in multi-part; I've corrected the MIME type for AIFF.
Removed ["41494646", 8, ".aif", "audio/x-aiff", "AIFF/Amiga/Mac audio"], ["38535658", 8, ".aif", "audio/x-aiff", "AIFF/Amiga/Mac audio"] and ["41494643", 8, ".aiffc", "audio/x-aifc", "AIFC audio"] as these are covered by other matches in the 0x464f524d/FORM match format.
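For anyone wondering what the FORM multi-part match boils down to, here is a minimal sketch. It is not the library's actual matching code: identify_form and FORM_TYPES are made-up names, and the descriptions are only illustrative.

```python
FORM_MAGIC = bytes.fromhex("464f524d")  # b"FORM" at byte 0

# Form-type FourCC found at byte 8 -> (extension, description).
# The table name and the description strings are illustrative only.
FORM_TYPES = {
    b"AIFF": (".aiff", "AIFF audio"),
    b"AIFC": (".aiffc", "AIFC audio"),
    b"8SVX": (".8svx", "8SVX audio"),
}


def identify_form(data: bytes):
    """Return (extension, description) for a known FORM file, else None."""
    if data[:4] != FORM_MAGIC:
        return None
    # Bytes 4-7 hold the chunk size, so the form type sits at bytes 8-11.
    return FORM_TYPES.get(data[8:12])
```

Matching on both parts is what raises the confidence: FORM alone only says "some IFF file", while the FourCC at byte 8 says which kind.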
.au:
The existing fingerprint should match all files.
No changes. We could extract more info, but looking at how sndhdr does it, I'll leave that for a V2 upgrade.
.hcom:
There is almost no information on the format, and what there is amounts to the same data as linked below, in differing forms. From what I can see it's some old Apple Mac format, possibly used in apps and games.
The sndhdr test looks for two headers: one is in the Mac header, the other in the Mac data fork. For the time being I have added them as two separate tests. This will give a low-ish confidence score, but in the absence of test files there is little more I can do.
If anyone ever reads this and has some sample files, I'll take a look to improve this match.
.sndt:
After a lot of digging I found this format seems to belong to a very old Win 3.1 era program called SoundTool/SNDTOOL. I managed to source a copy buried in a shareware .iso at archive.org. Comparing a sample file included with it to the ones below seems to indicate this is the source of these files.
.voc/.wav:
No changes required, existing fingerprint will match any VOC/WAV file.
V2 improvements could look to decode audio data for sample rate etc.
.sb/.ub/.ulaw:
Cannot add. As far as I can guess the intentions of the sndhdr authors, .sb and .ub are signed and unsigned byte streams. This means they are simply a stream of bytes that hold audio data with no header, and only knowledge of the correct bitrate etc. lets you decode them back to audio.
.ulaw is essentially a codec used in various audio containers such as AIFF and AU, which again means there is no specific ulaw file format.
In these cases, there is not a lot we can do to detect these files. It would basically require creating an audio decoder similar to sndhdr or Audacity, VLC etc. to fully process and try to understand these files. This could be possible with V2 but this would take on a life of its own.
.sndr:
I have no idea on this, I've added the header match from sndhdr but again without test files or knowledge of the program they came from we can't go any better than that.
Again, if anyone reading this has any test files of the program that made it, I'll take a look and improve.
Other formats:
Honestly this PR is a bit of a so-so one, so let's add some extra stuff to make it more exciting.
.vhdx:
The updated version of the older .vhd format used by Microsoft Hyper-V and Virtual PC, nice simple header of 0x7668647866696c65 / vhdxfile.
.qcow/.qcow2/.qed:
QEMU's hard drive image formats. Simple headers with version numbers:
0x514649fb00000001 / QFIû for QCOW Image
0x514649fb00000002 / QFIû for QCOW2
0x514649fb00000003 / QFIû for QCOW3 (Still .qcow2 extension)
0x514544 / QED for QEMU Enhanced Disk Image
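A rough sketch of how the QCOW version could be read straight from the header values listed above. identify_qemu_image is a hypothetical helper for illustration, not something that exists in the library; it treats the four bytes after the magic as a big-endian version number, which is what the three headers above imply.

```python
import struct

QCOW_MAGIC = bytes.fromhex("514649fb")  # b"QFI\xfb"
QED_MAGIC = bytes.fromhex("514544")     # b"QED"

# Version number -> label, taken from the list of headers above.
QCOW_VERSIONS = {1: "QCOW", 2: "QCOW2", 3: "QCOW3"}


def identify_qemu_image(data: bytes):
    """Return a label for a QEMU disk image header, or None if unknown."""
    if data[:3] == QED_MAGIC:
        return "QED"
    if len(data) >= 8 and data[:4] == QCOW_MAGIC:
        # Bytes 4-7 are read as a big-endian 32-bit version number.
        (version,) = struct.unpack(">I", data[4:8])
        return QCOW_VERSIONS.get(version)
    return None
```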
.luks
Linux Unified Key Setup is another HD image format. There are two versions, LUKS1 and LUKS2.
0x4c554b53babe0001 / LUKSº¾ for LUKS1
0x4c554b53babe0002 / LUKSº¾ for LUKS2
It's an interesting format: LUKS2 embeds JSON metadata, so future V2 functionality could interrogate the files to display encryption type and other data.
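The two LUKS headers differ only in their last two bytes, so the version can be pulled out rather than matched twice. A minimal sketch, with luks_version as a made-up name and the last two bytes assumed to be a big-endian version field, as the two headers above suggest:

```python
LUKS_MAGIC = bytes.fromhex("4c554b53babe")  # b"LUKS\xba\xbe"


def luks_version(data: bytes):
    """Return 1 or 2 for a LUKS1/LUKS2 header, or None if it is not LUKS."""
    if len(data) < 8 or data[:6] != LUKS_MAGIC:
        return None
    # Bytes 6-7 are treated as a big-endian version number.
    version = int.from_bytes(data[6:8], "big")
    return version if version in (1, 2) else None
```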
.vdi
Sun/Oracle HD Image for use with VirtualBox. Nice long headers to match against. There is no official document on the format it seems but a good breakdown is available, linked below.
0x3c3c3c2053756e2078564d205669727475616c426f78204469736b20496d616765203e3e3e / <<< Sun xVM VirtualBox Disk Image >>> for older Sun images
0x3c3c3c204f7261636c6520564d205669727475616c426f78204469736b20496d616765203e3e3e / <<< Oracle VM VirtualBox Disk Image >>> for newer Oracle images
As far as I can see there is only one version (1.1), with the same image signature starting at byte 64 for both flavours. I've included it as a multi-part for completeness.
.vmdk
There are already entries in the .json for VMware .vmdk files. I have tidied and adjusted some to better match real-world files:
["4b444d", 0, ".vmdk", "application/octet-stream", "VMware 4 Virtual Split Disk file"] Removed as the correct match is below it in the .json
["23204469736b2044657363726970746f", 0, ".vmdk", "application/octet-stream", "VMware 4 Virtual Split Disk file"] Corrected to include better match using the full term # Disk DescriptorFile and changed the name to VMware Image Descriptor File
["23204469736b2044", 0, ".vmdk", "application/octet-stream", "VMware Virtual Disk description"] Removed as above fix is better match
Removed 3 and 4 from the 434f5744/COWD and 4b444d56/KDMV labels. These files do different jobs; they are not for different versions of VMware.
.dmg
The venerable archive format of Mac OS machines. The existing entry would only ever work for the file it came from. The correct way to identify a .dmg is to use a footer match at -512 for koly.
["7801730d626260", 0, ".dmg", "application/octet-stream", "MacOS X image file"] and ["", 0, ".dmg", "application/octet-stream", "MacOS X image file"] removed, new entry in footer added
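For reference, a footer match at -512 amounts to something like the sketch below. has_footer and looks_like_dmg are illustrative names, not the library's API; the only facts taken from above are the koly marker and the -512 offset.

```python
import io
import os


def has_footer(stream, magic: bytes, offset_from_end: int) -> bool:
    """Check whether `magic` starts `offset_from_end` bytes before EOF."""
    stream.seek(0, os.SEEK_END)
    size = stream.tell()
    if size < offset_from_end:
        return False  # File too short to contain the footer at all.
    stream.seek(size - offset_from_end)
    return stream.read(len(magic)) == magic


def looks_like_dmg(stream) -> bool:
    """A .dmg is expected to carry b"koly" 512 bytes from the end."""
    return has_footer(stream, b"koly", 512)


# Usage with an in-memory stand-in for a real file opened in "rb" mode.
fake_dmg = io.BytesIO(b"\x00" * 100 + b"koly" + b"\x00" * 508)
print(looks_like_dmg(fake_dmg))
```

Seeking from the end means only a handful of bytes are read, no matter how large the image is.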
OK, Even more formats
I'll note here the CD/DVD images are a real pain in the backside, lots of overlapping headers and proprietary info. This is a good start for later V2 fun.
MagicISO Image Format .uif
A seemingly much-hated proprietary format for storing images of CDs/DVDs. I can't find any test files or documentation; however, there is UIF2ISO, which converts the files to regular ISO. Digging in the source seems to show a header at byte 0 of 0x73696262 / sibb with another match at byte 8 of 0x72686c62 / rhlb if it's encrypted.
If I ever come across a real file to test against I'll confirm this but the code has been around a long time so it's pretty safe to assume it's correct.
PowerISO Direct Access Archive .daa
Another proprietary format for storing images of CDs/DVDs; much like .uif it's also pretty unpopular. The author of UIF2ISO also created a tool to deal with them called DAA2ISO.
Simple header of 0x444141 / DAA at byte 0
gBurner Image .gbi
Another proprietary format for storing images of CDs/DVDs. It appears to be quite similar to .daa, as DAA2ISO handles both.
Simple header of 0x474249 / GBI at byte 0
Apple HyperCard Stack .hc
While I was looking for data on another .hc extension, HyperCards popped up, so we'll add them in while we're here. HyperCards were almost a precursor to web pages, able to store text and images in a clickable, searchable database. Header of 0x5354414b / STAK at byte 4.
VeraCrypt File Container .hc
An encrypted image container. We can only add this as an extension, because the VERA header at byte 64 and all data following it are encrypted, and the only thing before them is the 64-byte salt, so there is nothing fixed to match against.
Nero Disc images .nrg
Nero was once one of the most popular CD/DVD burning tools, and .nrg was their own custom image format. These use footer matches for the two versions: 0x4e45524f / NERO at -8 for v1 images and 0x4e455235 / NER5 at -12 for v2 images.
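Since the two Nero footers sit at different offsets, a check has to look at both. A small sketch working on the tail of a file already read into memory; nrg_version is a hypothetical name and the function only encodes the two offsets given above:

```python
def nrg_version(data: bytes):
    """Return 2 for a NER5 footer, 1 for a NERO footer, else None."""
    # v2 images: b"NER5" starts 12 bytes before the end of the file.
    if len(data) >= 12 and data[-12:-8] == b"NER5":
        return 2
    # v1 images: b"NERO" starts 8 bytes before the end of the file.
    if len(data) >= 8 and data[-8:-4] == b"NERO":
        return 1
    return None
```

Only the last 12 bytes are ever inspected, so in practice it is enough to read that much from the end of the file.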
Compressed ISO images .isz
Created by EZB Systems for use in their various products, this is an open specification for producing ZLIB-compressed versions of ISO images. Header is 0x49735a21 / IsZ! at byte 0.
DiscJuggler images .cdi
Padus DiscJuggler was a professional mastering solution for CD and DVD. Because the .cdi image format is highly flexible, it was adopted as the de facto format for archiving Dreamcast games. There appear to be a few versions. Adding as an extension only: looking at the source for cdi2nero, it's a complex format that would need a partial port of that app to understand, and looking at libMirage confirms this idea.
CloneCD Control File .ccd, Image .img and Subchannel Info .sub
CloneCD is another powerful CD/DVD image tool. The .ccd contains various metadata relating to the .img file. Official specs on the format seem to be non-existent, so I've inferred the matches from samples from a range of sources. Much like .cdi above, some form of decoding may be the way to go in the future; looking at libMirage confirms this idea.
.ccd has 0x5b436c6f6e6543445d / [CloneCD] as its first line. There are versions on the next line, but as it's a text file, spacing/tabs could cause match issues. A regex solution would be best for extracting that info.
.sub files all appear to start with 0xffffffffffffffffffffffff then a few bytes after which may be some sort of versioning.
.img files appear to have a couple of different starting blocks, the two added seem to match a range of test .img files.
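To show what I mean by a regex solution for the .ccd version line, here is a rough sketch. The Version=N key on the second line is an assumption based on the samples I looked at, not on any spec, and ccd_version is a made-up helper name.

```python
import re

# [CloneCD] must open the file; whitespace (spaces, tabs, CR/LF) is
# tolerated before the key and around the "=". The "Version" key name
# is an assumption inferred from sample files.
CCD_VERSION_RE = re.compile(
    rb"\A\[CloneCD\]\s*Version\s*=\s*(\d+)", re.IGNORECASE
)


def ccd_version(data: bytes):
    """Return the version number from a .ccd header, or None."""
    match = CCD_VERSION_RE.match(data)
    return int(match.group(1)) if match else None
```

Working on bytes rather than decoded text avoids having to guess the file's encoding before matching.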
BlindWrite images .b5t / .b6t and BlindRead images .bwt
BlindWrite and its predecessor BlindRead are another set of CD/DVD imaging tools. Much like CloneCD, they can produce various files to preserve important information about the source disc. Most of these will be extension only for the time being, as I lack sample files and cannot find much about the format.
.BWS BlindRead Sub Channel Data
.BWT BlindRead Control File
.BWI BlindRead Image File
.B5T BlindWrite 5 Stream File, libMirage gives a header of 0x425754352053545245414d205349474e / BWT5 STREAM SIGN
.B5I BlindWrite 5 Image File (Tentatively adding header 0xffffffffffffffffffffffff based on source code to b5i2iso)
.B6T BlindWrite 6 Stream File libMirage gives a header of 0x425754352053545245414d205349474e / BWT5 STREAM SIGN
.B6I BlindWrite 6 Image File
WinOnCD images .c2d
While browsing the libMirage source for other formats, this one was in the list. This was an early entry into the CD mastering market, and it changed hands a couple of times, from Roxio to Adaptec. Two headers: 0x4164617074656320436551756164726174205669727475616c43442046696c65 / Adaptec CeQuadrat VirtualCD File and 0x526f78696f20496d6167652046696c6520466f726d617420332e30 / Roxio Image File Format 3.0
Adaptec Easy CD/DVD Creator image file .cif
Another CD/DVD creator software purchased by Adaptec from Corel; header info from libMirage. This uses a RIFF header, then 0x696d6167 / imag at byte 8.
There are earlier versions of the format that used .cl2, .cl3 and .cl4, but there is no info on these formats beyond that. I'll add them as extension only until sample files are found.
Alcohol 120% image file .mds and GameJack image file .xmd
Another powerful CD/DVD image creator, like BlindWrite and CloneCD it can make near perfect copies of most discs.
There's not much info on GameJack, it's either a licensed or questionable clone of Alcohol.
.mds and .xmd are the control file 0x4d454449412044455343524950544f5201 / MEDIA DESCRIPTOR
.mdf This is the main image, it's already in the .json
Daemon Tools image file .mdx
Pretty much one of the most popular virtual drive tools, it's been around for a very long time.
.mdx uses 0x4d454449412044455343524950544f5202 / MEDIA DESCRIPTOR which is nearly identical to Alcohol's except for the last byte
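Because the Alcohol/GameJack and Daemon Tools headers share the same 16 bytes of text, the trailing byte is the only thing that tells them apart. A minimal sketch of that check; classify_media_descriptor is a hypothetical name and the returned labels are only illustrative:

```python
MEDIA_DESCRIPTOR = bytes.fromhex("4d454449412044455343524950544f52")

# Byte 16 -> illustrative label, per the two headers described above.
DESCRIPTOR_KINDS = {
    0x01: ".mds/.xmd",  # Alcohol 120% / GameJack control file
    0x02: ".mdx",       # Daemon Tools image
}


def classify_media_descriptor(data: bytes):
    """Return an extension label based on the byte after the shared text."""
    if len(data) < 17 or data[:16] != MEDIA_DESCRIPTOR:
        return None
    return DESCRIPTOR_KINDS.get(data[16])
```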
Apple Toast File .toast
Toast is an early CD burning software package for Macs; it's changed hands many times over the years.
Early toast files have a header of 0x45520200 / ER. Later toast files are simply .iso with a different name.
Should close #85