TonyValenti / Mime-Detective-clarkis117

Mime type detector for files, byte arrays, and streams, .NET Standard Fork
MIT License

Word / Excel 97-2003 files may be detected as MSDOC #9

Closed by superjulius 6 years ago

superjulius commented 6 years ago

Word / Excel 97-2003 files may be detected as MSDOC type rather than WORD or EXCEL. I ran into the issue by creating a blank Excel file in Excel 2007 and saving it as XLS (see blank.zip).

From what I could check, Office 97-2003 file signatures are based on "subheaders", and there may be several of them, without clear documentation. However, the library then detects the file as MSDOC type.

I would therefore suggest to

Something like:

    // OLECF - Object Linking and Embedding (OLE) Compound File (CF)
    // Compound Binary File format by Microsoft, used by Microsoft Office 97-2003 applications (Word, Powerpoint, Excel, Wizard)
    public readonly static FileType MS_OFFICE = new FileType(new byte?[] { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 }, "doc,ppt,xls", "application/octet-stream");

Since this type would appear after the WORD and EXCEL types, detection would first try to match on the subheaders and fall back to this generic type if no subheader matches.
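
To make the ordering argument concrete, here is a minimal, self-contained sketch of "specific signatures first, generic fallback last". It does not use the library's real types: `Signature`, `OleSketch`, and `Detect` are hypothetical names invented for illustration. The WORD and EXCEL subheader bytes and the offset of 512 are assumptions taken from commonly published file signature tables, not something confirmed in this thread.

```csharp
using System;

// Hypothetical stand-in for the library's FileType; not its real definition.
public sealed class Signature
{
    public string Name { get; }
    public byte[] Magic { get; }
    public int Offset { get; }

    public Signature(string name, byte[] magic, int offset)
    {
        Name = name;
        Magic = magic;
        Offset = offset;
    }

    // True when the bytes starting at Offset are equal to Magic.
    public bool Matches(byte[] data)
    {
        if (data.Length < Offset + Magic.Length) return false;
        for (int i = 0; i < Magic.Length; i++)
        {
            if (data[Offset + i] != Magic[i]) return false;
        }
        return true;
    }
}

public static class OleSketch
{
    // OLE compound file header bytes, as given in the proposed MS_OFFICE type.
    private static readonly byte[] OleHeader =
        { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 };

    // Order matters: subheader-based types come first, the generic type last.
    // Assumption: subheader bytes and offset 512 come from public signature tables.
    private static readonly Signature[] Candidates =
    {
        new Signature("WORD", new byte[] { 0xEC, 0xA5, 0xC1, 0x00 }, 512),
        new Signature("EXCEL", new byte[] { 0x09, 0x08, 0x10, 0x00, 0x00, 0x06, 0x05, 0x00 }, 512),
        new Signature("MS_OFFICE", OleHeader, 0),
    };

    // Returns the name of the first matching candidate, or "UNKNOWN" if none match.
    public static string Detect(byte[] data)
    {
        foreach (var candidate in Candidates)
        {
            if (candidate.Matches(data)) return candidate.Name;
        }
        return "UNKNOWN";
    }
}
```

With this ordering, a buffer that starts with the OLE header but carries an unrecognized subheader at offset 512 would come back as `MS_OFFICE` rather than being misreported as a specific application type.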

clarkis117 commented 6 years ago

@superjulius I'm trying to find some older versions of Office to generate test data for this.