digital-preservation / PRONOM_Research


Suggested deprecation of fmt/62 - Microsoft Excel 2000-2003 Workbook (xls) 8X #86

Open Dclipsham opened 2 weeks ago

Dclipsham commented 2 weeks ago

fmt/61 (listed as Microsoft Excel 97 Workbook (xls) 8, alias Microsoft Excel Workbook (97-2000)) and fmt/62 (Microsoft Excel 2000-2003 Workbook (xls) 8X, with aliases Microsoft Excel Workbook (XP-2003) and Microsoft Excel Workbook (2002-2003)) have always shared, and continue to share, the same binary signature, named 'BIFF 8 & 8X Workbook (generic)'.

fmt/61 has a container signature that replicates the pattern of the binary signature, but fmt/62 has no container signature.

Since that generation of XLS (roughly MS Office 1997-2003) is based on OLE2, any complete XLS file of that generation should be processed by container identification and return fmt/61.

Therefore, fmt/62 is only a possible identification route if container identification fails to return a hit, DROID falls back to binary identification, and the first ~520 bytes of the file are intact.

But that fallback will only occur if container identification fails to process the file, which under normal circumstances only happens when the file itself is corrupt.

So, if anything currently IDs as fmt/62, it's probably broken.
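To make the reasoning above concrete, here is a minimal sketch of the decision flow as I understand it. This is not DROID's actual code: the function name and the two boolean inputs are hypothetical, and it assumes that a binary-only match reports every format sharing the 'BIFF 8 & 8X Workbook (generic)' signature.

```python
from typing import List


def candidate_puids(container_id_succeeded: bool,
                    binary_signature_matched: bool) -> List[str]:
    """Model of the identification routes described in this issue.

    Hypothetical helper for illustration only, not part of DROID.
    """
    # A complete OLE2-based XLS should be handled by container
    # identification, and only fmt/61 has a container signature.
    if container_id_succeeded:
        return ["fmt/61"]

    # Fallback: container identification failed (typically a corrupt
    # file). fmt/61 and fmt/62 share one binary signature, so fmt/62
    # can only ever appear on this route.
    if binary_signature_matched:
        return ["fmt/61", "fmt/62"]

    # Neither route produced a hit.
    return []


if __name__ == "__main__":
    print(candidate_puids(True, True))    # intact file
    print(candidate_puids(False, True))   # container parse failed
```

In this model fmt/62 never appears for an intact file, which is the basis for suggesting its deprecation.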

Suggestions:

David