Open MrSwed opened 1 year ago
I see the same with an .xls file. This seems to be a common issue in other MIME detection libraries as well, for example in Ruby.
@eqinox76 I'm looking into how file --mime 2.ppt does its detection in order to make it work in Go. If there's no privacy concern, please upload your .xls so I can use it for tests.
Also check the 'file' version; different versions may give different results :(
file -v; file --mime-type -b -E 2.ppt
file-5.41
magic file from /etc/magic:/usr/share/misc/magic
application/vnd.ms-powerpoint
Sadly I cannot share the file. When I remove the proprietary information and save it, the resulting file is correctly recognised as "application/vnd.ms-excel".
file version 5.41 shows the correct MIME type for the original file as well:
file --mime 1.xls
1.xls: application/vnd.ms-excel; charset=binary
I tried to debug this issue a bit more and see the following state in matchOleClsid (ms_office.go:224, github.com/gabriel-vasile/mimetype v1.4.2):
- in[26:28]
[]uint8 len: 2, cap: 2, [3,0]
- clsidOffset
1616
- firstSecID
2
- in[clsidOffset:]
[]uint8 len: 1456, cap: 1456, [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,112,224,41,135,124,141,217,1,254,255,255,255,0,0,0,0,0,0,0,0,87,0,111,0,114,0,107,0,98,0,111,0,111,0,107,0,...+1392 more]
I hope this information helps a bit. Many thanks for looking into it!
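For readers of the dump above: the bytes that follow the zeroed region look like a UTF-16LE string. The sketch below decodes exactly the sixteen bytes visible in the dump; decodeUTF16LE is a hypothetical helper written for this illustration and is not part of mimetype.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"unicode/utf16"
)

// decodeUTF16LE is a hypothetical helper (not part of mimetype).
// It decodes little-endian UTF-16 code units and stops at the first NUL unit.
func decodeUTF16LE(b []byte) string {
	units := make([]uint16, 0, len(b)/2)
	for i := 0; i+1 < len(b); i += 2 {
		u := binary.LittleEndian.Uint16(b[i : i+2])
		if u == 0 {
			break
		}
		units = append(units, u)
	}
	return string(utf16.Decode(units))
}

func main() {
	// Bytes copied from the in[clsidOffset:] dump above.
	name := []byte{87, 0, 111, 0, 114, 0, 107, 0, 98, 0, 111, 0, 111, 0, 107, 0}
	fmt.Println(decodeUTF16LE(name)) // prints "Workbook"
}
```

So the data right after the root entry appears to contain the name "Workbook", which may be a usable hint for Excel detection even when the CLSID is empty. Whether mimetype can or should rely on it is an open question here, not a statement about how the library works.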
@eqinox76 I suspect your problem happens because the Excel signature is at the end of the file. Try disabling the limit on the number of bytes used for detection with:
mimetype.SetLimit(0) // Default limit is 3072. Setting the limit to 0 will make mimetype use whole file.
mtype, err := mimetype.DetectFile("your_file.xls")
More details are in the FAQ
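To illustrate why a read limit matters, here is a minimal stdlib-only sketch. It does not use mimetype and does not reproduce its internals; the head helper and the "SIGNATURE" marker are made up for the example. Only the 3072 default and the "0 means whole file" convention come from the comment above.

```go
package main

import (
	"bytes"
	"fmt"
)

// defaultLimit is the default quoted above for mimetype (3072 bytes).
const defaultLimit = 3072

// head is a hypothetical helper: it returns at most limit bytes of data.
// A limit of 0 means "use everything", the same convention as SetLimit(0).
func head(data []byte, limit int) []byte {
	if limit <= 0 || limit > len(data) {
		return data
	}
	return data[:limit]
}

func main() {
	sig := []byte("SIGNATURE") // made-up marker placed near the end of the data
	data := append(make([]byte, 5000), sig...)

	fmt.Println(bytes.Contains(head(data, defaultLimit), sig)) // false: marker lies past the limit
	fmt.Println(bytes.Contains(head(data, 0), sig))            // true: whole input is inspected
}
```

Any signature located past the first 3072 bytes is invisible to a detector that only looks at the truncated header, which is the situation SetLimit(0) is meant to avoid.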
Thanks for the tip. Sadly this file seems to behave differently.
When I remove the limit, neither the subheaders nor the magic bytes checked at the end of func Xls are found, and the library returns application/x-ole-storage.
I cannot figure out which rule in the file command recognises this .xls file. The most output I can get is:
file -d 1.xls
[try zmagic 0]
[try tar 0]
[try json 0]
[try csv 0]
[try cdf 1]
1.xls: CDFV2 Microsoft Excel
Let me know if you have another idea about what information I could share without sharing the whole file.
@MrSwed your file is not detected because the signature is at the end of the file. Use SetLimit as explained in the FAQ and it will be detected correctly.
@eqinox76 Your case seems more complicated. mimetype uses CLSIDs to detect OLE files.
It would be helpful to know the CLSID of that .xls file. This program will output the CLSID and the offset where it can be found.
package main

import (
	"encoding/binary"
	"encoding/hex"
	"fmt"
	"os"
)

func main() {
	d, err := os.ReadFile("1.xls")
	if err != nil {
		panic(err)
	}
	fmt.Println(getOleClsid(d))
}

// getOleClsid returns the expected offset of the root storage CLSID
// and the 16 bytes found there, hex-encoded.
func getOleClsid(in []byte) (int, string) {
	sectorLength := 512
	if in[26] == 0x04 && in[27] == 0x00 {
		sectorLength = 4096
	}

	// SecID of first sector of the directory stream.
	firstSecID := int(binary.LittleEndian.Uint32(in[48:52]))

	// Expected offset of CLSID for root storage object.
	clsidOffset := sectorLength*(1+firstSecID) + 80

	return clsidOffset, hex.EncodeToString(in[clsidOffset : clsidOffset+16])
}
Glad to do so! The output is
1616 00000000000000000000000000000000
Some more information that might be handy:
in[26] = 3
firstSecID = 2
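As a cross-check of the values reported above, the sketch below repeats the offset arithmetic from getOleClsid with in[26] = 3 and firstSecID = 2, and tests whether a CLSID is all zeros. rootClsidOffset and isZeroClsid are hypothetical names used only for this illustration.

```go
package main

import "fmt"

// rootClsidOffset repeats the arithmetic of getOleClsid above.
// b26 and b27 stand for in[26] and in[27]; hypothetical helper.
func rootClsidOffset(b26, b27 byte, firstSecID int) int {
	sectorLength := 512
	if b26 == 0x04 && b27 == 0x00 {
		sectorLength = 4096
	}
	return sectorLength*(1+firstSecID) + 80
}

// isZeroClsid reports whether clsid is exactly 16 zero bytes.
func isZeroClsid(clsid []byte) bool {
	if len(clsid) != 16 {
		return false
	}
	for _, b := range clsid {
		if b != 0 {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(rootClsidOffset(3, 0, 2))      // 512*(1+2)+80 = 1616
	fmt.Println(isZeroClsid(make([]byte, 16))) // true
}
```

The offset 1616 matches both the program output and the earlier debugger state, so the lookup lands where expected. The CLSID found there is simply all zeros, meaning the root storage entry carries no class identifier, and a detection keyed only on the CLSID has nothing to match for this file.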
Is this issue fixed now?
I have the same issue, plus an additional one. In both cases I am reading the whole file with SetLimit(0).
Checking the ppt extension (file: test.ppt): expected value application/vnd.ms-powerpoint, recognized value application/x-ole-storage.
Additional issue, checking the doc extension (file: test.doc): expected value application/msword, recognized value application/vnd.ms-powerpoint.
Version: github.com/gabriel-vasile/mimetype v1.4.5
The file for which the detection is inaccurate: 2.zip
Expected MIME type: application/vnd.ms-powerpoint
Returned MIME type: application/x-ole-storage
Version of the library you are using: v1.4.1
Output of go version: go version go1.16.15 linux/amd64
Additional context: https://www.htmlstrip.com/mime-file-type-checker and the console command file --mime-type 2.ppt both give correct results.