aboutcode-org / typecode


PDF file detected as non-binary #41

Open stefan6419846 opened 4 months ago

stefan6419846 commented 4 months ago

I am feeding a PDF file to typecode.contenttype.is_binary. Since PDF files are usually considered binary, I expected the file to be detected as binary. However, the first bytes used for detection apparently look like plain text, which leads to a wrong classification.

Example file: antartica-3427135_640_1_libtiff.pdf
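
To illustrate how this kind of misclassification can happen, here is a minimal sketch. It is not typecode's actual implementation: looks_binary, is_pdf, the sample sizes, and the pdf_like bytes are all hypothetical and made up for illustration. It assumes a heuristic that only inspects a leading sample of the file. A PDF starts with an ASCII header and ASCII object definitions, so a short sample can contain nothing but printable characters even though compressed binary streams follow later.

```python
# Hypothetical sketch, NOT typecode's real detection logic.

# Bytes treated as "text": printable ASCII plus common whitespace.
PRINTABLE = bytes(range(0x20, 0x7F)) + b"\t\n\r\f"


def looks_binary(data: bytes, sample_size: int) -> bool:
    """Return True if the first `sample_size` bytes contain any non-text byte."""
    sample = data[:sample_size]
    # Delete every text byte; anything left over is a non-text byte.
    return bool(sample.translate(None, PRINTABLE))


def is_pdf(data: bytes) -> bool:
    """Return True if the data starts with the PDF header marker."""
    return data.startswith(b"%PDF-")


# Made-up, PDF-like content: ASCII header and objects first,
# then a stream containing non-text bytes.
pdf_like = (
    b"%PDF-1.4\n"
    b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n"
    b"4 0 obj\n<< /Length 8 /Filter /FlateDecode >>\nstream\n"
    b"\x78\x9c\x00\xff\xfe\x01\x02\x03\n"
    b"endstream\nendobj\n"
)

# A short leading sample sees only ASCII and reports "not binary".
short_sample_result = looks_binary(pdf_like, 32)

# Scanning the whole content finds the non-text stream bytes.
full_scan_result = looks_binary(pdf_like, len(pdf_like))

# A signature check identifies the format regardless of the sample.
signature_result = is_pdf(pdf_like)
```

In this sketch the short sample reports text while the full scan and the signature check do not, so one possible approach would be to consult the file signature before falling back to a byte-sample heuristic. Whether that fits typecode's design is an open question.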