PDF files detected as plain/text

TonyValenti / Mime-Detective-clarkis117

Mime type detector for files, byte arrays, and streams, .NET Standard Fork

MIT License


PDF files detected as plain/text #8

Closed: superjulius closed this issue 6 years ago

superjulius commented 6 years ago

I ran into the same problem that was reported as issue #12 in the original repository.

The proposed change was to run the file signature detection first, and then plain text detection. That would definitely fix the issue and be more reliable.
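To illustrate the ordering, here is a minimal sketch in Python. It is not Mime-Detective's actual code or API: the `SIGNATURES` table, `looks_like_text`, and `detect_mime` are hypothetical names, and the text heuristic is deliberately crude. The point is only that a PDF header is pure ASCII, so a text check that runs first can claim the file before the signature check sees it.

```python
# Hypothetical sketch; none of these names come from Mime-Detective.

# Small illustrative table of magic bytes and their MIME types.
SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"PK\x03\x04", "application/zip"),
]


def looks_like_text(data: bytes) -> bool:
    """Crude heuristic: no NUL bytes and the bytes decode as UTF-8."""
    if b"\x00" in data:
        return False
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def detect_mime(data: bytes) -> str:
    # 1. File signatures first: a known magic number wins outright.
    for magic, mime in SIGNATURES:
        if data.startswith(magic):
            return mime
    # 2. Only when no signature matches, fall back to the text heuristic.
    if looks_like_text(data):
        return "text/plain"
    return "application/octet-stream"
```

With this order, the start of a PDF such as `b"%PDF-1.4\n1 0 obj\n"` is reported as `application/pdf` even though `looks_like_text` alone would accept the same bytes as text.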

Would you consider making this change?

If so, it might then be possible to improve the plain text detection to detect the file encoding.
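One possible starting point for encoding detection is a byte order mark (BOM) check. This is a minimal Python sketch under my own assumptions, not something the library provides: `guess_text_encoding` is a hypothetical helper, and it only recognizes BOMs plus BOM-less UTF-8.

```python
import codecs

# Longer BOMs go first: the UTF-32 LE BOM starts with the UTF-16 LE BOM bytes.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def guess_text_encoding(data: bytes):
    """Return an encoding name, or None if the data does not look like text."""
    # A BOM at the start identifies the encoding directly.
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    # No BOM: accept the data as UTF-8 only if it decodes cleanly.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return None
    return "utf-8"
```

Files in legacy single-byte encodings without a BOM would need a statistical detector, which this sketch does not attempt.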

clarkis117 commented 6 years ago

@superjulius do you have a sample PDF that replicates this? I wasn't able to replicate it using a PDF generated by Microsoft Print to PDF on Windows 10. Improving the handling of plain text files is on my to-do list.

superjulius commented 6 years ago

@clarkis117 Unfortunately, I cannot share the PDF with which I faced the issue, but you can definitely use the one referenced in the issue mentioned above (http://www.orimi.com/pdf-test.pdf).

clarkis117 commented 6 years ago

@superjulius I was not able to replicate this behavior on my dev branch for 0.6.0, so it seems to be fixed. Just in case, I'm adding more data tests around PDFs.

clarkis117 commented 6 years ago

@superjulius PDF detection should be working properly in this build. Please see if you can reproduce your issue: https://ci.appveyor.com/project/clarkis117/mime-detective/build/artifacts

It is also up on NuGet now.

superjulius commented 6 years ago

@clarkis117 I have been out for a couple of weeks and just tried the file detection, which worked great on both the PDF referenced in the issue (i.e. pdf-test.pdf) and the PDF that I could not share. I think you can close this issue. Thanks for your work!