elastic / elasticsearch-mapper-attachments

Mapper Attachments Type plugin for Elasticsearch
https://www.elastic.co
Apache License 2.0

Parsing of application/ms-tnef type is incorrect #52

Closed benjamin-kett closed 10 years ago

benjamin-kett commented 10 years ago

When indexing a TNEF document, the contents of the winmail.dat attachment are not searchable, even though Tika 1.4 has a TnefParser and (I think) is being run on the document. The index returns a content type of message/rfc822 when indexing an email containing a winmail.dat attachment, or text/plain; charset=windows-1252 when indexing the raw attachment.

Both of these types are incorrect: the content type should be application/ms-tnef. The attachment content also remains in base64 and isn't searchable. I have tested this with plugin version 1.9.0 on es 0.9.11.
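To rule out a bad input file, the payload can be checked against the TNEF signature before it is indexed. The following is a minimal sketch, independent of the plugin and of Tika, assuming the attachment is available as the same base64 string that is sent to Elasticsearch. The helper names `is_tnef` and `is_tnef_base64` are hypothetical and exist only for illustration.

```python
import base64

# A TNEF stream starts with the 32-bit signature 0x223E9F78,
# stored little-endian, so the first four bytes are 78 9F 3E 22.
TNEF_SIGNATURE = bytes.fromhex("789f3e22")


def is_tnef(data: bytes) -> bool:
    """Return True if the raw bytes begin with the TNEF signature."""
    return data[:4] == TNEF_SIGNATURE


def is_tnef_base64(encoded: str) -> bool:
    """Decode a base64 payload and check it for the TNEF signature."""
    return is_tnef(base64.b64decode(encoded))
```

If `is_tnef_base64` returns True for the raw winmail.dat payload while the index still reports text/plain, the input itself is probably fine and the problem lies in type detection or parsing.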

dadoonet commented 10 years ago

Could you share a TNEF document I can use for testing? Can I reuse that document in tests? I would like to check if Tika 1.5 solved it.

If you agree, could you sign the CLA? http://www.elasticsearch.org/contributor-agreement/

dadoonet commented 10 years ago

No news on this. Closing. Feel free to reopen with any new information.