smahoney58 closed this issue 8 years ago
Closing issue. Tested this on Newman v2.1.1 with the Shiavo dataset. No longer seeing the attach_0 and attach_1 attachments. PDF attachments all seem to be shown as pdf. This issue had a lot of parts, the main one being all the attach_0 and attach_1 files. Will open additional issues as new datasets are ingested. If the yipsusan dataset is ever re-ingested, I'll revisit this issue to verify that everything logged here is fixed.
Step through the following analysis to see all the issues.
Potential Issues:
The corrupted emails are most likely not a Big 5 encoding problem. Possibly embedded tables?
Other issues seen with attachments:
• sales@empowerrf.com has four files with the generic indicator. Two of the four were identical, and they were either corrupted, encrypted, or in a different encoding. The other two were PDF files that should have resolved as file type pdf.
• Attachment results for the search phrase “A small request to Henry” seem to corrupt the Subject for three of the emails in this chain. If you view the last email in the chain, none of the Subject text is corrupted in the Email view pane; the Subject is corrupted only in the Email list. A second issue is the same as in the initial attachment analysis: the attach_0 and attach_1 attachments probably should not be shown as attachments. They contain the contents of the email (attach_0 in plain text and attach_1 in HTML). I doubt that the user wrote the email and then attached the email to itself.
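For reference, the attach_0 / attach_1 behaviour described above is what you would expect if every leaf MIME part were treated as an attachment, including the text/plain and text/html bodies. I have not looked at how Newman's ingest actually does this, so the following is only a minimal sketch using Python's standard `email` package. The function name `split_body_and_attachments` and the sample message are made up for illustration. It treats a part as a real attachment only when it has an attachment disposition or a filename, and treats unnamed text/plain and text/html parts as the message body.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def split_body_and_attachments(raw_bytes):
    """Hypothetical helper: separate body parts from real attachments.

    Returns (body_types, attachments), where attachments is a list of
    (filename, content_type) tuples.
    """
    msg = message_from_bytes(raw_bytes, policy=policy.default)
    body_types, attachments = [], []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no content of their own
        filename = part.get_filename()
        ctype = part.get_content_type()
        if part.is_attachment() or filename:
            attachments.append((filename, ctype))
        elif ctype in ("text/plain", "text/html"):
            body_types.append(ctype)  # the email body, not an attachment
        else:
            attachments.append((filename, ctype))  # unnamed non-text part
    return body_types, attachments


# Made-up sample: plain-text body, HTML alternative, one PDF attachment.
sample = EmailMessage()
sample["Subject"] = "sample"
sample.set_content("hello")
sample.add_alternative("<p>hello</p>", subtype="html")
sample.add_attachment(b"%PDF-1.4 sample", maintype="application",
                      subtype="pdf", filename="report.pdf")

bodies, files = split_body_and_attachments(sample.as_bytes())
print(bodies, files)
```

With a rule like this, the plain-text and HTML copies of the body would not show up in the attachment list, and only the PDF would. Inline images and nested message/rfc822 parts would need their own handling, which this sketch does not attempt.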