ArchitecturalKnowledgeAnalysis / EmailDatasetBrowser

Application for interacting with datasets produced by the EmailIndexer.
MIT License

Bug: Black text is invisible #24

Open wmeijer221 opened 2 years ago

wmeijer221 commented 2 years ago

While sifting through my data, I realized that some of the HTML emails contain custom styling, which sometimes includes the text color. I found an email formatted as follows:

<p class=\"MsoNormal\"><span style=\"font-size:12.0pt;font-family:&quot;Times New Roman&quot;,&quot;serif&quot;;color:black\">In HA &nbsp;while performing rollback , we use \u201chdfs namenode \ufffdCrollback\u201d which would prompt user for confirmation. &nbsp;( Implemented as part of HDFS-5138)<o:p></o:p></span></p>

Here the text color is set to black, which is the same as the background color used in the tool, so the text is invisible.

andrewlalis commented 2 years ago

Yeah, I know about this, but I don't know how I'd go about fixing it in any reasonable way. Ideally, the whole process of "cleaning up" an email's body into a standard, easily readable format would be done by some utility. But that is non-trivial because of how inconsistently different tools format the body: as plain ASCII, as proper HTML, or as some jumbled mess of CSS.
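
For the narrow case reported here (an inline `color` declaration that matches the tool's background), one possible stopgap is to drop color-related declarations from inline `style` attributes before rendering. The following is only a rough sketch in Python, not the tool's actual code: the names `COLOR_PROPERTIES`, `strip_color_declarations`, and `neutralize_colors` are hypothetical, and it assumes the body has already been decoded from its JSON-escaped form and that `style` attributes are double-quoted. It does not touch `<style>` blocks, `<font color=...>` tags, or single-quoted attributes, so it is far from the general "cleanup" utility described above.

```python
import html
import re

# Assumption: these are the properties that can clash with the viewer's theme.
COLOR_PROPERTIES = {"color", "background-color", "background"}

# Matches double-quoted inline style attributes only (a simplification).
STYLE_ATTR = re.compile(r'style\s*=\s*"([^"]*)"', re.IGNORECASE)


def strip_color_declarations(style: str) -> str:
    """Return the style string without any color-related declarations."""
    kept = []
    for declaration in style.split(";"):
        prop, sep, _value = declaration.partition(":")
        if not sep:
            continue  # empty or malformed fragment, e.g. a trailing ';'
        if prop.strip().lower() in COLOR_PROPERTIES:
            continue  # drop the declaration that could hide the text
        kept.append(declaration.strip())
    return ";".join(kept)


def neutralize_colors(body_html: str) -> str:
    """Rewrite every inline style attribute, removing color declarations."""
    def rewrite(match: re.Match) -> str:
        # Entities such as &quot; contain ';', so unescape before splitting.
        style = html.unescape(match.group(1))
        cleaned = strip_color_declarations(style)
        return 'style="' + html.escape(cleaned, quote=True) + '"'

    return STYLE_ATTR.sub(rewrite, body_html)


sample = (
    '<span style="font-size:12.0pt;font-family:&quot;Times New Roman&quot;,'
    '&quot;serif&quot;;color:black">In HA while performing rollback</span>'
)
cleaned_sample = neutralize_colors(sample)
print(cleaned_sample)
```

With the color declaration gone, the text falls back to whatever foreground color the viewer defines. Splitting on `;` would break on a quoted CSS value that itself contains a semicolon, which is one more reason a real HTML/CSS parser would be the safer choice for anything beyond this case.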