matthiasg closed this issue 2 years ago.
I think that's normal; I have also noticed that `bodyHTML` is always empty. There is a tool called Outlook Spy (https://www.dimastr.com/outspy/home.htm) that I will try on some PSTs to check whether pst-extractor is failing to extract the field. If you do the same and find a problem, please let me know.
Maybe I found the problem: some encodings are not handled in the `createJavascriptString` method. They can be handled with iconv (the `iconv-lite` package): @epfromer @shalomscott @matthiasg
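```typescript
import * as iconv from 'iconv-lite'

export class PSTUtil {
  // ...other PSTUtil members omitted

  public static createJavascriptString(
    data: Buffer,
    stringType: number,
    codepage = 'utf8',
  ): string {
    try {
      if (stringType === 0x1f) {
        // PT_UNICODE (UTF-16LE): convert and strip any NUL characters
        return data.toString('utf16le').replace(/\0/g, '')
      }
      // Other string types: decode with the given code page via iconv-lite,
      // which supports many more encodings than Buffer.toString()
      return iconv.decode(data, codepage)
    } catch (err) {
      console.error(
        'PSTUtil::createJavascriptString Unable to decode string\n' + err,
      )
      throw err
    }
  }
}
```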
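To illustrate why a code-page-aware decoder matters, here is a small sketch comparing Node's built-in `Buffer.toString()` with a decoder that understands Windows code pages. It assumes the code page reaches the method as a name such as `'windows-1252'` (an assumption; the thread does not say which values pst-extractor passes). It uses Node's built-in `TextDecoder` as a dependency-free stand-in for `iconv.decode`, and `tryBufferDecode` and `decodeCodepage` are hypothetical helper names.

```typescript
// Bytes for "€ café" in Windows-1252 (0x80 = €, 0xE9 = é).
const bytes = Buffer.from([0x80, 0x20, 0x63, 0x61, 0x66, 0xe9])

// Buffer.toString() only accepts a fixed set of encodings
// (utf8, utf16le, latin1, ascii, base64, hex, ...), so a Windows
// code page name throws instead of decoding.
function tryBufferDecode(buf: Buffer, encoding: string): string {
  try {
    return buf.toString(encoding as BufferEncoding)
  } catch (err) {
    return `error: ${(err as Error).message}`
  }
}

// Assumption: TextDecoder (WHATWG Encoding API, built into Node with
// full ICU) stands in for iconv.decode(buf, 'windows-1252') here.
function decodeCodepage(buf: Buffer, encoding: string): string {
  return new TextDecoder(encoding).decode(buf)
}

console.log(tryBufferDecode(bytes, 'windows-1252')) // prints an "Unknown encoding" error message
console.log(bytes.toString('utf8')) // invalid UTF-8: contains U+FFFD replacement characters
console.log(decodeCodepage(bytes, 'windows-1252')) // "€ café"
```

Decoding single-byte text as UTF-8 silently corrupts non-ASCII characters, and passing an unsupported name to `Buffer.toString()` throws outright; a decoder that knows the code page avoids both.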
@EliteCheng I was able to implement your suggested fix in a couple of minutes, and it seems to work great! As a total Node.js noob, I added `iconv-lite` as a dependency in my package.json file, added `const iconv = require('iconv-lite');` at the top of the PSTUtil.class.js file, and put your `iconv.decode` call in place in the `createJavascriptString` method. I am now seeing HTML content in the `bodyHTML` property rather than an empty string. Great job!
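For anyone who would rather not add a dependency, here is a minimal sketch of the same decoding logic using Node's built-in `TextDecoder` instead of iconv-lite. This is a swapped-in technique, not the fix adopted in this thread: `decodePstString` is a hypothetical name, it assumes a Node build with full ICU (the default for official builds), and `TextDecoder` only accepts WHATWG Encoding labels, a smaller set than iconv-lite supports.

```typescript
// Hypothetical dependency-free variant of createJavascriptString.
// Assumption: TextDecoder replaces iconv-lite; it only accepts WHATWG
// Encoding labels such as 'utf8', 'windows-1252' or 'shift_jis'.
const PT_UNICODE = 0x1f // UTF-16LE string property type, as in the original branch

function decodePstString(
  data: Uint8Array, // a Buffer also works, since Buffer extends Uint8Array
  stringType: number,
  codepage = 'utf8',
): string {
  if (stringType === PT_UNICODE) {
    // Decode UTF-16LE, then strip NUL characters like the original code
    return new TextDecoder('utf-16le').decode(data).replace(/\0/g, '')
  }
  // Throws a RangeError if the label is not a supported encoding
  return new TextDecoder(codepage).decode(data)
}

// Usage
const utf16 = Uint8Array.from([0x68, 0x00, 0x69, 0x00, 0x00, 0x00]) // "hi" plus a NUL
console.log(decodePstString(utf16, PT_UNICODE)) // "hi"
const cp1252 = Uint8Array.from([0x63, 0x61, 0x66, 0xe9])
console.log(decodePstString(cp1252, 0x1e, 'windows-1252')) // "café"
```

The trade-off is coverage: iconv-lite handles legacy code pages that the WHATWG label set omits, so the dependency may still be the safer choice for arbitrary PST files.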
@EliteCheng Thanks for this suggestion; I'm incorporating it into the new release.
First: this is a great library! Thanks!
Working with it, I see that `bodyHTML` is always empty in my datasets. Is this normal? It is a little cumbersome to convert RTF to HTML all the time.
Side question: is there a way to extract the original email's raw ASCII representation? Then I could parse it myself.