
epfromer / pst-extractor

Extract objects from MS Outlook/Exchange PST files


bodyHTML always empty #18

Closed by matthiasg 2 years ago

matthiasg commented 3 years ago

First: this is a great library! Thanks!

Working with it, I see that bodyHTML is always empty in my datasets. Is this normal? It is a little cumbersome to convert RTF to HTML all the time.

Side question: is there a way to extract the original ASCII representation of the email? Then I could parse it myself.

epfromer commented 3 years ago

I think that's normal. I too have noticed that bodyHTML is always empty. There is a tool called Outlook Spy at https://www.dimastr.com/outspy/home.htm that I will try on some PSTs to see whether pst-extractor is failing to extract the field properly. If you do the same and find a problem, please notify me.

EliteCheng commented 3 years ago

I may have found the problem. Some encodings are not handled in the createJavascriptString method: the codepage argument is ignored. They can be decoded with iconv. @epfromer @shalomscott @matthiasg

  // Requires iconv-lite, e.g. import * as iconv from 'iconv-lite'
  public static createJavascriptString(
    data: Buffer,
    stringType: number,
    codepage = 'utf8',
  ): string {
    try {
      if (stringType === 0x1f) {
        // UTF-16LE string: decode and strip any nulls
        return data.toString('utf16le').replace(/\0/g, '')
      } else {
        // any other string type: decode with the supplied codepage via iconv
        return iconv.decode(data, codepage)
      }
    } catch (err) {
      console.error(
        'PSTUtil::createJavascriptString Unable to decode string\n' + err
      )
      throw err
    }
  }
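
To illustrate why honoring the codepage matters, here is a minimal standalone sketch. It is not the library's code: decodePstString is a hypothetical helper that only mirrors the shape of createJavascriptString, and it uses the built-in TextDecoder as a stand-in for iconv-lite so that it runs without extra dependencies (this assumes a runtime with full ICU, as in official Node.js builds). The sample bytes and the choice of windows-1252 are assumptions made for the demonstration.

```typescript
// Hypothetical helper mirroring createJavascriptString; TextDecoder stands in
// for iconv-lite here, so this is an illustration rather than the actual fix.
function decodePstString(
  data: Uint8Array,
  stringType: number,
  codepage = 'utf-8',
): string {
  if (stringType === 0x1f) {
    // UTF-16LE string: decode and strip any nulls
    return new TextDecoder('utf-16le').decode(data).replace(/\0/g, '')
  }
  // any other string type: decode with the supplied codepage
  return new TextDecoder(codepage).decode(data)
}

// "café" encoded as windows-1252: the last byte, 0xE9, is not valid UTF-8 on its own
const cp1252Bytes = new Uint8Array([0x63, 0x61, 0x66, 0xe9])

// 0x1e is used here simply as a string type other than 0x1f
const wrong = decodePstString(cp1252Bytes, 0x1e) // falls back to UTF-8
const right = decodePstString(cp1252Bytes, 0x1e, 'windows-1252')

console.log(wrong) // "caf" followed by the U+FFFD replacement character
console.log(right) // "café"

// The 0x1f branch ignores the codepage and strips the trailing null
const utf16Bytes = new Uint8Array([0x68, 0x00, 0x69, 0x00, 0x00, 0x00])
const unicode = decodePstString(utf16Bytes, 0x1f)
console.log(unicode) // "hi"
```

When the wrong decoder is applied, non-ASCII bytes turn into replacement characters, which is one way body text can end up mangled if the codepage is dropped.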

wendallsan commented 2 years ago

@EliteCheng I was able to implement your suggested fix in a couple of minutes and it seems to work great! As a total node.js noob, I added "iconv-lite" as a dependency in my package.json file, added const iconv = require( 'iconv-lite' ); at the top of the PSTUtil.class.js file, and put your iconv.decode call into the createJavascriptString method. I am now seeing HTML content in the bodyHTML property rather than an empty string. Great job!

epfromer commented 2 years ago

@EliteCheng thanks for this suggestion, I'm incorporating it into the new release.