HiraokaHyperTools / msgreader

Some emails cannot be parsed #23

Closed GTCrais closed 1 year ago

GTCrais commented 2 years ago

Error message I get: RangeError: Invalid typed array length: 1577881753

I will try and investigate further.

Update 1:

RangeError: Invalid typed array length: 1577881753
    at new Uint8Array (<anonymous>)
    at Function.DataStream.memcpy (C:\node-apps\MsgParser\node_modules\@kenjiuno\msgreader\lib\DataStream.js:889:21)
    at DataStream.readUint16Array (C:\node-apps\MsgParser\node_modules\@kenjiuno\msgreader\lib\DataStream.js:446:20)
    at DataStream.readUCS2String (C:\node-apps\MsgParser\node_modules\@kenjiuno\msgreader\lib\DataStream.js:1043:54)
    at MsgReader.fieldsNameIdDir (C:\node-apps\MsgParser\node_modules\@kenjiuno\msgreader\lib\MsgReader.js:390:44)
    at MsgReader.fieldsDataDirInner (C:\node-apps\MsgParser\node_modules\@kenjiuno\msgreader\lib\MsgReader.js:198:18)
    at MsgReader.fieldsDataDir (C:\node-apps\MsgParser\node_modules\@kenjiuno\msgreader\lib\MsgReader.js:338:18)
    at MsgReader.fieldsDataReader (C:\node-apps\MsgParser\node_modules\@kenjiuno\msgreader\lib\MsgReader.js:415:14)
    at MsgReader.parseMsgData (C:\node-apps\MsgParser\node_modules\@kenjiuno\msgreader\lib\MsgReader.js:423:21)
    at MsgReader.getFileData (C:\node-apps\MsgParser\node_modules\@kenjiuno\msgreader\lib\MsgReader.js:436:36)

Update 2: It is possible this is an issue with the msg file I'm trying to feed into the reader. Investigating further.

Update 3: I was able to reproduce this. Steps:

1) I have a working, non-broken .msg file, EmailOne.msg.
2) Open it in MS Outlook; it works fine.
3) MS Outlook gives me the following option: "Click here to download pictures. To help protect your privacy, Outlook prevented automatic download of some pictures in this message."
4) After clicking on that, MS Outlook downloads these pictures and the file size of EmailOne.msg goes up by about 0.04 MB.
5) The file can no longer be processed by msgreader and gives the error from Update 1.

I will try to figure out which version of msgreader broke this, but I assume it was 1.13.0-alpha.1.

Update 4: This problem exists all the way back in 1.9.0 as well. It is by pure coincidence that I've discovered this. It looks like it's completely unrelated to any updates, fixes and refactoring since then.

Update 5: I have found a workaround for this issue -- after opening the "broken" file in MS Outlook, I re-save it back to disk. The re-saved file can be processed by msgreader.

kenjiuno commented 2 years ago

    at DataStream.readUCS2String (C:\node-apps\MsgParser\node_modules\@kenjiuno\msgreader\lib\DataStream.js:1043:54)
    at MsgReader.fieldsNameIdDir (C:\node-apps\MsgParser\node_modules\@kenjiuno\msgreader\lib\MsgReader.js:390:44)

I don't have a good lead to work from on my side yet, but this error seems to occur while reading the __nameid_version1.0\__substg1.0_00040102 stream.

__substg1.0_00040102 stores the string table for named properties; see "[MS-OXMSG]: String Named Property" on Microsoft Docs.

A msg file stores "properties". The key of each property may be one of: propertyTag, propertySet+Lid, or propertyName:

propertyTag

      "rawProps": [
        {
          "propertyTag": "3001001f",
          "value": "ku@digitaldolphins.jp"
        },
        ...
      ]

propertySet+Lid

    {
      "propertyTag": "80110040",
      "propertySet": "00062008-0000-0000-c000-000000000046",
      "propertyLid": "000085bf",
      "value": "Tue, 31 Aug 2021 01:55:39 GMT"
    },

propertyName

    {
      "propertyTag": "8013001f",
      "propertyName": "ClientInfo",
      "value": "Client=MSExchangeRPC"
    },
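
To make the three key kinds easier to tell apart, here is a minimal sketch that splits a propertyTag string like the ones above into its two halves. It assumes the usual MAPI layout (high 16 bits are the property ID, low 16 bits are the property type) and that IDs from 0x8000 upward are the ones resolved through the named property mapping. `decodePropertyTag` is a hypothetical helper for illustration, not part of msgreader.

```javascript
// Hypothetical helper (not a msgreader API): split an 8-hex-digit
// property tag into its property ID and property type.
function decodePropertyTag(tagHex) {
  if (!/^[0-9a-fA-F]{8}$/.test(tagHex)) {
    throw new TypeError(`expected 8 hex digits, got "${tagHex}"`);
  }
  const id = parseInt(tagHex.slice(0, 4), 16); // high 16 bits: property ID
  const type = parseInt(tagHex.slice(4), 16);  // low 16 bits: property type
  return {
    id,
    type,
    // Assumption: IDs >= 0x8000 are assigned per file and need the
    // __nameid_version1.0 mapping to recover propertySet+Lid or propertyName.
    isNamed: id >= 0x8000,
  };
}
```

With the samples above, "3001001f" decodes to an ID below 0x8000 (a plain propertyTag), while "80110040" and "8013001f" both fall in the named range, which matches the extra propertySet/propertyLid and propertyName fields shown for them.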

A name such as "ClientInfo" (the value of "propertyName") is stored inside __substg1.0_00040102.
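
For reference, a rough sketch of reading one name out of that stream with a bounds check. It assumes the entry layout described in [MS-OXMSG] (a 4-byte little-endian byte length followed by the name in UTF-16LE) and that the caller already has the entry's offset into the stream. `readNameAt` is a hypothetical helper, not msgreader's actual implementation. The stack trace suggests a length far larger than the stream may have been read, so the sketch rejects such a length up front instead of trying to allocate a typed array for it.

```javascript
// Hypothetical helper (not msgreader's code): read one named-property
// string from the bytes of __substg1.0_00040102 at a given offset.
function readNameAt(bytes, offset) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);

  // The 4-byte length prefix itself must lie inside the stream.
  if (offset < 0 || offset + 4 > bytes.byteLength) {
    throw new RangeError(
      `offset ${offset} is outside the ${bytes.byteLength}-byte stream`
    );
  }

  const nameBytes = view.getUint32(offset, true); // little-endian byte length
  const start = offset + 4;

  // Reject a length that runs past the end of the stream before allocating.
  if (nameBytes > bytes.byteLength - start) {
    throw new RangeError(
      `name length ${nameBytes} at offset ${offset} exceeds the stream size`
    );
  }

  return new TextDecoder("utf-16le").decode(
    bytes.subarray(start, start + nameBytes)
  );
}
```

A check like this would not fix a file whose offsets are interpreted incorrectly, but it would turn the "Invalid typed array length" failure into an error that names the offending offset and length.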

github-actions[bot] commented 1 year ago

This issue is stale because it has been open for 30 days with no activity.

github-actions[bot] commented 1 year ago

This issue was closed because it has been inactive for 14 days since being marked as stale.