HiraokaHyperTools / msgreader


.msg containing another (big) .msg as attachment throws an error #20

Closed GTCrais closed 2 years ago

GTCrais commented 2 years ago

I have a FileOne .msg file that contains another FileTwo .msg file as an attachment. FileTwo contains several PDFs and DOCXs as attachments, which makes it about 22MB in size.

When iterating over FileOne's attachments (of which there's only one -- FileTwo .msg), I save that attachment, and then when I try to parse it using your msgreader, it errors out. It does save the attachment "successfully" as a file, but that file is corrupted and cannot be parsed. Microsoft Outlook also cannot open it.

When I open FileOne in MS Outlook and manually save FileTwo to my desktop, the file is not corrupted and can also be parsed by msgreader. There is also a difference in size when saved through msgreader vs manually:

Through msgreader: 22,433,792 bytes
Saved manually: 22,425,600 bytes
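The two sizes quoted above differ by 8,192 bytes, which is 16 sectors if the file uses 512-byte sectors (the sector size of a version 3 Compound File Binary container, the format .msg files are stored in). The following is a minimal sketch, not part of msgreader, for checking whether a saved file at least starts with a plausible CFB header. `inspectCfbHeader` is a hypothetical helper name, and a passing check does not prove the rest of the file is intact.

```javascript
// CFB files start with this 8-byte signature (per the MS-CFB specification).
const CFB_SIGNATURE = [0xd0, 0xcf, 0x11, 0xe0, 0xa1, 0xb1, 0x1a, 0xe1];

// Hypothetical helper: reports basic header facts for bytes that should be a CFB file.
// `bytes` is a Uint8Array (a Node.js Buffer also works, since it is a Uint8Array).
function inspectCfbHeader(bytes) {
  if (bytes.length < 512) {
    return { looksLikeCfb: false, reason: "shorter than the 512-byte header" };
  }
  const signatureOk = CFB_SIGNATURE.every((b, i) => bytes[i] === b);
  if (!signatureOk) {
    return { looksLikeCfb: false, reason: "signature mismatch" };
  }
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  // The sector shift is a little-endian uint16 at offset 30:
  // 9 means 512-byte sectors, 12 means 4096-byte sectors.
  const sectorShift = view.getUint16(30, true);
  return { looksLikeCfb: true, sectorSize: 1 << sectorShift };
}

// Size difference reported in this issue, expressed in 512-byte sectors
// (assumption: the file uses 512-byte sectors).
const extraBytes = 22433792 - 22425600;
const extraSectors = extraBytes / 512;
```

Running `inspectCfbHeader(fs.readFileSync(path))` on both the manually saved file and the msgreader export would show whether the header fields agree before comparing the files further.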

The error I'm getting is the following:

RangeError: Offset is outside the bounds of the DataView
    at DataView.getUint32 (<anonymous>)
    at DataStream.readUint32 (C:\node-apps\MsgParser\node_modules\@kenjiuno\msgreader\lib\DataStream.js:732:32)
    at MsgReader.fieldsNameIdDir (C:\node-apps\MsgParser\node_modules\@kenjiuno\msgreader\lib\MsgReader.js:393:53)
    at MsgReader.fieldsDataDirInner (C:\node-apps\MsgParser\node_modules\@kenjiuno\msgreader\lib\MsgReader.js:199:18)
    at MsgReader.fieldsDataDir (C:\node-apps\MsgParser\node_modules\@kenjiuno\msgreader\lib\MsgReader.js:344:18)
    at MsgReader.fieldsDataReader (C:\node-apps\MsgParser\node_modules\@kenjiuno\msgreader\lib\MsgReader.js:421:14)
    at MsgReader.parseMsgData (C:\node-apps\MsgParser\node_modules\@kenjiuno\msgreader\lib\MsgReader.js:429:21)
    at MsgReader.getFileData (C:\node-apps\MsgParser\node_modules\@kenjiuno\msgreader\lib\MsgReader.js:442:36)
    at EmailParser.parse (C:\node-apps\MsgParser\js\services\EmailParser.js:42:23)
    at processTicksAndRejections (node:internal/process/task_queues:96:5)
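For context, this RangeError is what a DataView raises when a read would extend past the end of its buffer, which is consistent with the parser following an offset that the malformed file does not actually contain. A minimal sketch, independent of msgreader's internals; `safeReadUint32` is a hypothetical helper used only for illustration:

```javascript
// Reads a little-endian uint32, or returns null when the read would
// run past the end of the view instead of throwing a RangeError.
function safeReadUint32(view, offset) {
  if (offset < 0 || offset + 4 > view.byteLength) {
    return null;
  }
  return view.getUint32(offset, true);
}

const demoView = new DataView(new ArrayBuffer(8));
demoView.setUint32(4, 0xfeedface, true);

const inBounds = safeReadUint32(demoView, 4);    // bytes 4..7 exist
const outOfBounds = safeReadUint32(demoView, 6); // bytes 6..9 do not all exist

let readError = null;
try {
  demoView.getUint32(6, true); // unguarded read past the end of the buffer
} catch (err) {
  readError = err; // a RangeError
}
```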

Maybe it's related to this suggestion - https://github.com/HiraokaHyperTools/msgreader/issues/3 ?

Unfortunately, I cannot provide the .msg in question because it contains sensitive data. I will try to manually create one that gives me the same error, but I'm not sure I'll be successful.

I was able to create 2 emails that cause DataStream errors; they're available here: https://bit.ly/2ZV79dZ

TestOuterEmail1's attachment .msg causes the error I posted above, while TestOuterEmail2's .msg attachment causes the following error:

RangeError: Invalid typed array length: 512
    at new Uint8Array (<anonymous>)
    at Function.DataStream.memcpy (C:\node-apps\MsgParser\node_modules\@kenjiuno\msgreader\lib\DataStream.js:890:21)
    at DataStream.readInt32Array (C:\node-apps\MsgParser\node_modules\@kenjiuno\msgreader\lib\DataStream.js:383:20)
    at Reader.getBlockAt (C:\node-apps\MsgParser\node_modules\@kenjiuno\msgreader\lib\Reader.js:177:24)
    at Reader.xbatDataReader (C:\node-apps\MsgParser\node_modules\@kenjiuno\msgreader\lib\Reader.js:206:34)
    at Reader.parse (C:\node-apps\MsgParser\node_modules\@kenjiuno\msgreader\lib\Reader.js:155:18)
    at MsgReader.parseMsgData (C:\node-apps\MsgParser\node_modules\@kenjiuno\msgreader\lib\MsgReader.js:428:21)
    at MsgReader.getFileData (C:\node-apps\MsgParser\node_modules\@kenjiuno\msgreader\lib\MsgReader.js:442:36)
    at EmailParser.parse (C:\node-apps\MsgParser\js\services\EmailParser.js:42:23)
    at processTicksAndRejections (node:internal/process/task_queues:96:5)
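This second RangeError is the kind raised when a typed-array view is requested with more bytes than the underlying buffer holds from the given offset, here a 512-byte block. A minimal sketch of that failure mode and a guarded alternative; `sectorView` is a hypothetical helper, not a msgreader function:

```javascript
// Returns a view of `length` bytes at `offset`, or null when the
// buffer does not hold that many bytes from that position.
function sectorView(buffer, offset, length = 512) {
  if (offset < 0 || offset + length > buffer.byteLength) {
    return null;
  }
  return new Uint8Array(buffer, offset, length);
}

const smallBuffer = new ArrayBuffer(600);
const firstBlock = sectorView(smallBuffer, 0);     // 512 bytes are available
const missingBlock = sectorView(smallBuffer, 512); // only 88 bytes remain

let viewError = null;
try {
  new Uint8Array(smallBuffer, 512, 512); // unguarded: asks for 512 bytes, 88 available
} catch (err) {
  viewError = err; // a RangeError
}
```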

Steps to reproduce:

1) Parse one of those emails using msgreader
2) Save the .msg attachment
3) Read the saved attachment, and try parsing it using msgreader
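The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the script used in the report: `reparseMsgAttachments` is a hypothetical helper, `MsgReaderClass` is assumed to be the class exported by @kenjiuno/msgreader, and the sketch assumes its constructor accepts the bytes returned by `getAttachment` directly, whereas the report wrote the attachment to disk and read it back.

```javascript
// Parses an outer .msg, extracts each attachment whose name ends in .msg,
// and tries to parse the extracted bytes again with the same reader class.
// `MsgReaderClass` is passed in so the sketch does not hard-code how the
// package is imported (assumption: it offers getFileData() and getAttachment(index)).
function reparseMsgAttachments(MsgReaderClass, outerBytes) {
  const outer = new MsgReaderClass(outerBytes);
  const info = outer.getFileData();
  const attachments = info.attachments || [];
  const results = [];

  for (let i = 0; i < attachments.length; i++) {
    const { fileName, content } = outer.getAttachment(i);
    if (!/\.msg$/i.test(fileName || "")) {
      continue; // only nested messages are of interest here
    }
    try {
      const inner = new MsgReaderClass(content).getFileData();
      results.push({ fileName, ok: true, subject: inner.subject });
    } catch (err) {
      // With the affected versions, this is where the RangeError would surface.
      results.push({ fileName, ok: false, error: String(err) });
    }
  }
  return results;
}
```

Writing `content` to disk with `fs.writeFileSync` between the two parses would match the reported steps more closely and also allows opening the exported file in Outlook.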

It may be worth mentioning that I'm using version 1.9.0 because, while 1.11.* did fix the original issue I posted (https://github.com/HiraokaHyperTools/msgreader/issues/19), it also broke a bunch of other stuff. I'm going to open a ticket about that tomorrow.

kenjiuno commented 2 years ago

Thanks, I'll check this!

kenjiuno commented 2 years ago

OK, 1.12.0-alpha.1 will resolve the exporting problem.

GTCrais commented 2 years ago

Great, thanks a lot! I will test this out today and come back with feedback.

GTCrais commented 2 years ago

Update:

Your fixes (mostly) work! There is still one issue, though. Steps to reproduce:

1) Parse TestOuterEmail2.msg (the bigger one)
2) Save its one attachment (Test Attachment Email.msg)
3) Try to open the saved attachment in MS Outlook

While this attachment can now be processed by msgreader, it won't open in MS Outlook. The message I get is "Cannot start Microsoft Outlook. Cannot read the item", so there's still an issue with how it is saved to disk.

This is NOT the case with TestOuterEmail1 and its attached email. The issue only happens with TestOuterEmail2 and its attached .msg.

kenjiuno commented 2 years ago

Sorry. 1.12.0-alpha.2 will fix the second error: some CFBF documents were broken when exporting the internal msg from TestOuterEmail2.msg.

GTCrais commented 2 years ago

Thanks a lot, great job! Handling attachments works great now!