HiraokaHyperTools / msgreader


FieldsData body produces null character? #34

Closed PixelsByLucas closed 1 year ago

PixelsByLucas commented 1 year ago

Is it possible for FieldsData body to produce a null character such as "\u0000" instead of the actual email body?

I noticed this in my logs from a PostgreSQL exception: invalid byte sequence for encoding "UTF8": 0x00. This is caused by trying to insert a null character into a text field. The value I was trying to insert came from FieldsData body. The same email sometimes produces this result, while at other times the body is parsed fine and everything works as expected. When the issue does happen, other data such as messageDeliveryTime and recipients are still parsed correctly.
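As a stopgap for the PostgreSQL error described above, the value could be checked and cleaned before the insert. This is only a minimal sketch: `stripNullChars` and `countNullChars` are hypothetical helpers, not part of msgreader, and removing the characters hides the symptom without explaining why the body came back wrong.

```typescript
// Hypothetical helpers (not part of msgreader).

// Counts U+0000 characters, which is useful for logging how often the problem occurs.
function countNullChars(value: string): number {
  return value.split("\u0000").length - 1;
}

// Removes U+0000 characters, which PostgreSQL rejects in text values.
// Passes undefined through unchanged, since an optional field may be absent.
function stripNullChars(value: string | undefined): string | undefined {
  if (value === undefined) {
    return undefined;
  }
  return value.replace(/\u0000/g, "");
}
```

Logging the count alongside the message identifier before stripping would make it easier to tell whether the whole body is a single null character or whether nulls are embedded in otherwise valid text.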

Any idea what could be causing this?

kenjiuno commented 1 year ago

Are you referring to this body property? https://hiraokahypertools.github.io/msgreader/typedoc/interfaces/MsgReader.FieldsData.html#body

A null character in body is possible, and there could be a number of reasons for it.

It is likely an edge case, though.

msgreader has value comparison tests. They compare against JSON files built from the property key-values extracted from msg files. https://github.com/HiraokaHyperTools/msgreader/tree/master/test Currently there is no known case of a null character appearing in body.

If you have time to investigate this case, you can try another msg parser library, such as msg-parser on npm.

If you want to browse the raw contents of a msg file, 7-Zip File Manager for Windows is the best tool for exploring it.


Select __substg1.0_1000001F and press F3 to open it in Notepad or a similar viewer.


Reading the __substg1.0_1000001F file with a binary editor may help identify the problem.


The property type of __substg1.0_1000001F is PtypString: a string of Unicode characters in UTF-16LE encoding. See [MS-OXCDATA]: Property Data Types | Microsoft Learn.
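Because the stream is UTF-16LE, a single 0x00 byte in a binary editor is normal (it is the high byte of every ASCII character). Only an aligned pair of zero bytes is an actual null character. The following is a minimal sketch for checking bytes extracted from the stream. `findNullCodeUnits` and `decodeUtf16le` are hypothetical helpers written for illustration and are not part of msgreader. The decoder is deliberately naive and assumes an even-length buffer.

```typescript
// Hypothetical helpers (not part of msgreader) for inspecting the raw bytes
// of a PtypString stream such as __substg1.0_1000001F.

// Returns the byte offsets of every UTF-16LE code unit equal to 0x0000.
function findNullCodeUnits(bytes: Uint8Array): number[] {
  const offsets: number[] = [];
  for (let i = 0; i + 1 < bytes.length; i += 2) {
    if (bytes[i] === 0 && bytes[i + 1] === 0) {
      offsets.push(i);
    }
  }
  return offsets;
}

// Decodes UTF-16LE bytes into a string, one code unit at a time.
// The low byte comes first, then the high byte. A trailing odd byte is ignored.
function decodeUtf16le(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i + 1 < bytes.length; i += 2) {
    out += String.fromCharCode(bytes[i] | (bytes[i + 1] << 8));
  }
  return out;
}

// "Hi" followed by a null character, encoded as UTF-16LE.
const sample = new Uint8Array([0x48, 0x00, 0x69, 0x00, 0x00, 0x00]);
const nullOffsets = findNullCodeUnits(sample); // [4]
const decoded = decodeUtf16le(sample); // "Hi\u0000"
```

If the extracted stream contains no aligned zero pairs but the parsed body still comes back as "\u0000", that would suggest the problem lies in how the file is read or parsed rather than in the msg file itself.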

github-actions[bot] commented 1 year ago

This issue is stale because it has been open for 30 days with no activity.

github-actions[bot] commented 1 year ago

This issue was closed because it has been inactive for 14 days since being marked as stale.