eneam / mboxviewer

A small but powerful app for viewing MBOX files

Gmail labels are garbled characters #39

Closed: accept closed this issue 1 year ago

accept commented 1 year ago

Nice to meet you. First of all thank you for the great program. I also use DeepL translation because I am Japanese and not fluent in English. I apologize if my meaning is not understood or if I am rude.

In my environment, when I load the mbox exported from Google, the labels in the Japanese part are garbled as shown in the red circled area in the image.

I don't see any garbled characters in the body of the email, the title, and other parts so far. The English part is displayed correctly. It is probably a problem with the character encoding.

Is there any way to fix this by changing the character encoding in the settings? Or is it only possible for the author to fix the program?

zigm commented 1 year ago

Hi,

Unfortunately, I don't think there is an easy way to resolve the issue you reported. The main problem is that the MBox Viewer code needs to be ported to UNICODE to resolve this issue and a few similar ones. That was not done on day one, when the program was very small, and the port now requires a larger effort.

As you indicated, the email body is displayed correctly. The mail summary should be shown correctly as well, thanks to some custom text drawing, but the text under the Mail Tree, including Labels, may not be correct. The plan is to eventually port MBox Viewer to UNICODE, but that work has always been delayed by work on new requirements.

I am sorry I don't have good news for you but thank you for raising the issue. It will help to prioritize the port to UNICODE.

There seem to be multiple text encodings for Japanese. It might be possible to provide support for an 8-bit Japanese encoding. Would you be able to provide a few email files for me to investigate? MBox Viewer allows you to export mails as eml or mbox text files, which you can then attach to this ticket. Obviously, I am asking for mail files without confidential content.

You can export mails as eml files as follows:

1) Select "File->Options->Export EML->yes->OK"
2) Select the email to export and double left click on the selected email. A folder will open with the exported email as mime-message.eml
3) Attach the eml file to your response/comment.

I will examine the content and figure out whether a resolution can be found for a standard Japanese encoding.

You can also create an mbox archive file from multiple selected emails as follows:

1) Select multiple emails by using the CTRL key (standard Microsoft way)
2) Right click on one of the selected emails and select "Copy Selected into User Selected Mails"
3) Select the "User Selected Mails" list (to the right of All Mails)
4) Right click on any email and select "Save All as MBox Mail Archive File"
5) Select "Open File Location"
6) Attach the created mbox file to the ticket/comment. You may need to append the .txt extension to the file name.

The ultimate solution is still to port MBox Viewer to UNICODE.

Thank You,

accept commented 1 year ago

@zigm Hi

Thanks for getting back to me. I see that it is still a Unicode-related problem. I understand very well the difficulty of making a program Unicode compliant. From the README, I learned that the software has existed at least since 2005. I think it is inevitable that it is not Unicode compliant. I am rather impressed that you have been maintaining the software steadily for so long. We are patiently waiting for Unicode support.

I am very happy to hear your proposal for Japanese language support. As you say, the Japanese character code other than Unicode is mainly Shift_JIS. The international standard seems to be called ANSI, and Windows Notepad actually describes it as ANSI, but the exact difference between the two is unclear to me...
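
For context, "ANSI" on Windows generally refers to whichever code page the system is configured to use for non-Unicode programs; on a Japanese system that is typically code page 932, Microsoft's variant of Shift_JIS. The following is a minimal Python sketch of why the same bytes can look correct under one code page and garbled under another. The sample string and the choice of cp1252 as the "other" code page are assumptions made for illustration and are not taken from MBox Viewer.

```python
# Illustrative only: the same bytes read under two different "ANSI" code pages.
text = "重要"  # sample label text meaning "important"; chosen for illustration

# Bytes as a system using code page 932 (Shift_JIS variant) would store them.
raw = text.encode("cp932")

# Read back with the matching code page: the original text is recovered.
as_japanese = raw.decode("cp932")

# Read with a Western European code page: the bytes are reinterpreted as
# unrelated Latin characters (undefined bytes are replaced with U+FFFD).
as_western = raw.decode("cp1252", errors="replace")

print(as_japanese)
print(as_western)
```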

I understand about the provision of the email file. I will prepare the data with Japanese Gmail character labels similar to those in the screenshot, but without any private information. Please give me some time.

accept commented 1 year ago

Sorry for the delay. We have prepared Japanese Mbox data exported from Gmail that does not contain personal information and causes garbled characters.

We have confirmed that the characters are garbled as shown in the image even in this environment.

We hope this will be useful.

zigm commented 1 year ago

Thanks for the sample mbox file. The X-Gmail-Labels: values are encoded in UTF-8, which the current MBox Viewer cannot support in the general case unless the text can be translated to a standard 8-bit Japanese encoding. Work is needed even if the labels translate to 8-bit Japanese. I will investigate what can be done without porting to UNICODE and provide an update.

From 1751708157032312051@xxx Fri Dec 09 04:11:19 +0000 2022
X-GM-THRID: 1751708063162424667
X-Gmail-Labels: =?UTF-8?B?44Ki44O844Kr44Kk44OW5riI44G/LOmAgeS/oea4iOOBv+ODoeODvOODqw==?= =?UTF-8?B?LOmHjeimgSzplovlsIHmuIjjgb8s44OG44K544OI44OH44O844K/?=
MIME-Version: 1.0
Date: Fri, 9 Dec 2022 13:11:19 +0900
Message-ID: CAGER_h9BJEw6say1wR=5tmPdtC+vmWYjFBVYvQr27JOa1ydTCg@mail.gmail.com
Subject: =?UTF-8?B?44OG44K544OI44OH44O844K/44Gn44GZ44CC?=
From: =?UTF-8?B?44OG44K544OI44Om44O844K244O8?= rensyu.user@gmail.com
To: =?UTF-8?B?44Om44O844K244O844OG44K544OI?= rensyu.user@gmail.com
Content-Type: multipart/alternative; boundary="000000000000813a1b05ef5d58fc"
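
To see what the X-Gmail-Labels value above contains, the RFC 2047 encoded-words can be decoded with Python's standard `email.header` module. This is only a sketch for inspecting the sample header, not a description of how MBox Viewer parses it; the helper name `decode_mime_header` is hypothetical.

```python
from email.header import decode_header

# Header value copied from the sample message above (two RFC 2047 encoded-words,
# both Base64-encoded UTF-8).
raw_labels = (
    "=?UTF-8?B?44Ki44O844Kr44Kk44OW5riI44G/LOmAgeS/oea4iOOBv+ODoeODvOODqw==?= "
    "=?UTF-8?B?LOmHjeimgSzplovlsIHmuIjjgb8s44OG44K544OI44OH44O844K/?="
)


def decode_mime_header(value: str) -> str:
    """Decode RFC 2047 encoded-words into one Unicode string (hypothetical helper)."""
    parts = []
    for chunk, charset in decode_header(value):
        if isinstance(chunk, bytes):
            # Encoded-words come back as bytes plus their declared charset.
            parts.append(chunk.decode(charset or "ascii"))
        else:
            # Plain, unencoded text may come back as str.
            parts.append(chunk)
    return "".join(parts)


# Gmail separates the individual labels with commas.
labels = decode_mime_header(raw_labels).split(",")
print(labels)
```

The decoded value is a comma-separated list of five Japanese labels, which is the text that appeared garbled in the Mail Tree.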

accept commented 1 year ago

@zigm Thank you very much. We look forward to seeing your progress.

zigm commented 1 year ago

I have released v1.0.3.37 of MBox Viewer to address garbled label text encoded in ANSI. I was not able to do a proper test with Shift_JIS, since I would have to reconfigure my laptop to set Shift_JIS as my local ANSI character set. However, I was able to verify that the labels in your sample eml files can be encoded in Shift_JIS. As long as the labels in all emails can be re-encoded in Shift_JIS, I am hopeful that this release will display the label text properly.

I would appreciate it if you could test the v1.0.3.37 release and provide feedback.
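
A rough way to check whether a given label can be re-encoded in Shift_JIS is sketched below in Python. It assumes `cp932`, Python's codec name for Microsoft's Shift_JIS variant, as the target; the function name and the sample label values are hypothetical, and this is not the check MBox Viewer itself performs.

```python
def encodable_in(text: str, codec: str = "cp932") -> bool:
    """Return True if every character in text can be encoded with the given codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# Hypothetical label values: two Japanese labels and one Korean label.
sample_labels = ["重要", "テストデータ", "라벨"]

for label in sample_labels:
    # Japanese text fits in code page 932; Hangul does not, so a label like the
    # last one could not be shown correctly through this 8-bit encoding.
    print(label, encodable_in(label))
```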

zigm commented 1 year ago

I forgot to ask you to run the "File->Development Options->About System" menu option to see which code page/character set is set for use by non-UNICODE applications.

accept commented 1 year ago

@zigm Thank you very much. I have verified that the Gmail label appears correctly. Thank you so much for this wonderful update. I hope that 2023 will be great for zigm and all those involved in the development of mbox viewer. Thank you for your continued support.

Sincerely yours, accept