Open Leftium opened 11 months ago
Update: this issue isn't limited to non-UTF-8 files.
Some UTF-8 encoded files also throw this exception, for example when the From
header contains emoji:
From:🔥Keto_Rapid_Diet🔥 <xafnsbqsmgniwdztev@twhzbt.drivefact.org>
There were also more emails from the Korean address (From: "(주)한웰이쇼핑" <help@daisomall.co.kr>) that failed to restore even after converting the .eml file to UTF-8 and ensuring there were no mangled characters.
The best work-around seems to be to rename these .eml files so gyb skips them.
I modified my gyb.py to catch these exceptions, printing the problem message info and continuing with the remaining messages:
if options.cleanup:
    try:
        full_message = message_hygiene(full_message)
    except TypeError as error:
        print(
            f'WARNING! error cleaning message {message_num} ({message_filename})')
        print(f' {error}')
        print(' this message will be skipped.')
        continue
Compare to original code.
Got the fix on StackOverflow: policy=email.policy.SMTPUTF8
I confirmed the Korean text was restored without mangling, but the emoji ended up mangled. Perhaps because the emoji From name was not wrapped in quotes? Not a big deal, since the emoji came from a spam email.
def message_hygiene(msg):
    '''Ensure Message-Id, Date and From headers are valid. Replace if not.'''
    omsg = email.message_from_bytes(msg, policy=email.policy.SMTPUTF8)
    orig_id = omsg['message-id']
    orig_date = omsg['date']
    orig_from = omsg['from']
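To see what the policy change does outside of gyb, here is a minimal standalone sketch using only the standard library. The sample message is made up for illustration (only the From line is taken from this report). My assumption, not confirmed against gyb's code, is that the TypeError comes from the default policy returning an `email.header.Header` object instead of a `str` when the header contains raw non-ASCII bytes.

```python
import email
import email.header
import email.policy

# Hypothetical sample message with raw UTF-8 bytes in the From header.
RAW = (
    'From: "(주)한웰이쇼핑" <help@daisomall.co.kr>\r\n'
    'Subject: test\r\n'
    '\r\n'
    'body\r\n'
).encode('utf-8')

# Default policy (compat32): a header holding undecodable bytes comes back
# as an email.header.Header object, not a plain string.
legacy_from = email.message_from_bytes(RAW)['from']

# SMTPUTF8 policy: the header comes back as a str subclass, with the
# UTF-8 bytes decoded.
modern_from = email.message_from_bytes(RAW, policy=email.policy.SMTPUTF8)['from']

print(type(legacy_from).__name__, isinstance(legacy_from, str))
print(type(modern_from).__name__, isinstance(modern_from, str))
print(modern_from)
```

Any later code that treats the header value as a string (regex matching, concatenation) would only work with the second form.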
Full steps to reproduce the issue:
1. Have a message with unicode/emoji in its From: header.
2. Restore with --cleanup.
Expected outcome: GYB gracefully handles unicode/emoji in headers, either:
Actual outcome: GYB exits with unhandled exception:
Work-around:
Suggested alternative fix: always convert non UTF8 files to UTF8 when saving backup.
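A rough sketch of what that conversion could look like, not gyb code. `to_utf8` and the `fallback_encodings` list are hypothetical names, and the choice of `euc-kr` as a fallback is only a guess based on the Korean sender above. It re-encodes the whole message, so it would be wrong for messages whose body declares a different charset in Content-Type; a real fix would have to handle headers and body parts separately.

```python
def to_utf8(raw: bytes, fallback_encodings=('euc-kr',)) -> bytes:
    """Return raw re-encoded as UTF-8 (hypothetical helper, not part of gyb)."""
    try:
        raw.decode('utf-8')
        return raw  # already valid UTF-8, leave untouched
    except UnicodeDecodeError:
        pass
    # Try each candidate legacy encoding in order.
    for encoding in fallback_encodings:
        try:
            return raw.decode(encoding).encode('utf-8')
        except UnicodeDecodeError:
            continue
    raise ValueError('could not decode message with any known encoding')


legacy_bytes = 'From: 한글 <user@example.com>\r\n\r\n'.encode('euc-kr')
converted = to_utf8(legacy_bytes)
print(converted.decode('utf-8'))
```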
Notes:
The exception does not occur if --cleanup is not used. (Did not confirm if text was mangled after restore.)
gyb --action backup.
Mangled text:
Proper text: