apache / incubator-ponymail

Apache Pony Mail (Incubating) - Email for Ponies & People
http://ponymail.incubator.apache.org/

wrong encoding in attachment breaks import #488

Closed scharc closed 4 years ago

scharc commented 5 years ago

When importing an mbox file with attachments, the import breaks with the following error:

Found attachment: Kurierplan_17.03.19.xlsx
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python3.4/threading.py", line 920, in _bootstrap_inner
    self.run()
  File "import-mbox.py", line 223, in run
    json, contents, _msgdata, _irt = archie.compute_updates(list_override, private, message)
  File "/var/www/ponymail/tools/archiver.py", line 342, in compute_updates
    attachments, contents = self.msgfiles(msg)
  File "/var/www/ponymail/tools/archiver.py", line 226, in msgfiles
    part_meta, part_file = parse_attachment(part)
  File "/var/www/ponymail/tools/archiver.py", line 102, in parse_attachment
    print("Found attachment: %s" % filename)
UnicodeEncodeError: 'ascii' codec can't encode character '\xfc' in position 24: ordinal not in range(128)

sebbASF commented 5 years ago

Which version of PonyMail? Can you provide a simple test case?

scharc commented 5 years ago

We are testing Pony Mail 0.11 on Debian Jessie in a Proxmox LXC container.

I cannot provide a public test case. For debugging, I could give a sample to a developer. Please contact me at scharc (at) gmail (dot) com

sebbASF commented 5 years ago

I have just noticed that the line numbers don't quite agree.

The stack trace says:

  File "/var/www/ponymail/tools/archiver.py", line 102, in parse_attachment
    print("Found attachment: %s" % filename)

However, the print statement is at line 101 in the code: https://github.com/apache/incubator-ponymail/blob/8b00e7c8eabf01a68fc119d2ce58bfbfc3c3eea3/tools/archiver.py#L101

This is a bit odd if you are using version 0.11 (or indeed the trunk version, as that is the same). It would be worth commenting out the print statement to see if that avoids the failure or if it fails elsewhere.

Note: I have been sent sample data, but cannot get it to fail, so it is looking like an issue with your installation.

scharc commented 5 years ago

The wrong line numbers were my fault. I added a comment there to remember that it broke at that point.

After commenting out the print statement the import went through!

sebbASF commented 5 years ago

I have not been able to cause the error on my system, and AFAIK no-one else has seen the same issue.

Since it only affects the print statement, I wonder if it could be caused by the terminal encoding setting?

Try running the following code using python3:

print("\xfc")

It should print a lower-case u with umlaut.
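To see which encoding Python is actually using for output, something like the following sketch may help. It reports the stdout and locale encodings, then checks whether the character can be written to a stream with a given encoding. The helper names `describe_stdout` and `can_print` are made up for illustration and are not part of Pony Mail.

```python
import io
import locale
import sys


def describe_stdout():
    """Return the encodings Python reports for stdout and the locale."""
    return {
        # May be 'ANSI_X3.4-1968' or 'ascii' in a container without a UTF-8 locale
        "stdout_encoding": getattr(sys.stdout, "encoding", None),
        "preferred_encoding": locale.getpreferredencoding(False),
    }


def can_print(text, encoding):
    """Report whether `text` can be written to a text stream using `encoding`."""
    # Wrap an in-memory byte buffer so the real terminal is not involved
    stream = io.TextIOWrapper(io.BytesIO(), encoding=encoding)
    try:
        stream.write(text)
        stream.flush()
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    print(describe_stdout())
    print("ascii can print \\xfc:", can_print("\xfc", "ascii"))
    print("utf-8 can print \\xfc:", can_print("\xfc", "utf-8"))
```

If the reported stdout encoding is ASCII, setting a UTF-8 locale or the `PYTHONIOENCODING=utf-8` environment variable before running the import would be worth trying.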

sebbASF commented 5 years ago

You could also try replacing the print statement with the following:

print("Found attachment: %s" % filename.encode('ascii','xmlcharrefreplace').decode('ascii'))

This should allow the name to be printed on an ASCII terminal. Non-ASCII characters such as u-umlaut will be converted to the form: '&#252;'
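As a rough illustration of that workaround, the sketch below wraps the conversion in a small helper and shows the `backslashreplace` error handler as an alternative. The function names and the sample filename are hypothetical and do not come from archiver.py.

```python
def ascii_safe(text):
    """Replace non-ASCII characters with XML character references."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


def ascii_safe_escaped(text):
    """Alternative: replace non-ASCII characters with Python-style escapes."""
    return text.encode("ascii", "backslashreplace").decode("ascii")


if __name__ == "__main__":
    filename = "Kurierplan_\xfc.xlsx"  # hypothetical name containing u-umlaut
    print("Found attachment: %s" % ascii_safe(filename))
    # prints: Found attachment: Kurierplan_&#252;.xlsx
    print("Found attachment: %s" % ascii_safe_escaped(filename))
    # prints: Found attachment: Kurierplan_\xfc.xlsx
```

Either form only changes what is logged; the attachment name itself is left untouched.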

sebbASF commented 4 years ago

Not a PonyMail bug