Closed csb0730 closed 6 years ago
might be related to https://github.com/deltachat/deltachat-core/issues/98
Yes, in some way. #98 describes encoding, this issue here describes decoding of filenames (text). RFC2047 seems to be the related issue.
Hi @r10s, if you tell me where in the code to start, I'll investigate. I think we simply need decoding for RFC 2047 encoded words here, and that's it.
Are we in mrmimeparser.c again? do_add_single_file_part()?
yes, around there. i would start around https://github.com/deltachat/deltachat-core/blob/master/src/mrmimeparser.c#L1137 and check the different sources of desired_filename - i've just added some comments about which headers are parsed.
the filename itself comes from libEtPan, probably with different encodings for the first and the second source.
regarding the encoding, Wikipedia says RFC 2231 - https://en.wikipedia.org/wiki/MIME#Content-Disposition - so, this should be double-checked :)
As far as I can see, RFC 2231 is not related to this issue. An examination of the .eml file shows that the filename of the attachment is simply not decoded. It is built as an encoded-word construct as defined in RFC 2047.
Here is an excerpt from the example email source:
Subject: =?iso-8859-1?Q?xxxxxx xxxx =D6sterreich xxx =22xxxxxxxxxx=22 xxx?=
==> is displayed correctly as
xxxxxx xxxx Österreich xxx "xxxxxxxxxx" xxx
Now the attachment:
------=_NextPart_000_0044_01D3EBBB.EBD4A630
Content-Type: application/pdf;
name="=?iso-8859-1?Q?xxxxxxxxxxxxxxxxxx.xxx-xxxxxx_xxxx_=D6sterreich_xxxxxxxxx-?=
=?iso-8859-1?Q?xxxx_xxx.pdf?="
Content-Transfer-Encoding: base64
Content-Disposition: attachment;
filename="=?iso-8859-1?Q?xxxxxxxxxxxxxxxxxx.xxx-xxxxxx_xxxx_=D6sterreich_xxxxxxxxx-?=
=?iso-8859-1?Q?xxxx_xxx.pdf?="
==> here the filename is not decoded; the full encoded line is used as the filename
=?iso-8859-1?Q?xxxxxx ...
the correct filename would be (is):
xxxxxxxxxxxxxxxxxx.xxx-xxxxxx xxxx Österreich xxxxxxxxx-xxxx xxx.pdf
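To make the expected decoding concrete, here is a minimal sketch in C of what turning such a Q-encoded filename into UTF-8 could look like. It is not the deltachat-core code: the function name decode_q_latin1() and its helpers are hypothetical, and it assumes the charset is always iso-8859-1 and the encoding is always "Q". It follows the RFC 2047 rules that "_" stands for a space, "=XX" is a hex-encoded byte, and whitespace between two adjacent encoded words is dropped.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_value(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Case-insensitive check whether s starts with prefix. */
static int has_prefix_nocase(const char *s, const char *prefix)
{
    for (; *prefix; s++, prefix++) {
        if (tolower((unsigned char)*s) != tolower((unsigned char)*prefix)) return 0;
    }
    return 1;
}

/* Append one ISO-8859-1 byte to out as UTF-8; returns the new write position. */
static char *append_latin1(char *out, unsigned char b)
{
    if (b < 0x80) {
        *out++ = (char)b;
    } else {
        *out++ = (char)(0xC0 | (b >> 6));
        *out++ = (char)(0x80 | (b & 0x3F));
    }
    return out;
}

/*
 * Hypothetical helper (assumption, not a deltachat-core function):
 * decodes "=?iso-8859-1?Q?...?=" encoded words in `in` to UTF-8.
 * Text outside encoded words is copied unchanged.
 * Returns a malloc()ed string the caller must free(), or NULL.
 */
char *decode_q_latin1(const char *in)
{
    static const char prefix[] = "=?iso-8859-1?Q?";
    const size_t prefix_len = sizeof prefix - 1;
    /* Each input byte produces at most two output bytes. */
    char *result = malloc(2 * strlen(in) + 1);
    char *out = result;
    if (result == NULL) return NULL;

    while (*in) {
        const char *next;
        const char *end = has_prefix_nocase(in, prefix)
                              ? strstr(in + prefix_len, "?=") : NULL;
        if (end == NULL) {          /* not an encoded word: copy as-is */
            *out++ = *in++;
            continue;
        }
        for (in += prefix_len; in < end; in++) {
            if (*in == '_') {
                out = append_latin1(out, ' ');
            } else if (*in == '=' && hex_value((unsigned char)in[1]) >= 0
                                  && hex_value((unsigned char)in[2]) >= 0) {
                int hi = hex_value((unsigned char)in[1]);
                int lo = hex_value((unsigned char)in[2]);
                out = append_latin1(out, (unsigned char)(hi * 16 + lo));
                in += 2;
            } else {
                out = append_latin1(out, (unsigned char)*in);
            }
        }
        in = end + 2;
        /* Whitespace between two adjacent encoded words is ignored. */
        next = in;
        while (*next == ' ' || *next == '\t' || *next == '\r' || *next == '\n') next++;
        if (next != in && has_prefix_nocase(next, "=?")) in = next;
    }
    *out = '\0';
    return result;
}
```

Called with the two-line filename value from the excerpt above, this sketch joins both encoded words into one filename, replaces the underscores with spaces and turns =D6 into Ö. A real implementation would also have to handle other charsets and the "B" encoding.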
I think before or after https://github.com/deltachat/deltachat-core/blob/master/src/mrmimeparser.c#L1181 is the correct position to decode filename if required:
1181: mr_replace_bad_utf8_chars(desired_filename);
1183: do_add_single_file_part(ths, msg_type, mime_type, decoded_data, decoded_data_bytes, desired_filename);
But since the Subject: is decoded properly, there should be a function somewhere that can be used to decode the filename. Simply use it here? :)
What about mr_decode_header_string() in mrtools.c? It seems to do that :-)
@r10s: Did you see the last comments?
@csb0730 yes, mr_decode_header_string() does this decoding.
playing around a bit: attaching a file with the name testäöü.txt
in thunderbird gets encoded as
Content-Type: text/plain; charset=UTF-8;
name="=?UTF-8?B?dGVzdMOkw7bDvC50eHQ=?="
and the name is decoded correctly in Delta Chat. decoding is done here: https://github.com/deltachat/deltachat-core/blob/master/src/mrmimeparser.c#L1167 and is fine.
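For comparison with the "Q" case, here is a minimal sketch in C of how the "B" (base64) encoded word above can be decoded. Again this is only an illustration, not the deltachat-core code: decode_b_utf8() is a hypothetical name, and the sketch assumes a single encoded word with exactly the prefix "=?UTF-8?B?", so the decoded bytes are already UTF-8 and need no charset conversion.

```c
#include <stdlib.h>
#include <string.h>

/* Value of one base64 character, or -1 if c is not in the alphabet. */
static int b64_value(int c)
{
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

/*
 * Hypothetical helper (assumption, not a deltachat-core function):
 * decodes one "=?UTF-8?B?...?=" encoded word.
 * Returns a malloc()ed, NUL-terminated string the caller must free(),
 * or NULL if `word` is not such an encoded word.
 */
char *decode_b_utf8(const char *word)
{
    static const char prefix[] = "=?UTF-8?B?";
    const size_t prefix_len = sizeof prefix - 1;
    const char *p, *end;
    char *result, *out;
    unsigned int acc = 0;   /* bit accumulator */
    int bits = 0;           /* number of not yet emitted bits in acc */

    /* Simplification: the prefix is matched case-sensitively. */
    if (strncmp(word, prefix, prefix_len) != 0) return NULL;
    end = strstr(word + prefix_len, "?=");
    if (end == NULL) return NULL;

    /* The decoded text is always shorter than the encoded word. */
    result = out = malloc((size_t)(end - word) + 1);
    if (result == NULL) return NULL;

    /* Stop at the terminator or at the first '=' padding character. */
    for (p = word + prefix_len; p < end && *p != '='; p++) {
        int v = b64_value((unsigned char)*p);
        if (v < 0) {
            free(result);
            return NULL;
        }
        acc = (acc << 6) | (unsigned int)v;
        bits += 6;
        if (bits >= 8) {    /* a full byte is available */
            bits -= 8;
            *out++ = (char)((acc >> bits) & 0xFF);
        }
    }
    *out = '\0';
    return result;
}
```

Passing the name value from the Thunderbird example, decode_b_utf8("=?UTF-8?B?dGVzdMOkw7bDvC50eHQ=?="), yields the UTF-8 bytes of testäöü.txt.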
wondering which app you have used at https://github.com/deltachat/deltachat-core/issues/162#issuecomment-388990448
okay, as you mentioned, when the filename came from Content-Disposition ... filename=
the name was not decoded. added this.
Hi @r10s,
Above you always referred to the "B" encoding. That seems to work, but the "Q" encoding is obviously not working!
So I recommend reopening this issue.
See my comment from 14 days ago referring to mr_decode_header_string().
I think this could be the right way, couldn't it?
I obviously missed the essential part in dd1b4fc! Take my last comments as additional explanation, but I think this issue is really closed now
;-)
Great :)
Some attachments use a special encoding for the filename (see picture). This encoding is described in an RFC, but here it is not decoded properly, so bad filenames are generated for storage. As a result, these attachments are not accessible later.