Open sudipm-mukherjee opened 4 years ago
HI @sudipm-mukherjee
after checking offlineimap and this issue, IMO the problem is not related to offlineimap. I will try to explain the problem.
offlineimap is using the mailfolders as byte objects. Then, the object is sent to the lambda function used in nametrans
:
nametrans = lambda foldername: foldername.decode('imap4-utf-7').encode('utf-8')
This call is more or less:
def nametrans(byte_object_in_imap4-utf-7):
string_in_imap4-utf-7 = byte_object_in_imap4-utf-7.decode('imap4-utf-7')
byte_object_in_utf8 = string_in_imap4-utf-7.encode('utf-8')
return byte_object_in_utf8
So the function receives a byte object and returns a byte object. It uses an string to change one format to other.
offlineimap is using the mailfolders as (utf-8) string objects. Then, the object is sent to the lambda function used in nametrans
. This function is waiting for a bytes object and then it fails in the byte_object_in_imap4-utf-7.decode('imap4-utf-7')
and we get the error:
ERROR: While attempting to sync account 'testmail'
'str' object has no attribute 'decode'
IMO, the problem is not offlineimap, and then, we should not change offlineimap for it. In Python 3, all strings objects are already UTF-8, so we don't need convert them. If we need change the string format, we need change it in a different way in the attached offlineimap.py script.
The new script should convert an (utf-8) "string imap_utf7" to (utf-8) "string utf-8".
offlineimap is doing this conversion in some places. For example, check the function "decode_mailbox_name" and other functions like "IMAP_utf8" and "utf8_IMAP" (in the same file, next functions):
https://github.com/OfflineIMAP/offlineimap3/blob/master/offlineimap/imaputil.py#L321
I saw these functions when I migrated offlineimap to Python 3. Probably these functions are not perfect. I check them and check the functions used in IMAPClient (https://github.com/mjs/imapclient), and probably the functions in imapclient could be better. But this is other history.
Probably you can check this code: https://github.com/MarechJ/py3_imap_utf7 to use as lambda in the offlineimap.py file. Seems to be better than imapclient. I didn't test it, but probably you can use it as (including it as pythonfile
; do not forget include the import subprocess
if you are using the password in remotepasseval
):
nametrans = lambda foldername: decode(encode(foldername))
Please, try this new option. If the new options is working and you agree with my explanation, feel free to close this bug.
Best Regards, Rodolfo (kix)
I'm the downstream bug reporter that @sudipm-mukherjee has been helping. Using your link I created the following pythonfile:
import binascii
# import codecs
# from base64 import b64encode, b64decode
# encoding
def modified_base64(s):
s = s.encode('utf-16be')
return binascii.b2a_base64(s).rstrip(b'\n=').replace(b'/', b',')
# return b64encode(s, b"+,")#.rstrip('\n=')
def doB64(_in, r):
if _in:
r.append(b'&' + modified_base64(''.join(_in)) + b'-')
del _in[:]
def encode(s: str):
r = []
_in = []
for c in s:
ordC = ord(c)
if 0x20 <= ordC <= 0x25 or 0x27 <= ordC <= 0x7e:
doB64(_in, r)
r.append(c.encode())
elif c == '&':
doB64(_in, r)
r.append(b'&-')
else:
_in.append(c)
doB64(_in, r)
return b''.join(r)
# decoding
def modified_unbase64(s):
# b = b64decode(s, b"+,")
b = binascii.a2b_base64(s.replace(b',', b'/') + b'===')
return b.decode('utf-16be')
def decode(s: bytes):
r = []
decode = bytearray()
for c in s:
if c == ord('&') and not decode:
decode.append(ord('&'))
elif c == ord('-') and decode:
if len(decode) == 1:
r.append('&')
else:
ab = modified_unbase64(decode[1:])
r.append(ab)
decode = bytearray()
elif decode:
decode.append(c)
else:
r.append(chr(c))
if decode:
r.append(modified_unbase64(decode[1:]))
bin_str = ''.join(r)
return bin_str
# required for reading passwords
import subprocess
and used the nametrans lambda you suggested above:
nametrans = lambda foldername: decode(encode(foldername))
Clearly, however, my python-fu is not strong enough. This configuration resulted in the same error @sudipm-mukherjee reported at the top of this issue. Playing with the function names in the pythonfile shows that the "attribute 'decode'" in the error message refers to the function "decode" in the pythonfile -- if you change the name of the "decode" function in the pythonfile and nametrans call, the changed name appears in the error message.
Hi @dnebauer!
Thanks for your comment. Could you send me the foldername with problems?
Regards, kix
I'm not sure I understand your question. The error I reported occurs with every account I have defined, if I use the pythonfile and nametrans lambda shown above. The defined accounts are 'austdomains' and 'bigpond'. The 'austdomains' account has the mailboxes 'INBOX', 'INBOX.Archive', 'INBOX.Drafts', 'INBOX.Junk', 'INBOX.Sent', 'INBOX.spam', and 'INBOX.Trash'. The 'bigpond' account has the mailboxes 'Deleted', 'Drafts', 'INBOX', 'Junk', 'Sent', and 'Trash'. Let me know if you were after something else.
I am unable to reproduce this error with the file that @dnebauer sent me. This is how I am testing:
1) Delete all the mail folders from laptop.
2) create new label "abc®D" in my test gmail account.
3) use the offlineimap.py that @dnebauer sent me
4) Use nametrans = lambda foldername: decode(encode(foldername))
in offlineimaprc
5) sync using offlineimap -o -u quiet
I dont get any error, and after syncing I do see a local folder named abc&AK4-D
.
Without the nametrans
and all the same steps as above I get a folder with name 'abc&AK4-D'
. Only difference between these two are the '
Hi,
the problem is the ampersand character. In utf-7 the character should be a plus (+), so the folder name should be abc+AK4-D
instead of abc&AK4-D
. Then, the folder name is converted to abc@D
I did a modified version of the conversion script:
import subprocess # required for reading passwords
import re
def kix_utf7_to_utf8(str_imap):
"""
This function converts an IMAP_UTF-7 string object to UTF-8.
It first replaces the ampersand (&) character with plus character (+)
in the cases of UTF-7 character and then decode the string to utf-8.
If the str_imap string is already UTF-8, return it.
For example, "abc&AK4-D" is translated to "abc+AK4-D"
and then, to "abc@D"
Example code:
my_string = "abc&AK4-D"
print(kix_utf7_to_utf8(my_string))
Args:
bytes_imap: IMAP UTF7 string
Returns: UTF-8 string
"""
try:
str_utf7 = re.sub('&(\w{3}\-)', '+\\1', str_imap)
str_utf8 = str_utf7.encode('utf-8').decode('utf_7')
return str_utf8
except UnicodeDecodeError:
# Error decoding, already utf-8. return original string.
return str_imap
The nametrans should be:
nametrans = lambda foldername: kix_utf7_to_utf8(foldername)
It works fine downloading the folder, but I have a problem when offlineimap reads the local folder and try to create it on the remote. It fails. Then, I created this modified version:
import subprocess # required for reading passwords
import re
def kix_utf7_to_utf8(str_imap):
"""
This function converts an IMAP_UTF-7 string object to UTF-8.
It first replaces the ampersand (&) character with plus character (+)
in the cases of UTF-7 character and then decode the string to utf-8.
If the str_imap string is already UTF-8, return it.
For example, "abc&AK4-D" is translated to "abc+AK4-D"
and then, to "abc@D"
Example code:
my_string = bytes("abc&AK4-D", 'utf-8')
print(kix_utf7_to_utf8(my_string))
Args:
bytes_imap: IMAP UTF7 bytes
Returns: UTF-8 string
"""
try:
str_utf7 = re.sub('&(\w{3}\-)', '+\\1', str_imap)
str_utf8 = str_utf7.encode('utf-8').decode('utf_7')
return str_utf8
except UnicodeDecodeError:
# Error decoding, already utf-8. convert to utf-7
str_utf8 = str_imap.encode('utf_7').decode('utf-8')
str_utf7 = re.sub('\+(\w{3}\-)', '&\\1', str_utf8)
return str_utf7
It works fine the first run, but the second run fails again. The error message is:
ERROR: INFINITE FOLDER CREATION DETECTED! Folder 'abc®D' (repository 'Local-gmail') would be created as folder 'abc®D' (repository 'Remote-gmail'). The latter becomes 'abc&AK4-D' in return, leading to infinite folder creation cycles.
SOLUTION: 1) Do set your nametrans rules on both repositories so they lead to identical names if applied back and forth. 2) Use folderfilter settings on a repository to prevent some folders from being created on the other side.
WARNING:OfflineImap:ERROR: INFINITE FOLDER CREATION DETECTED! Folder 'abc®D' (repository 'Local-gmail') would be created as folder 'abc®D' (repository 'Remote-gmail'). The latter becomes 'abc&AK4-D' in return, leading to infinite folder creation cycles.
SOLUTION: 1) Do set your nametrans rules on both repositories so they lead to identical names if applied back and forth. 2) Use folderfilter settings on a repository to prevent some folders from being created on the other side.
Probably I need create two different functions, one for every foldersettings (Remote and Local). I think this is the way, I will continue later. You can work on this direction if you want.
Regards, kix
Since I never intend to create new folders on my local repository to be synced to a remote repository, I don't see that I will ever trigger the problem you had with your first conversion script above. Is that correct?
That first conversion script works perfectly for the example from @sudipm-mukherjee in that it searches for '&' followed by three alphanumeric characters and a dash. Is that the correct universal algorithm? I thought all ampersands had to be replaced by plus characters.
Thanks for checking @thekix. As @dnebauer mentioned, I too think creating folder locally and then syncing with remote is a very rare usecase and most of the average user will not face that problem. If you want I can keep this issue open for this usecase or close this one as I think the main problem that @dnebauer had is solved now.
Hi,
@dnebauer , about the question about conversion script, you have more info here: https://en.wikipedia.org/wiki/UTF-7 and here http://string-functions.com/encodingtable.aspx?encoding=65000&decoding=20127 As you can see, is +
three characters-
.
we need change the ampersand (&
) to plus to make this codification in the previous examples.
@sudipm-mukherjee IMO, there is no problem with offlineimap, and we provided an script and other information about this problem. IMO, is better close the bug if the problem is solved.
Finally, I can confirm that my dovecot test server returns an @
character instead of &AK4-
and offlineimap works fine.
So, if all of us agree that the problem is solved, we can close the bug.
With kix' new script I am now happily running offlineimap3. You are correct that this problem was with the pythonscript and not core offlineimap, but I suspect this solution will be useful to many users. Thank you both for helping me through the upgrade.
With kix' new script I am now happily running offlineimap3. You are correct that this problem was with the pythonscript and not core offlineimap, but I suspect this solution will be useful to many users. Thank you both for helping me through the upgrade.
I agree with @dnebauer that the script @thekix gave might be useful for other users also.
@thekix do you think it might be useful to add the modified pythonfile
in the contrib folder and use that commit to close this issue? If not I can directly close this issue.
Hi @sudipm-mukherjee
Yes, of course. I can add the file to the contrib folder. What script do you think is better (please, paste it)? About the filename, nametrans_imap_to_utf8_example.py could be fine? I will add an small header to the file and I will remove the import subprocess # required for reading passwords
. Comments are welcome.
Regards, kix
@dnebauer is an user who is using nametrans
, so I think I will leave it on @dnebauer to say which script is better.
I appreciate @sudipm-mukherjee leaving the decision to me, but it should be pretty obvious by now that I'm hardly an expert on offlineimap, python, or utf encodings. For what it's worth, though, I went with a minor variation on the first script by @thekix because I was only interested in a utf7-to-utf8 conversion, given that I never want to create local folders to be synchronised back to a remote server, and the second script can potentially return utf7.
For what it's worth, here is the script I'm now using:
import subprocess # noqa, required for reading passwords
import re
def convert_utf7_to_utf8(str_imap):
"""
This function converts an IMAP_UTF-7 string object to UTF-8.
It first replaces the ampersand (&) character with plus character (+)
in the cases of UTF-7 character and then decode the string to utf-8.
If the str_imap string is already UTF-8, return it.
For example, "abc&AK4-D" is translated to "abc+AK4-D"
and then, to "abc@D"
Example code:
my_string = "abc&AK4-D"
print(kix_utf7_to_utf8(my_string))
Args:
bytes_imap: IMAP UTF7 string
Returns: UTF-8 string
Source: https://github.com/OfflineIMAP/offlineimap3/issues/23
"""
try:
str_utf7 = re.sub(r'&(\w{3}\-)', '+\\1', str_imap)
str_utf8 = str_utf7.encode('utf-8').decode('utf_7')
return str_utf8
except UnicodeDecodeError:
# error decoding because already utf-8, so return original string
return str_imap
You will see I added the raw string prefix r
before the re.sub pattern. This prevents linters complaining that \w
and \-
are invalid escape sequences.
Thanks again to both of you for all your help.
Thanks @sudipm-mukherjee @dnebauer !
Uploaded! kix
Thanks @thekix
I found the above unsatisfying, and wasn't sure what exactly to put where.
Using python-imap-tools
I adapted what I had for python2 in .offlineimap.py
, which seems to work for offlineimap3 and will hopefully work both ways (untested).
#!/usr/bin/python
from imap_tools.imap_utf7 import encode, decode
def from_imap_nametrans(foldername):
assert isinstance(foldername, str), str
foldername = foldername.encode('utf-8')
foldername = decode(foldername)
return foldername
def to_imap_nametrans(foldername):
assert isinstance(foldername, str), foldername
foldername = encode(foldername)
foldername = foldername.decode('utf-8')
return foldername
Then use nametrans = to_imap_nametrans
for local repos and nametrans = from_imap_nametrans
for remote repos in .offlineimaprc
ps. I find it extremely irritating that I have to convert Umlaute in foldernames with custom scripting. A lot of German users will be affected by this. As a user, I would heavily bin this into "bug".
In a recent imap_tools update, encode/decode were renamed:
from imap_tools.imap_utf7 import utf7_encode, utf7_decode
def from_imap_nametrans(foldername):
assert isinstance(foldername, str), str
foldername = foldername.encode('utf-8')
foldername = utf7_decode(foldername)
return foldername
def to_imap_nametrans(foldername):
assert isinstance(foldername, str), foldername
foldername = utf7_encode(foldername)
foldername = foldername.decode('utf-8')
return foldername
Just checking, you changed 1 of the 2 calls of encode
to utf7_encode
but left the other call unchanged:
foldername = foldername.encode('utf-8')
and you changed 1 of the 2 calls of decode
to utf7_decode
but left the other call unchanged:
foldername = foldername.decode('utf-8')
Should these not also be changed to utf7_encode
and utf7_decode
, respectively?
Using
nametrans
leads to a failure with offlineimap3 but works with offlineimap.Steps to reporduce:
1) extract the attached file and use it as
pythonfile
in [general] of .offlineimaprc. 2) Addnametrans
to your .offlineimaprc.nametrans = lambda foldername: foldername.decode('imap4-utf-7').encode('utf-8')
offlineimap.zip