OfflineIMAP / offlineimap3

Read/sync your IMAP mailboxes (python3)
Other
455 stars 64 forks source link

offlineimap3: fails with "'str' object has no attribute 'decode'" #23

Open sudipm-mukherjee opened 4 years ago

sudipm-mukherjee commented 4 years ago

Using nametrans leads to a failure with offlineimap3 but works with offlineimap.

   ERROR: While attempting to sync account 'testmail'
      'str' object has no attribute 'decode'

    Traceback:
      File "/usr/lib/python3/dist-packages/offlineimap/accounts.py", line 298, in syncrunner
        self.__sync()
      File "/usr/lib/python3/dist-packages/offlineimap/accounts.py", line 374, in __sync
        remoterepos.getfolders()
      File "/usr/lib/python3/dist-packages/offlineimap/repository/IMAP.py", line 679, in getfolders
        retval.append(self.getfoldertype()(self.imapserver, name,
      File "/usr/lib/python3/dist-packages/offlineimap/folder/IMAP.py", line 53, in __init__
        super(IMAPFolder, self).__init__(name, repository)
      File "/usr/lib/python3/dist-packages/offlineimap/folder/Base.py", line 57, in __init__
        self.visiblename = repository.nametrans(name)
      File "<string>", line 1, in <lambda>

Steps to reporduce:

1) extract the attached file and use it as pythonfile in [general] of .offlineimaprc. 2) Add nametrans to your .offlineimaprc. nametrans = lambda foldername: foldername.decode('imap4-utf-7').encode('utf-8')

offlineimap.zip

thekix commented 4 years ago

HI @sudipm-mukherjee

after checking offlineimap and this issue, IMO the problem is not related to offlineimap. I will try to explain the problem.

Using Python 2

offlineimap is using the mailfolders as byte objects. Then, the object is sent to the lambda function used in nametrans:

nametrans = lambda foldername: foldername.decode('imap4-utf-7').encode('utf-8')

This call is more or less:

def nametrans(byte_object_in_imap4-utf-7):
    string_in_imap4-utf-7 = byte_object_in_imap4-utf-7.decode('imap4-utf-7')
    byte_object_in_utf8 = string_in_imap4-utf-7.encode('utf-8')
    return byte_object_in_utf8

So the function receives a byte object and returns a byte object. It uses an string to change one format to other.

Using Python 3

offlineimap is using the mailfolders as (utf-8) string objects. Then, the object is sent to the lambda function used in nametrans. This function is waiting for a bytes object and then it fails in the byte_object_in_imap4-utf-7.decode('imap4-utf-7') and we get the error:

   ERROR: While attempting to sync account 'testmail'
      'str' object has no attribute 'decode'

Then?

IMO, the problem is not offlineimap, and then, we should not change offlineimap for it. In Python 3, all strings objects are already UTF-8, so we don't need convert them. If we need change the string format, we need change it in a different way in the attached offlineimap.py script.

The new script should convert an (utf-8) "string imap_utf7" to (utf-8) "string utf-8".

offlineimap is doing this conversion in some places. For example, check the function "decode_mailbox_name" and other functions like "IMAP_utf8" and "utf8_IMAP" (in the same file, next functions):

https://github.com/OfflineIMAP/offlineimap3/blob/master/offlineimap/imaputil.py#L321

I saw these functions when I migrated offlineimap to Python 3. Probably these functions are not perfect. I check them and check the functions used in IMAPClient (https://github.com/mjs/imapclient), and probably the functions in imapclient could be better. But this is other history.

Probably you can check this code: https://github.com/MarechJ/py3_imap_utf7 to use as lambda in the offlineimap.py file. Seems to be better than imapclient. I didn't test it, but probably you can use it as (including it as pythonfile; do not forget include the import subprocess if you are using the password in remotepasseval):

nametrans = lambda foldername: decode(encode(foldername))

Please, try this new option. If the new options is working and you agree with my explanation, feel free to close this bug.

Best Regards, Rodolfo (kix)

dnebauer commented 4 years ago

I'm the downstream bug reporter that @sudipm-mukherjee has been helping. Using your link I created the following pythonfile:

import binascii
# import codecs
# from base64 import b64encode, b64decode

# encoding

def modified_base64(s):
    s = s.encode('utf-16be')
    return binascii.b2a_base64(s).rstrip(b'\n=').replace(b'/', b',')
    # return b64encode(s, b"+,")#.rstrip('\n=')

def doB64(_in, r):
    if _in:
        r.append(b'&' + modified_base64(''.join(_in)) + b'-')
    del _in[:]

def encode(s: str):
    r = []
    _in = []
    for c in s:
        ordC = ord(c)
        if 0x20 <= ordC <= 0x25 or 0x27 <= ordC <= 0x7e:
            doB64(_in, r)
            r.append(c.encode())
        elif c == '&':
            doB64(_in, r)
            r.append(b'&-')
        else:
            _in.append(c)
    doB64(_in, r)
    return b''.join(r)

# decoding

def modified_unbase64(s):
    # b = b64decode(s, b"+,")
    b = binascii.a2b_base64(s.replace(b',', b'/') + b'===')
    return b.decode('utf-16be')

def decode(s: bytes):
    r = []
    decode = bytearray()
    for c in s:
        if c == ord('&') and not decode:
            decode.append(ord('&'))
        elif c == ord('-') and decode:
            if len(decode) == 1:
                r.append('&')
            else:
                ab = modified_unbase64(decode[1:])
                r.append(ab)
            decode = bytearray()
        elif decode:
            decode.append(c)
        else:
            r.append(chr(c))

    if decode:
        r.append(modified_unbase64(decode[1:]))

    bin_str = ''.join(r)
    return bin_str

# required for reading passwords

import subprocess

and used the nametrans lambda you suggested above:

nametrans = lambda foldername: decode(encode(foldername))

Clearly, however, my python-fu is not strong enough. This configuration resulted in the same error @sudipm-mukherjee reported at the top of this issue. Playing with the function names in the pythonfile shows that the "attribute 'decode'" in the error message refers to the function "decode" in the pythonfile -- if you change the name of the "decode" function in the pythonfile and nametrans call, the changed name appears in the error message.

thekix commented 4 years ago

Hi @dnebauer!

Thanks for your comment. Could you send me the foldername with problems?

Regards, kix

dnebauer commented 4 years ago

I'm not sure I understand your question. The error I reported occurs with every account I have defined, if I use the pythonfile and nametrans lambda shown above. The defined accounts are 'austdomains' and 'bigpond'. The 'austdomains' account has the mailboxes 'INBOX', 'INBOX.Archive', 'INBOX.Drafts', 'INBOX.Junk', 'INBOX.Sent', 'INBOX.spam', and 'INBOX.Trash'. The 'bigpond' account has the mailboxes 'Deleted', 'Drafts', 'INBOX', 'Junk', 'Sent', and 'Trash'. Let me know if you were after something else.

sudipm-mukherjee commented 4 years ago

I am unable to reproduce this error with the file that @dnebauer sent me. This is how I am testing: 1) Delete all the mail folders from laptop. 2) create new label "abc®D" in my test gmail account. 3) use the offlineimap.py that @dnebauer sent me 4) Use nametrans = lambda foldername: decode(encode(foldername)) in offlineimaprc 5) sync using offlineimap -o -u quiet I dont get any error, and after syncing I do see a local folder named abc&AK4-D.

Without the nametrans and all the same steps as above I get a folder with name 'abc&AK4-D'. Only difference between these two are the '

thekix commented 4 years ago

Hi,

the problem is the ampersand character. In utf-7 the character should be a plus (+), so the folder name should be abc+AK4-D instead of abc&AK4-D. Then, the folder name is converted to abc@D

I did a modified version of the conversion script:

import subprocess  # required for reading passwords
import re

def kix_utf7_to_utf8(str_imap):
    """
    This function converts an IMAP_UTF-7 string object to UTF-8.
    It first replaces the ampersand (&) character with plus character (+)
    in the cases of UTF-7 character and then decode the string to utf-8.

    If the str_imap string is already UTF-8, return it.

    For example, "abc&AK4-D" is translated to "abc+AK4-D"
    and then, to "abc@D"

    Example code:
    my_string = "abc&AK4-D"
    print(kix_utf7_to_utf8(my_string))

    Args:
        bytes_imap: IMAP UTF7 string

    Returns: UTF-8 string

    """
    try:
        str_utf7 = re.sub('&(\w{3}\-)', '+\\1', str_imap)
        str_utf8 = str_utf7.encode('utf-8').decode('utf_7')
        return str_utf8
    except UnicodeDecodeError:
        # Error decoding, already utf-8. return original string.
        return str_imap

The nametrans should be:

nametrans = lambda foldername: kix_utf7_to_utf8(foldername)

It works fine downloading the folder, but I have a problem when offlineimap reads the local folder and try to create it on the remote. It fails. Then, I created this modified version:

import subprocess  # required for reading passwords
import re

def kix_utf7_to_utf8(str_imap):
    """
    This function converts an IMAP_UTF-7 string object to UTF-8.
    It first replaces the ampersand (&) character with plus character (+)
    in the cases of UTF-7 character and then decode the string to utf-8.

    If the str_imap string is already UTF-8, return it.

    For example, "abc&AK4-D" is translated to "abc+AK4-D"
    and then, to "abc@D"

    Example code:
    my_string = bytes("abc&AK4-D", 'utf-8')
    print(kix_utf7_to_utf8(my_string))

    Args:
        bytes_imap: IMAP UTF7 bytes

    Returns: UTF-8 string

    """
    try:
        str_utf7 = re.sub('&(\w{3}\-)', '+\\1', str_imap)
        str_utf8 = str_utf7.encode('utf-8').decode('utf_7')
        return str_utf8
    except UnicodeDecodeError:
        # Error decoding, already utf-8. convert to utf-7
        str_utf8 = str_imap.encode('utf_7').decode('utf-8')
        str_utf7 = re.sub('\+(\w{3}\-)', '&\\1', str_utf8)
        return str_utf7

It works fine the first run, but the second run fails again. The error message is:

 ERROR: INFINITE FOLDER CREATION DETECTED! Folder 'abc®D' (repository 'Local-gmail') would be created as folder 'abc®D' (repository 'Remote-gmail'). The latter becomes 'abc&AK4-D' in return, leading to infinite folder creation cycles.
 SOLUTION: 1) Do set your nametrans rules on both repositories so they lead to identical names if applied back and forth. 2) Use folderfilter settings on a repository to prevent some folders from being created on the other side.
WARNING:OfflineImap:ERROR: INFINITE FOLDER CREATION DETECTED! Folder 'abc®D' (repository 'Local-gmail') would be created as folder 'abc®D' (repository 'Remote-gmail'). The latter becomes 'abc&AK4-D' in return, leading to infinite folder creation cycles.
 SOLUTION: 1) Do set your nametrans rules on both repositories so they lead to identical names if applied back and forth. 2) Use folderfilter settings on a repository to prevent some folders from being created on the other side.

Probably I need create two different functions, one for every foldersettings (Remote and Local). I think this is the way, I will continue later. You can work on this direction if you want.

Regards, kix

dnebauer commented 4 years ago

Since I never intend to create new folders on my local repository to be synced to a remote repository, I don't see that I will ever trigger the problem you had with your first conversion script above. Is that correct?

That first conversion script works perfectly for the example from @sudipm-mukherjee in that it searches for '&' followed by three alphanumeric characters and a dash. Is that the correct universal algorithm? I thought all ampersands had to be replaced by plus characters.

sudipm-mukherjee commented 4 years ago

Thanks for checking @thekix. As @dnebauer mentioned, I too think creating folder locally and then syncing with remote is a very rare usecase and most of the average user will not face that problem. If you want I can keep this issue open for this usecase or close this one as I think the main problem that @dnebauer had is solved now.

thekix commented 4 years ago

Hi,

@dnebauer , about the question about conversion script, you have more info here: https://en.wikipedia.org/wiki/UTF-7 and here http://string-functions.com/encodingtable.aspx?encoding=65000&decoding=20127 As you can see, is +three characters-. we need change the ampersand (&) to plus to make this codification in the previous examples.

@sudipm-mukherjee IMO, there is no problem with offlineimap, and we provided an script and other information about this problem. IMO, is better close the bug if the problem is solved.

Finally, I can confirm that my dovecot test server returns an @ character instead of &AK4- and offlineimap works fine.

So, if all of us agree that the problem is solved, we can close the bug.

dnebauer commented 4 years ago

With kix' new script I am now happily running offlineimap3. You are correct that this problem was with the pythonscript and not core offlineimap, but I suspect this solution will be useful to many users. Thank you both for helping me through the upgrade.

sudipm-mukherjee commented 4 years ago

With kix' new script I am now happily running offlineimap3. You are correct that this problem was with the pythonscript and not core offlineimap, but I suspect this solution will be useful to many users. Thank you both for helping me through the upgrade.

I agree with @dnebauer that the script @thekix gave might be useful for other users also. @thekix do you think it might be useful to add the modified pythonfile in the contrib folder and use that commit to close this issue? If not I can directly close this issue.

thekix commented 4 years ago

Hi @sudipm-mukherjee

Yes, of course. I can add the file to the contrib folder. What script do you think is better (please, paste it)? About the filename, nametrans_imap_to_utf8_example.py could be fine? I will add an small header to the file and I will remove the import subprocess # required for reading passwords. Comments are welcome.

Regards, kix

sudipm-mukherjee commented 4 years ago

@dnebauer is an user who is using nametrans, so I think I will leave it on @dnebauer to say which script is better.

dnebauer commented 4 years ago

I appreciate @sudipm-mukherjee leaving the decision to me, but it should be pretty obvious by now that I'm hardly an expert on offlineimap, python, or utf encodings. For what it's worth, though, I went with a minor variation on the first script by @thekix because I was only interested in a utf7-to-utf8 conversion, given that I never want to create local folders to be synchronised back to a remote server, and the second script can potentially return utf7.

For what it's worth, here is the script I'm now using:

import subprocess  # noqa, required for reading passwords
import re

def convert_utf7_to_utf8(str_imap):
    """
    This function converts an IMAP_UTF-7 string object to UTF-8.
    It first replaces the ampersand (&) character with plus character (+)
    in the cases of UTF-7 character and then decode the string to utf-8.

    If the str_imap string is already UTF-8, return it.

    For example, "abc&AK4-D" is translated to "abc+AK4-D"
    and then, to "abc@D"

    Example code:
    my_string = "abc&AK4-D"
    print(kix_utf7_to_utf8(my_string))

    Args:
        bytes_imap: IMAP UTF7 string

    Returns: UTF-8 string

    Source: https://github.com/OfflineIMAP/offlineimap3/issues/23

    """
    try:
        str_utf7 = re.sub(r'&(\w{3}\-)', '+\\1', str_imap)
        str_utf8 = str_utf7.encode('utf-8').decode('utf_7')
        return str_utf8
    except UnicodeDecodeError:
        # error decoding because already utf-8, so return original string
        return str_imap

You will see I added the raw string prefix r before the re.sub pattern. This prevents linters complaining that \w and \- are invalid escape sequences.

Thanks again to both of you for all your help.

thekix commented 4 years ago

Thanks @sudipm-mukherjee @dnebauer !

Uploaded! kix

sudipm-mukherjee commented 4 years ago

Thanks @thekix

hrehfeld commented 3 years ago

I found the above unsatisfying, and wasn't sure what exactly to put where.

Using python-imap-tools I adapted what I had for python2 in .offlineimap.py, which seems to work for offlineimap3 and will hopefully work both ways (untested).

#!/usr/bin/python
from imap_tools.imap_utf7 import encode, decode

def from_imap_nametrans(foldername):
    assert isinstance(foldername, str), str
    foldername = foldername.encode('utf-8')
    foldername = decode(foldername)
    return foldername

def to_imap_nametrans(foldername):
    assert isinstance(foldername, str), foldername
    foldername = encode(foldername)
    foldername = foldername.decode('utf-8')
    return foldername

Then use nametrans = to_imap_nametrans for local repos and nametrans = from_imap_nametrans for remote repos in .offlineimaprc

ps. I find it extremely irritating that I have to convert Umlaute in foldernames with custom scripting. A lot of German users will be affected by this. As a user, I would heavily bin this into "bug".

hrehfeld commented 3 months ago

In a recent imap_tools update, encode/decode were renamed:

from imap_tools.imap_utf7 import utf7_encode, utf7_decode

def from_imap_nametrans(foldername):
    assert isinstance(foldername, str), str
    foldername = foldername.encode('utf-8')
    foldername = utf7_decode(foldername)
    return foldername

def to_imap_nametrans(foldername):
    assert isinstance(foldername, str), foldername
    foldername = utf7_encode(foldername)
    foldername = foldername.decode('utf-8')
    return foldername
dnebauer commented 3 months ago

Just checking, you changed 1 of the 2 calls of encode to utf7_encode but left the other call unchanged:

    foldername = foldername.encode('utf-8')

and you changed 1 of the 2 calls of decode to utf7_decode but left the other call unchanged:

    foldername = foldername.decode('utf-8')

Should these not also be changed to utf7_encode and utf7_decode, respectively?