OfflineIMAP / offlineimap

Read/sync your IMAP mailboxes (python2) [LEGACY: move to offlineimap3]
http://www.offlineimap.org

Maildir storage of folders violates the standard #589

Open hydrargyrum opened 5 years ago

hydrargyrum commented 5 years ago

General information

Steps to reproduce the error

When syncing mails from an IMAP server (with folders and subfolders) to a local Maildir folder, I get local physical folders named (for example): "INBOX", "Archives", "Trash".

The Maildir standard states otherwise, the physical folders should respectively be: the root of the physical Maildir (for INBOX), ".Archives" and ".Trash".

From courier MTA documentation:

Folders are additional subdirectories in the maildir whose names begin with a period: such as .Drafts or .Sent.

From the Dovecot documentation:

  • ~/Maildir/new, ~/Maildir/cur and ~/Maildir/tmp directories contain the messages for INBOX. The tmp directory is used during delivery, new messages arrive in new and read shall be moved to cur by the clients.
  • ~/Maildir/.folder/ is a mailbox folder

From Python's standard library's mailbox module:

Any subdirectory of the main mailbox is considered a folder if '.' is the first character in its name. Folder names are represented by Maildir without the leading '.'.

nicolas33 commented 5 years ago

What are the names of these folders as shown by IMAP from the remote server? offlineimap does not rename the folders when syncing.

However, there's the nametrans configuration option to rename the folder names locally. It should not be hard to make a mapping for these folders.

hydrargyrum commented 5 years ago

> What are the names of these folders as shown by IMAP from the remote server?

I don't know, can I enable some debug logs to see it?

> offlineimap does not rename the folders when syncing.

I think this is the problem. You need to distinguish the logical name from the physical name. IMAP returns you the logical name, and when you store on disk, you must use a physical name, a kind of encoding.

From the various links I sent, when the logical folder name is "foo", the physical (on disk) folder name should be ".foo". And the apps/libraries will interpret the physical name ".foo" as the logical "foo" name.

That's the behavior the Python lib implements: when calling Maildir.list_folders() it only checks folder names starting with a dot (physical), and strips the leading dot when returning the folder name (logical).

You're using the logical name as physical folder name, but then you don't respect the standard specified by the various links.

I'm writing an app using this code:

import mailbox
box = mailbox.Maildir('/path/to/maildir')
print(box.list_folders())

and when I use offlineimap, I get no folders, because offlineimap did not use the naming expected by Python.
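
The behaviour described above can be reproduced without a mail server. The following is a minimal sketch, assuming a throwaway directory: it creates one folder through the standard library (which adds the leading dot) and one plain subdirectory without a dot, imitating the layout reported in this issue. The helper name `build_demo_maildir` and the folder names are illustrative only.

```python
import mailbox
import os
import tempfile


def build_demo_maildir(root):
    """Create a Maildir holding one dot-prefixed folder and one plain subdirectory."""
    # root does not exist yet, so Maildir creates root/new, root/cur and root/tmp.
    box = mailbox.Maildir(root, create=True)
    # add_folder() stores the folder on disk as root/.Archives
    box.add_folder("Archives")
    # Imitate the layout reported in this issue: a folder without a leading dot.
    for sub in ("new", "cur", "tmp"):
        os.makedirs(os.path.join(root, "Trash", sub))
    return box


with tempfile.TemporaryDirectory() as tmp:
    root = os.path.join(tmp, "Maildir")
    box = build_demo_maildir(root)
    folders = box.list_folders()
    on_disk = sorted(os.listdir(root))
    print(folders)   # folders as seen by the mailbox module
    print(on_disk)   # directories actually present on disk
```

Only the dot-prefixed directory is reported by `list_folders()`; the `Trash` directory exists on disk but is not listed.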

I mean, offlineimap says it uses Maildir but it doesn't: it's a custom format resembling Maildir, but it isn't Maildir. Either the bug should be fixed (using an option if you want to ensure compatibility with the old, bugged behavior) or Maildir shouldn't be mentioned anymore in the documentation because it's misleading.

nicolas33 commented 5 years ago

> You need to distinguish the logical name from the physical name.

The decision of the "logical name" is left to the user.

hydrargyrum commented 5 years ago

Well, this shouldn't be the case, the maildir standard dictates that the physical name of a folder MUST start with a ".", and all 3 docs I linked do agree on this point. This is maildir, offlineimap doesn't follow this, so offlineimap doesn't do maildir.

If you want to keep compatibility, you can make it optional, but offlineimap has to implement in some way the behavior with a starting dot if you want to be compliant with the standard.

Also, the "INBOX" shouldn't be like a subfolder, it should be the main folder. It should look like this (Maildir standard):

new/        <- for INBOX
cur/        <- for INBOX    
tmp/        <- for INBOX
.foo/new    <- for "foo"
.foo/cur    <- for "foo"
.foo/tmp    <- for "foo"

instead of this (current offlineimap behavior):

INBOX/new/   <- for INBOX
INBOX/cur/   <- for INBOX
INBOX/tmp/   <- for INBOX
foo/new      <- for "foo"
foo/cur      <- for "foo"
foo/tmp      <- for "foo"

If you don't care about being compliant with the maildir standard, then the docs must remove any mention of maildir.
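
The mapping between the two layouts above can be written down as a small function. This is only a sketch of the dot-prefixed convention being argued for here, not offlineimap code: the name `maildirpp_dir`, the example root path and the `sep` argument are assumptions for illustration. It also assumes nested folders are flattened with dots (for example ".foo.bar"), which is my reading of the Courier description rather than something stated in this thread.

```python
import posixpath


def maildirpp_dir(root, logical_name, sep="/"):
    """Return the on-disk directory for an IMAP folder in a dot-prefixed layout.

    root         -- path of the top-level maildir (holds new/, cur/, tmp/ for INBOX)
    logical_name -- folder name as reported by the IMAP server
    sep          -- hierarchy separator used by the server (assumed "/" here)
    """
    if logical_name.upper() == "INBOX":
        # INBOX is the maildir root itself, not a subdirectory.
        return root
    # Assumption: nested folders are flattened and joined with dots.
    parts = logical_name.split(sep)
    return posixpath.join(root, "." + ".".join(parts))


for name in ("INBOX", "foo", "foo/bar"):
    print(name, "->", maildirpp_dir("/home/user/Maildir", name))
```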

nicolas33 commented 5 years ago

You're referring to maildir++ as defined by Courier. The maildir format was first introduced by qmail, and that is what offlineimap is compliant with.

hydrargyrum commented 5 years ago

The original specification from qmail doesn't support any subfolders at all (since it's just an MTA) so it's wrong to say offlineimap is compliant with it regarding folders.

If you want to use folders, qmail is NOT the way to go, because it doesn't specify anything. On the other hand, Courier and Dovecot are IMAP servers, so they do support folders, and their physical storage is the so-called "maildir++".

Also, I invite you to test the Python code snippet I posted with offlineimap's output files and folders. Even Python's own implementation disagrees with offlineimap.

(On another note, even for the main folder (INBOX), you don't seem to follow qmail's spec, because the 3 folders new/cur/tmp are not at the root of offlineimap's maildir but inside an INBOX folder, which is wrong)

nicolas33 commented 5 years ago

These are all configuration options. We don't provide a default configuration file, only the "minimal" one. I'm fine with adding a new default configuration file if someone cares enough to write it, though.

hydrargyrum commented 5 years ago

How could this be achieved properly? Is nametrans enough to take care of everything or should deeper modifications be done?

nicolas33 commented 5 years ago

You might need to adjust "sep", too.

BTW, I tend to think by reading our nametrans documentation that most users don't want maildir++.

hydrargyrum commented 5 years ago

> You might need to adjust "sep", too.

Ok, I will test this.

> BTW, I tend to think by reading our nametrans documentation that most users don't want maildir++.

Why? If maildir++ isn't used, other apps won't be able to read the mails (for example, those using Python's standard library)

nicolas33 commented 5 years ago

> Why? If maildir++ isn't used, other apps won't be able to read the mails (for example, those using Python's standard library)

I don't know why. I notice that this issue is the first about this topic, and our doc about nametrans in offlineimap.conf has provided samples to remove the dots for years.

There's no distinction between "physical" and "logical" names in IMAP, so I think that most software in the wild does not require the leading dot. Also, I'd say most users use email readers and IMAP servers on top of their maildirs. I don't think anyone requires the leading dot.

"maildir" is not a real standard. It's more a consensus to follow some patterns as applied by some projects.

pierrebeaucamp commented 5 years ago

Fyi, I ran into the same problem and could resolve it with the following config:

[Repository Local]
nametrans = lambda folder: 'INBOX' if folder == '' else re.sub(r'^\.', '', folder)

[Repository Remote]
nametrans = lambda folder: '' if folder == 'INBOX' else '.' + folder
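
The two rules above can be sanity-checked outside offlineimap. The sketch below copies the two lambdas from the config (with a raw string for the regular expression) and checks that they undo each other for a few sample names. The variable names `local_to_remote` and `remote_to_local` and the sample folder names are illustrative, and the check assumes the two rules are meant to be exact inverses.

```python
import re

# Rule from [Repository Local]: local (on-disk) name -> remote (IMAP) name.
local_to_remote = lambda folder: 'INBOX' if folder == '' else re.sub(r'^\.', '', folder)

# Rule from [Repository Remote]: remote (IMAP) name -> local (on-disk) name.
remote_to_local = lambda folder: '' if folder == 'INBOX' else '.' + folder

# Sample remote folder names, chosen only for illustration.
for remote in ['INBOX', 'Archives', 'Trash']:
    local = remote_to_local(remote)
    # Translating back must give the original remote name.
    assert local_to_remote(local) == remote
    print(repr(remote), '->', repr(local))
```

If a name fails the round trip, the two rules disagree about that folder and are worth revisiting before a real sync.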

lsh-0 commented 5 years ago

hi guys, I'm being affected by this issue right now, so thanks @hydrargyrum for bringing it up.

in my case I have a backup provided by my previous host that I'm trying to sync new mail to. It appears to be in the standard maildir++ format with root cur, new and tmp directories for INBOX and subfolders prefixed with a dot .subfolder/cur etc.

offlineimap, syncing from the server the backup was made on, is ignoring the structure and creating INBOX and subfolder directories and attempting to do an unnecessary download of hundreds of thousands of emails.

This seems like a strange state of affairs and @hydrargyrum is making a good case, but I just want to sync the few dozen emails that have been sent and received since the backup was made. Is the best solution to do this with nametrans?

update: after fiddling with nametrans and getting it right (thanks @pierrebeaucamp ) it's not recognising the contents of the backup and downloading it all again.

wadih commented 2 years ago

Evolution doesn't see the subfolders if the leading dot isn't there.

techdragon commented 1 year ago

Just adding another voice to the pile. I decided to set up offlineimap, and mucked around with macOS SSL certs, Homebrew, and outdated Homebrew packages to do so, because of all the tools I had found to do this sort of mail backup/sync, it seemed obvious that this one would play nicely with Python's maildir library out of the box... why wouldn't it!

So the fact it doesn't, and the fact the configuration to make it so it does isn't mentioned in the docs, is extremely counterintuitive. I'm glad I was able to rename the already downloaded folders after getting the config working, but that was yet another undocumented leap...
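
For the renaming step mentioned above, a dry run is a cautious starting point. The sketch below only lists which top-level folders would get a leading dot and changes nothing on disk. The function name `plan_renames` is hypothetical. It assumes every folder is a direct child of the maildir root, and it deliberately skips INBOX, whose new/cur/tmp directories would have to move to the root instead. Whether offlineimap's own sync state needs adjusting afterwards is not covered here.

```python
import os
import tempfile


def plan_renames(root):
    """Return (src, dst) pairs that would add a leading dot to each folder.

    Dry run only: nothing on disk is changed.
    """
    plan = []
    for entry in sorted(os.listdir(root)):
        path = os.path.join(root, entry)
        # Treat a directory as a mail folder only if it holds new/, cur/ and tmp/.
        is_maildir = all(
            os.path.isdir(os.path.join(path, sub)) for sub in ("new", "cur", "tmp")
        )
        # Skip non-folders, folders that already have the dot, and INBOX.
        if not is_maildir or entry.startswith(".") or entry == "INBOX":
            continue
        plan.append((path, os.path.join(root, "." + entry)))
    return plan


# Demo on a throwaway directory with three sample folders.
with tempfile.TemporaryDirectory() as root:
    for folder in ("INBOX", "Trash", ".Archives"):
        for sub in ("new", "cur", "tmp"):
            os.makedirs(os.path.join(root, folder, sub))
    plan = [(os.path.basename(s), os.path.basename(d)) for s, d in plan_renames(root)]
    print(plan)
```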

I really think this situation should be fixed and the python maildir module behaviour should either be the default, or this needs to be called out extremely clearly in the documentation, and the example config type section, where you have the IMAP/maildir choice.

chris001 commented 1 year ago

I'd vote in favor of using a maildir library module to make it easy to adhere to the standard maildir format, if any, so that other apps are able to read the maildir. For example, you might want to run offlineimap directly on a new mail server to download a mail account from an old mail server. After the download is complete, the new mail server needs the maildir to be in the format it expects. When the maildir format isn't exactly correct, the new server can't see some of the data, so the user experiences data loss. This is avoidable if offlineimap writes and structures the maildir the same way Cyrus or Dovecot does.