cyrusimap / cyrus-imapd

Cyrus IMAP is an email, contacts and calendar server
http://cyrusimap.org
Other
536 stars 147 forks source link

3.0.7: deliver to mailbox with space in name fails #3870

Open jikamens opened 2 years ago

jikamens commented 2 years ago

I have a script which does this:

/usr/lib/cyrus-imapd/deliver -a jik -m 'user.jik.Sent\ Items' -q

This worked fine until cyrus imapd 2.4.17. Under 3.0.7, however, it gives:

+user.jik.Sent\ Items: Mailbox does not exist

It doesn't work if you omit slash either:

+user.jik.Sent Items: Bad protocol

If I try this:

/usr/lib/cyrus-imapd/deliver -a jik -m 'Sent\ Items' -q jik

then it delivers to my inbox instead of my Sent Items folder.

I'm at a loss for how to deliver a message to a mailbox with a space in its name.

Related: https://github.com/cyrusimap/cyrus-imapd/issues/230

elliefm commented 2 years ago

I don't really know deliver, so I'm looking at its code (on master, not 3.0, but it doesn't seem to have substantially changed in that time).

It looks like the -m argument is just a convenience for plus-addressing, which is to say, I think that:

/path/to/deliver -m Foo someuser@example.com

is the same as:

/path/to/deliver someuser+Foo@example.com

Now, that clearly couldn't work if the mailbox name has a space in it, because then the email address would have a space in it...

Looking at our tests, there is one test that uses a mailbox name with a space in it, and it looks like it replaces the space with an underscore (_) when calling deliver.

So, what happens if you try something like:

/usr/lib/cyrus-imapd/deliver -a jik -m Sent_Items -q jik
jikamens commented 2 years ago
/usr/lib/cyrus-imapd/deliver -a jik -m Sent_Items -q jik

Goes into Inbox, not Sent Items.

elliefm commented 2 years ago

A very similar question has just come up on the info mailing list here: https://cyrus.topicbox.com/groups/info/T35d002bd4cf95e5a-Ma423db5c47905548b46a745d

I was wondering if this might be you, but their examples are different from yours.

It's interesting to note in their examples, they are passing their -m argument in the form "INBOX/Foo" -- which works for mailboxes without a space, but not with. But they do not try the underscore substitution.

So, I wonder what happens if you combine the two. What about something like:

/usr/lib/cyrus-imapd/deliver -a jik -m 'INBOX.Sent_Items' -q jik

(I assume you do not have unixhierarchysep: on, based on your examples using dots rather than slashes.)

I also wonder if the value of "altnamespace" makes a difference to the correct invocation.

I'll send a similar message to the list, linking this issue.

jikamens commented 2 years ago

The mailing list isn't me.

Your suggestions don't work for me.

jikamens commented 2 years ago

Also, perhaps I am confused, but.. couldn't you be reproducing and trying out these suggested fixes yourself rather than asking me to try them out? Wouldn't that be faster/more efficient? I don't mind helping, but I don't understand why I'm being asked to reproduce something that I suspect is a problem for everyone, not just for me. (Or have you already tried it yourself and it works for you?)

elliefm commented 2 years ago

I have never used deliver in the real world in my life (I don't sysadmin Cyrus; and in my sysadmin days, we used a different mail system). So we're starting at a point where you certainly know more about how deliver is usually used than I do.

To reproduce this myself, I would write a test case in Cassandane. I looked in Cassandane last week, and it turned out we already have a test that looks like it should exercise this -- that's how I learned about the underscore thing.

So at the moment the place I'm at is trying to figure out why, if the underscore thing works in the test, why does it not work in practice for you. In a sense, it appears to "work for me", and I'm needing further data from yourself (and the other reporter) to figure out what the difference is.

I wonder: do you have altnamespace enabled or not? I don't know if I expect that to make a difference, but I don't have any better insights yet. At least, knowing a bit more about your environment, I could write a new test that's a close match for your environment, and see what happens.

Oh, and here's a potentially thorny one: do you have improved_mboxlist_sort enabled or not?

Binarus commented 2 years ago

It was me who brought the issue up on the mailing list, because it begins to become a show stopper here. I apologize that I haven't found this issue on GitHub during my research; sorry for the "cross-post".

At first, I'd like to confirm the issue, although our situation, tests and results are slightly different from those of the OP:

I have described our issue in detail on the mailing list: https://cyrus.topicbox.com/groups/info/T35d002bd4cf95e5a-Ma423db5c47905548b46a745d/how-to-use-cyrdeliver-when-the-target-mailboxs-name-contains-space-characters

Would it be better to post the issue in full length here again, or is the above link sufficient?

@elliefm

I tend to support the OP's point of view regarding your own testing. I understand that not everybody from your team can know all tools and all parts of the code. However, you surely know the right people, and you or other team members really should reproduce the problem yourself. How could you solve it otherwise? By changing code and guessing that it works? Or by writing a message, waiting several hours or days until the OP has responded, then having the OP wait several hours until you find the time to respond, and so on? I can't imagine a thing which would be less efficient.

The problem has been stated with a clear description. The problem is reliably reproducible without any effort; it is not a thing which only occurs sometimes or depends on esoteric settings which only a few people use, or on the configuration of the O/S. Reproducing is really a no-brainer and will cost only several minutes. Just log in to a system where Cyrus imapd is installed (obviously, younger than 2.4.x, preferably 3.0.x), create a subfolder with space characters in its name and run the example command lines we have given.

In other words, I don't know which further data from us you could need. We have described in detail what to do to reproduce the issue, and I am more than confident that you will encounter the very same problem immediately as soon as you try, regardless of whether or not you have the exact same configuration as we have. I am nearly sure that the issue exists in all possible configurations and with all possible combinations of settings (as long as you don't turn off the IMAP server of course :-)).

If I am right, you have the needed basis for your investigations. Even if I am wrong and it works for you, that would be an important finding, because we then would know that the issue, in contrast to my current conviction, is indeed due to configuration. In this case, you could publish your configuration, and we could alter it step by step in our testing environments until the problem occurs again, which would be the point where we would have found the culprit.

In every case, I'll try to look into the test you have mentioned. Could you please give us a hint what Cassandane is and where we can find the test you mentioned in the sources?

Regarding possible solutions and debugging:

It seems that we aren't completely lost. The OP states that it worked in 2.4.17. So why not compare the code of 2.4.17 with that of 3.0.7 (the OP's version) or 3.0.8 (our version)? I have the feeling that this would be the fastest way to find the problem. Once again, probably your team are the people who can do that most efficiently. I am willing to help, but I am afraid that I don't have the right development tools at hands; in contrast, source code management and tracking changes is probably a daily business for your team.

If you need further assistance or can't reproduce the problem, please let us know ; the matter is becoming urgent here.

Finally, thank you very much for bringing us Cyrus imapd! It is still the only IMAP server software which fits all our needs.

Best regards,

Binarus

Binarus commented 2 years ago

I wonder: do you have altnamespace enabled or not? I don't know if I expect that to make a difference, but I don't have any better insights yet. At least, knowing a bit more about your environment, I could write a new test that's a close match for your environment, and see what happens.

We have altnamespace: yes in imapd.conf. Motivated by your comment, we conducted our tests again with altnamespace: no. It didn't make a difference.

Oh, and here's a potentially thorny one: do you have improved_mboxlist_sort enabled or not?

I have scanned our conf files for that setting, but haven't found it. I also haven't found it in the man pages. From this page, I got the impression that it has been introduced in version 3.3, while the OP and we are on version 3.0.x. Therefore, the best I could say is that this setting either does not exist in our versions, or that it is at its default value (whatever that may be).

Best regards,

Binarus

elliefm commented 2 years ago

The improved_mboxlist_sort option has existed since 2.3.17 (maybe even earlier, it's hard to follow development history prior to then because of the branching strategy that was used back then). The issue you found is tagged 3.3 because that was the version number of the master branch at the time it was opened.

Here the current documentation for improved_mboxlist_sort (from imapd.conf(5)):

improved_mboxlist_sort: 0

If enabled, a special comparator will be used which will correctly sort mailbox names that contain characters such as ' ' and '-'.

Note that this option SHOULD NOT be changed on a live system. The mailboxes database should be dumped (ctl_mboxlist) before the option is changed, removed, and then undumped after changing the option. When not using flat files for the subscriptions databases the same has to be done (cyr_dbtool) for each subscription database See improved_mboxlist_sort.html.

I suspect that knowing that you do not have this option configured is going to be very helpful, thanks. Our tests mostly assume this option is enabled, and having it disabled is known to cause various problems where mailbox names contain spaces.

The test that delivers to a mailbox containing a space, by replacing the space with an underscore, is this one. It uses improved_mboxlist_sort: yes: https://github.com/cyrusimap/cyrus-imapd/blob/b097a08343ede58ff7f4c14400f0d93ac6fc85f1/cassandane/Cassandane/Cyrus/Delivery.pm#L105-L129

I'm about to add some tests that use improved_mboxlist_sort: no (the default) to see if that reproduces it. I have a hunch it might, and I have another hunch about what to try next if it does not...

I don't suppose the user with the Health Reports mailbox also happens to have a mailbox called just Health or Health-somethingelse at the same level of hierarchy?

I don't suppose the user with the Sent Items mailbox also happens to have a Sent one?

elliefm commented 2 years ago

Okay, this is interesting, my new test passes too -- for cyrus master. The deliver code hasn't changed in any substantial way since 3.0, so I would expect it to behave the same, unless the problem is some underlying API thing that has been fixed already? Hmm.

Building 3.0.7 (exactly) now, and will run the same new test once it finishes building...

elliefm commented 2 years ago

Nope, still unable to reproduce it, message is successfully delivered to the 'cassandane' user's 'Health Reports' mailbox.

Here's the exact invocation of deliver as logged by the test:

[16406] =====> Instance::_fork_command[1784] Running: "/dev/shm/cyrus/main/sbin/deliver" "-C" "/dev/shm/cass/0301450101/conf/imapd.conf" "cassandane+Health_Reports"

So that's just using plus addressing directly, rather than the -m argument.

What if I get it to use a '-m' argument...

[17482] =====> Instance::_fork_command[1784] Running: "/dev/shm/cyrus/main/sbin/deliver" "-C" "/dev/shm/cass/0308030101/conf/imapd.conf" "-m" "Health Reports" "cassandane"

"cassandane+Health Reports: Bad protocol"

Okay, that's expected due to the literal space...

[17955] =====> Instance::_fork_command[1784] Running: "/dev/shm/cyrus/main/sbin/deliver" "-C" "/dev/shm/cass/0310360101/conf/imapd.conf" "-m" "Health_Reports" "cassandane"

That worked -- without "INBOX.", with underscore

[18216] =====> Instance::_fork_command[1784] Running: "/dev/shm/cyrus/main/sbin/deliver" "-C" "/dev/shm/cass/0311570101/conf/imapd.conf" "-m" "INBOX.Health_Reports" "cassandane"

But with "INBOX." in the -m parameter it didn't work. For what it's worth this new test I'm tinkering with has altnamespace: no, which I think might make a difference here.

@Binarus I went back and re-read your original report from the mailing list and noticed that you explicitly mentioned having both a "Health" and "Health Reports" mailbox, so I've been testing with the same.

elliefm commented 2 years ago

Trying again with altnamespace: yes, and the "Health" and "Health Reports" folders both explicitly subfolders of INBOX, rather than being the user's top level mailboxes.

[21352] =====> Instance::_fork_command[1784] Running: "/dev/shm/cyrus/main/sbin/deliver" "-C" "/dev/shm/cass/0329400101/conf/imapd.conf" "-m" "INBOX.Health_Reports" "cassandane"

That one worked

[21191] =====> Instance::_fork_command[1784] Running: "/dev/shm/cyrus/main/sbin/deliver" "-C" "/dev/shm/cass/0328580101/conf/imapd.conf" "cassandane+INBOX.Health_Reports"

As did that one...

[21103] =====> Instance::_fork_command[1784] Running: "/dev/shm/cyrus/main/sbin/deliver" "-C" "/dev/shm/cass/0328460101/conf/imapd.conf" "cassandane+Health_Reports"

This one didn't work, as expected, because in this test there is no top level "Health Reports" mailbox under the cassandane user, it's a subfolder of INBOX.

So after all that I have no new insight. It still seems to work okay for me, even on 3.0.7, even with improved_mboxlist_sort: no and altnamespace: yes

I'm assuming unixhierarchysep isn't relevant here, since there's a report of the problem for each case. I wonder what other configuration item the two of you have in common that the tests do not?

Oh here's a commonality -- in both your reports you're using the '-a auth-id' parameter to deliver, but my tests are not. Is that the difference?

elliefm commented 2 years ago

Nope, still working. Tried with both '-a admin' (where "admin" is one of the names listed under "admins:") and '-a cassandane` (i.e. auth as the destination account) for both variants of altnamespace, and all four combos successfully delivered to their respective "Health Reports" folder...

jikamens commented 2 years ago

Is there debug logging I can capture which might help? Maybe ltrace or strace output?

elliefm commented 2 years ago

So as I currently understand it:

Is there debug logging I can capture which might help? Maybe ltrace or strace output?

That's a good question, and I was just thinking of asking for strace output, but after some stuffing around I managed to get the test suite to run deliver under strace... and I didn't see anything in mine that made me want to see yours. So it's funny you should mention it at the same time.

I wasn't previously familiar with ltrace, but I've just obtained and tried it out, and that looks promising! If you don't mind, can you both please try running deliver with an underscore replacing the space, under ltrace? Thanks :)

Binarus commented 2 years ago

@elliefm Thank you very much for your thorough tests and your time!

First of all, a clarification / confirmation: I first had only the subfolder Health Reports in the INBOX. Then I started testing and encountered the problem. At that point in time, I did not know the cause of the problem, but had a notion that it was the space character. Therefore, I decided to create another folder in the INBOX whose name didn't contain a space character. I chose the name Health, which was pure random (and probably quite stupid, because it worries people; I really should have chosen another name).

In the meantime, I have renamed the Health subfolder to Test, and have repeated the tests I have described here and on the list. The result was that renaming the folder didn't make any difference.

Regarding the -a option: I didn't try to leave it away completely, because no mailbox on our servers has the p permission set for anyone; instead, that permission is only set for the administrator user (cyrus) and the owner of the mailbox. Using -a <mailbox_owner> (instead of -a cyrus) did not make a difference. Thank you very much for reporting your results with leaving it away at all.

Thank you very much also for your hints regarding improved_mboxlist_sort. For unknown reasons, I had not found it yet in man imapd.conf. As the next step, I will enable that option, repeat the tests and report back. To do this, I'll have to wait for the weekend, though, because I'd like to take down that machine before and take a full backup. The manual states that we must dump databases before enabling the option. Since I've never done this, I am afraid that I'll damage something when trying it - hence the backup. If you could shortly layout the procedure which must be followed to enable that option, this probably would save a lot of time :-)

Regarding the + addressing and underscore replacement:

+ addressing does not work here with when the subfolder has space characters in its name, and the underscore replacement method is eventually misunderstood. First, let's look into some results:

cyrdeliver -a cyrus "user1+INBOX/Test" < testmessage This one worked. There was no error message, and the message has been delivered to the Test folder.

cyrdeliver -a cyrus "user1+INBOX/Health Reports" < testmessage This one did not work. There was the "Bad Protocol" error, and the message has not been delivered.

cyrdeliver -a cyrus "user1+INBOX/Health_Reports" < testmessage This one did work, but not as expected. There was no error (or other) message, but the message has been delivered to INBOX (instead of INBOX/Health Reports).

I believe that the results of the tests are misunderstood, and the underscore does not have any special meaning. To confirm, I have conducted another test:

cyrdeliver -a cyrus "user1+INBOX/NonExistingFolder" < testmessage

The point here is that NonExistingFolder of course does not exist in INBOX. However, the message has still been delivered to INBOX.

I believe that replacing the space by an underscore is meaningless. You could replace it by any other character (except the space character and other "forbidden" characters), and the outcome would be the same. You could even replace the whole folder name (here: Health Reports) by something completely different (e.g. NonExistentFolder), and the outcome would still be the same: The message would be delivered to INBOX.

Looking at it from another perspective: Replacing the space character by an underscore does nothing more than turning the folder name into something which is "syntactically" correct (because it doesn't contain "forbidden" characters any more) and therefore does not cause an error message. But this new, altered name just does not exist, and therefore, the message is delivered to INBOX; this seems to be a behavior of + addressing. In other words, the underscore has absolutely no special meaning (at least here, still having the new sorting turned off).

Again in other words: The fact that space is replaced by underscore in your test is pure coincidence. The original author could have used any other character instead of the underscore (except "forbidden" characters).

Furthermore, I haven't seen any hints about underscore replacement in the IMAP RFCs. And by the way, it would immediately cause another problem: What if we have a subfolder with an underscore in its real name?

Finally, I have looked briefly into the differences between 2.4.17 (which has been reported to work as expected) and 3.0.8 (our version). Of course, during those 30 minutes, I didn't get a clue how the code actually works, but I might have found the breaking changes. I don't want to clutter this post too much, so I am showing the difference which I am considering the actual cause of the problem. In lmtpengine.c:

In 2.4.17:

struct address_data {
    char *all;      /* storage for entire RCPT TO addr -- MUST be freed */
    char *rcpt;     /* storage for user[+mbox][@domain] -- MUST be freed */
    char *user;     /* pointer to user part of rcpt -- DO NOT be free */
    char *domain;   /* pointer to domain part of rcpt -- DO NOT be free */
    char *mailbox;  /* pointer to mailbox part of rcpt -- DO NOT be free */
    int ignorequota;
    int status;
};
...

In 3.0.8:

struct address_data {
    mbname_t *mbname;
    int ignorequota;
    int status;
};

The rest of the code in lmtpengine.c and lmtpd.c consequently massively differs at every place where (parts of) the recipient, the target mailbox or subfolders are parsed, for example in functions msg_getrcpt, msg_getrcptall and process_recipient. Actual delivery seems to be effected by the call to func -> deliver in lmtpengine.c. I haven't investigated yet where that function is, but I assume it is deliver_remote, deliver, deliver_local or deliver_mailbox from lmtpd.c. The latter three also differ massively between 2.4.17 and 3.0.8, probably due to the different data structures.

Do you see a chance that your team could look into this? There might be errors in the new parsing or delivery functions.

Thanks again, and best regards,

Binarus

jikamens commented 2 years ago

I ran the ltrace with three cases: backslash space in the folder name, space in the folder name, and underscore in the folder name. None of them work, and the ltrace output doesn't look particularly useful.

As an alternative I ran strace on the lmtpd process on the server for each of these three cases as well, and captured that output as well.

See attached zip file with all the output files. output_files.zip

Binarus commented 2 years ago
  • replacing the space with an underscore might result in "Mailbox does not exist", or might result in the message being delivered to Inbox instead, still unclear what the difference is there too

In our case, the situation is as follows:

TEST SERIES 1 - WORKING

cyrdeliver -a cyrus -m "user/user1/INBOX/Test" < testmessage
cyrdeliver -a cyrus -m "INBOX/Test" user1 < testmessage
cyrdeliver -a cyrus "user1+INBOX/Test" < testmessage

All three variants are working as expected. There is no error message, and the message gets delivered into the Test subfolder.

TEST SERIES 2 - NOT WORKING

cyrdeliver -a cyrus -m "user/user1/INBOX/Health Reports" < testmessage
cyrdeliver -a cyrus -m "INBOX/Health Reports" user1 < testmessage
cyrdeliver -a cyrus "user1+INBOX/Health Reports" < testmessage

All three variants do not work. There is the "Bad Protocol" error, and the message does not get delivered anywhere.

TEST SERIES 3 - NOT WORKING OR DOING THE WRONG THING

cyrdeliver -a cyrus -m "user/user1/INBOX/Health_Reports" < testmessage This does not work. There is the "Mailbox does not exist" error, and the message does not get delivered anywhere.

cyrdeliver -a cyrus -m "INBOX/Health_Reports" user1 < testmessage This does the wrong thing. There is no error message, but the message is delivered to INBOX. The correct behavior would be to not deliver the message at all (because the target mailbox does not exist) and to raise the "Mailbox does not exist" error.

cyrdeliver -a cyrus "user1+INBOX/Health_Reports" < testmessage This is like the previous one. It does the wrong thing. There is no error message, but the message is delivered to INBOX. The correct behavior would be to not deliver the message at all (because the target mailbox does not exist) and to raise the "Mailbox does not exist" error.

TEST SERIES 4 (Interesting finding)

cyrdeliver -a cyrus -m "INBOX/non/existing/folder" user1 < testmessage
cyrdeliver -a cyrus -m "INBOX/ArbitraryRandomText" user1 < testmessage
cyrdeliver -a cyrus "user1+INBOX/non/existing/folder" < testmessage
cyrdeliver -a cyrus "user1+INBOX/ArbitraryRandomText" < testmessage

In the above examples, the respective subfolders don't exist, which is the point of the test. The outcome is the same in all four cases: There is no error message, and the message gets delivered to INBOX. In other words, as long as the user is given explicitly (either as separate command line argument or as first part in the + addressing), I can write whatever nonsense I am in the mood to after INBOX/, and the message will get delivered without an error being raised into INBOX. This is really strange.

In contrast:

cyrdeliver -a cyrus -m "user/user1/INBOX/SomeNonsense" < testmessage

leads to the "Mailbox does not exist" error, and the message does not get delivered.

A summary of our findings:

Regards, and thanks again,

Binarus

elliefm commented 2 years ago

Looking at it from another perspective: Replacing the space character by an underscore does nothing more than turning the folder name into something which is "syntactically" correct (because it doesn't contain "forbidden" characters any more) and therefore does not cause an error message. But this new, altered name just does not exist, and therefore, the message is delivered to INBOX; this seems to be a behavior of + addressing. In other words, the underscore has absolutely no special meaning (at least here, still having the new sorting turned off).

Again in other words: The fact that space is replaced by underscore in your test is pure coincidence. The original author could have used any other character instead of the underscore (except "forbidden" characters).

I don't have time to look into this deeply right this second, but these couple of comments have given me a new hunch, and at a superficial glance I'm inclined to think my hunch is correct (but I have not yet tested it...):

lmtp_fuzzy_mailbox_match: 0 If enabled, and the mailbox specified in the detail part of the recipient (everything after the ‘+’) does not exist, lmtpd will try to find the closest match (ignoring case, ignoring whitespace, falling back to parent) to the specified mailbox name.

I bet if you turn this option on, the "underscore substitution" will start to work. I hadn't realised the tests I was running have this option enabled, but I've just checked and they do. I had been assuming the underscore was "the correct way to work around the space, as known by the tester who wrote the test", but I guess they were just doing that cause they knew fuzzy was on, and knew it would work...

The reason the plain spaces don't work in 3.0+ is because deliver relies entirely on plus addressing (even if you use the -m mailbox argument, it is just translated into a plus address internally), which means there can't be a space character in the target mailbox name because it wouldn't be legal in an email address. So I guess the short version is that to deliver to a mailbox with a space in the name, you'll need the fuzzy matching enabled, and you'll need to mangle the name in some way (such as with an underscore)

I guess maybe 2.4 deliver did things differently, and it was changed at some point (I don't know why or when, I could look at the history but have not yet).

I hope to have some more time to follow up on this properly later this week, thank you both for your help so far!

jikamens commented 2 years ago

With lmtp_fuzzy_mailbox_match: 1 in /etc/imapd.conf, after restarting everything:

# echo "Foo: Bar" | /usr/lib/cyrus-imapd/deliver -a jik -m "user.jik.Sent_Items" -q
+user.jik.Sent_Items: Mailbox does not exist

Also tried it with "SentItems" instead of "Sent_Items" and got the same response.

Just to be clear: are we in agreement that if indeed there is a way to get this to work with fuzzy matching, then it is still a bug, i.e., not acceptable, that it is impossible to deliver messages to mailboxes with spaces in their names unless fuzzy matching is enabled, and that this is a significant regression? I don't know what the point of this "fuzzy matching" thing is, but I don't want to enable it, and I should still be able to deliver messages to mailboxes with spaces in their names.

Binarus commented 2 years ago

@elliefm

Thank you very much for your research!

I bet if you turn this option on, the "underscore substitution" will start to work. During the weekend, I didn't have time to play around with the new sorting option as promised, but I was able to read a bit of the source code and also noticed lmtp_fuzzy_mailbox_match.

I confirm that turning on this option makes a difference. The command lines

cyrdeliver -a cyrus "user1+INBOX/Health_Reports" < testmessage
cyrdeliver -a cyrus "INBOX/Health_Reports" user1 < testmessage

both finally made the message appear in INBOX\Health Reports with that option turned on.

But once again, the underscore does not have any special meaning. For example, if using HealthAReports instead of Health_Reports (i.e. replacing _ by A) in the command lines above, the result is exactly the same.

Furthermore:

The reason the plain spaces don't work in 3.0+ is because deliver relies entirely on plus addressing (even if you use the -m mailbox argument, it is just translated into a plus address internally)

This does not seem to be always the case. Please consider the following command line:

cyrdeliver -a cyrus -m "user/user1/INBOX/Health_Reports" < testmessage

This command line rises the "Mailbox does not exist" error, and the message is not delivered anywhere. Obviously, the target mailbox name is not turned into the +-addressed form here.

This is confirmed by @jikamens's findings, who used this form of command line. This also confirms my findings which I've described previously: Things work differently, depending on whether the user id is given explicitly (in + addressing or as a separate command line argument) or implicitly (as part of the mailbox name).

Just to be clear: are we in agreement that if indeed there is a way to get this to work with fuzzy matching, then it is still a bug, i.e., not acceptable, that it is impossible to deliver messages to mailboxes with spaces in their names unless fuzzy matching is enabled, and that this is a significant regression? I don't know what the point of this "fuzzy matching" thing is, but I don't want to enable it, and I should still be able to deliver messages to mailboxes with spaces in their names.

I agree 100%. Fuzzy matching immediately leads to practical problems which make the whole thing even worse. Just imagine that there are three folders in INBOX: Health Reports, Health_Reports and HealthAReports.

The reason the plain spaces don't work in 3.0+ is because deliver relies entirely on plus addressing [...], which means there can't be a space character in the target mailbox name because it wouldn't be legal in an email address.

For that reason, the idea of internally converting each target userid / mailbox combination to +-addressed form is wrong in the first place. IMAP RFCs don't forbid mailboxes (or subfolders) with space characters in their name, so trying to represent those names by something which is formally an email address is impossible and never should be tried.

Apart from that, it could very well be that there are length restrictions for +-addresses (or parts of them) which do not apply to IMAP mailboxes. This is just a wild guess - I haven't read the respective RFCs. Do we have a problem here? What if I have a mailbox / folder name which is longer than the maximum length of an email address? But let's set this apart for the moment and concentrate on the space character problem ...

Excursus (which some people might find interesting, but which distracts us from the problem with the space character):

Interestingly enough, things which I would have considered more problematic work. I have conducted Tests with non-ASCII characters in the mailbox / subfolder name, in this case German umlauts:

cyrdeliver -a cyrus -m "INBOX/Öffentlich" user1 < testmessage

This raised the "Bad Protocol" error, although the subfolder Öffentlich existed and fuzzy addressing was still turned on. I know that non-ASCII character can't be in IMAP folder names, but I would have expected that fuzzy addressing would heal the mistake. But hey, representation of non-ASCII characters in IMAP mailbox names is documented, so I replaced the Ö by its IMAP-UTF-7 representation:

cyrdeliver -a cyrus -m "INBOX/&ANY-ffentlich" user1 < testmessage

That one worked (no error, and message got delivered into the expected folder), and I am quite sure that we can use nearly every character which is defined in Unicode that way. The next logical idea was to try to represent the space character in the same fashion:

cyrdeliver -a cyrus -m "INBOX/Health&ACA-Reports" user1 < testmessage

That one did not work, which was expected somehow; I've tried it just as a last resort (in IMAP-UTF-7, printable ASCII characters must represent themselves as far as I have understood).

End of excursus

In contrast to lmtpd, imapd works correctly. We can create and fully use all sorts of subfolders with space characters, Umlauts and any other non-ASCII character without any issue with the usual MUAs (tested Outlook and Thunderbird). The bugs are just in the lmtp code.

Could you please revert that breaking change in lmtpd.c and lmtpengine.c to restore the behavior as it was in 2.4.17? Not being able to use mailbox names with space characters (and eventually a few other ASCII characters) drastically reduces the number of scenarios where Cyrus-imapd can be used, because delivering messages via cyrdeliver and lmtp to arbitrary mailboxes and subfolders is a key part of its functionality (at least, for us).

Thank you very much again for your time and effort!

Best regards,

Binarus

elliefm commented 2 years ago

I hadn't realised there was a difference in behaviour between delivering to a userid with mailbox (whether by -m or by plus addressing) vs delivering to a full mailbox name with the userid encoded within it. That's an interesting piece of information, thanks. I don't yet know what it means.

It's extremely unlikely we'll revert the change from 2.4->3.0, if for no other reason than 3.0 itself is ancient. Also I imagine those changes had some other purpose, and something else would break instead if we just ripped them out. If there is a bug or a misbehaviour to be fixed, we'll fix it forward. And if that fix can be easily backported to older releases it might be (but no promises).

I agree that having to turn on fuzzy matching to make this (kinda) work is horrible. Actually, I consider fuzzy matching pretty horrible in general, and for that matter I also consider the behaviour that plus addressing automatically delivers to a named folder pretty horrible too. (If it were my account, I would want to use plus addresses only as hooks for sieve handling, with no behaviour unless I scripted it.)

For the sake of full disclosure, I've been looking at this issue as an interesting and not-unproductive distraction from a boring/stressful thing I'm supposed to be doing, which is why my attention has been a bit limited and patchy.

There's clearly a problem here -- whether it's an implementation problem, an interface problem, a usage/documentation problem, or something else, I'm not sure -- and I would like to get it fixed. It might be a little while before I can fully focus on it though, sorry! It's a bit frustrating, cause it's rare to get a bug report with two different reporters with slightly different presentation who are both able to help with diagnostics.

Though I'm suddenly curious what the use case for delivering to a mailbox containing spaces using deliver is, and whether the same effect can be achieved by some other means. For example, maybe you could add a custom header to the message, and then use sieve to move messages with that header to the folder -- this would bypass the problem with deliver entirely. That's not a fix, but maybe it saves you having to wait for one.

jikamens commented 2 years ago

Though I'm suddenly curious what the use case for delivering to a mailbox containing spaces using deliver is, and whether the same effect can be achieved by some other means. For example, maybe you could add a custom header to the message, and then use sieve to move messages with that header to the folder -- this would bypass the problem with deliver entirely. That's not a fix, but maybe it saves you having to wait for one.

  1. I use procmail instead of sieve. I've been using procmail for literally 30+ years, basically since it was first released. I frankly am an old dog and do not like learning new tricks, so I do not want to have to learn the sieve language. There are various rules in my .procmailrc which cause messages to be delivered into folders other than my Inbox; it should be possible for spaces to be in the names of these folders.

  2. I use a milter which outgoing messages go through. It does some stuff to them which isn't important for the purposes of this discussion and then stores them in my Sent Items folder. I.e., it's this milter storing the messages in Sent Items, not my mail client. The milter uses deliver to put the messages into Sent Items.

But really, I don't think the use case is particularly important here; it's simply not reasonable that deliver can't deliver messages to folders with spaces in their names.

Binarus commented 2 years ago

It's extremely unlikely we'll revert the change from 2.4->3.0, if for no other reason than 3.0 itself is ancient. Also I imagine those changes had some other purpose, and something else would break instead if we just ripped them out. If there is a bug or a misbehavior to be fixed, we'll fix it forward.

I can totally understand that policy. You are right that fixing issues in ancient versions would be a waste of time and resources. I would be very glad to have it fixed in the current version. Plus, even if you would fix it in 3.0, my favorite distribution (Debian) surely wouldn't pick it up. So I'd have to compile myself anyway, and then it wouldn't make a difference whether I compile a fixed 3.0.x version or the current version.

Actually, I currently don't know what the current version is and which version Debian bullseye provides, but for sure I'll look it up shortly. Probably Debian bullseye also provides a version which is already ancient ... Sigh.

Though I'm suddenly curious what the use case for delivering to a mailbox containing spaces using deliver is, and whether the same effect can be achieved by some other means.

This is a valid question, because you need to know whether a fix would be worth the effort; that is, whether a fix would help a non-negligible number of people. Therefore, I'll give some background and explain from scratch.

There is a vast number of (usually smaller) companies (and also more private people than I originally assumed, but let's focus on the small companies here) who want to keep their email messages in-house. There are various reasons to do that, among them privacy concerns, the wish of being independent, fast internal networks combined with slow internet connection, and so on.

The natural way to keep email messages in-house, to share them with colleagues, and to be able to filter and process them in a decent MUA is to put them onto an IMAP server. Even smaller companies usually are able to set up an IMAP server with a little bit of help, and we usually recommend Cyrus imapd to them, because it has some features which other IMAP server software does not provide, and is very reliable, stable and fast.

The most important question is now: How can we stuff incoming email messages into the IMAP server?

In this respect, the situation drastically differs between big organizations (like government, enterprises and universities) and small companies. The former usually have fixed IP addresses, run internal SMTP servers and have administrator teams who manage the SMTP servers; in such environments, the internal SMTP servers are MXs for the respective domains and directly receive messages and deliver them to IMAP (or whatever) servers.

In contrast, small companies don't have fixed IP addresses (at least, this the case in great parts of Europe), and they usually don't have the knowledge and the resources to operate internal SMTP servers (operating your own SMTP server which is directly connected to the internet is much more difficult and dangerous than operating an internal IMAP server and requires you to continuously keep up with security mailing lists). So they must find another way to get incoming messages into their IMAP server.

The solution to this problem is fetchmail (practically always). Those companies usually have a webhosting contract with one of the bigger providers (in Germany e.g. Ionos, Strato, Deutsche Telekom and so on) which includes a large (or even unlimited) number of email addresses. That is, the MX for their domain is run by their webhosting provider, and incoming messages are first stored in mailboxes which are located on the provider's machines. fetchmail then fetches those messages and stores them on the internal IMAP server. The fetching interval is usually one minute, and fetchmail is normally configured to delete the messages from the provider's machines when they have been successfully delivered to the internal IMAP server; this prevents the same message from being fetched multiple times.

So far, so good. But how do we get the messages from fetchmail into the IMAP server? The typical (and natural) mechanism of course is cyrdeliver (if Cyrus imapd is the IMAP server). A typical stanza from a fetchmail configuration file:

poll
secure-pop.t-online.de
proto pop3
user "user1@t-online.de"
ssl
pass "supersecret"
is "user1" here
no rewrite
mda "/usr/sbin/cyrdeliver -f %F -q user1"

The last line tells fetchmail to pipe the messages to cyrdeliver which then delivers them to the INBOX of user1. This pattern is widely deployed, and we can easily imagine that we eventually don't want to deliver to INBOX, but to subfolders with arbitrary names. For example, there might be a second email address (with a separate mailbox) at the provider side, but the messages to the second address should not go to a separate mailbox in the internal IMAP server; instead, they should go to a subfolder in the same mailbox as above. In this case, we would just create a second stanza in the fetchmail configuration like the following (only the changed lines are shown):

user "user2@t-online.de"
is "user2" here
mda "/usr/sbin/cyrdeliver -f %F -q -m 'INBOX/Health Reports' user1"

This brings us back to the problem which is discussed here (space characters in folder names). I haven't seen such configurations too often, though. But the real power of fetchmail delivery (and the one which is heavily used in practice) stems from the fact that we can construct nearly arbitrary filters and pipes that way. The most important use case is filtering by procmail. For that purpose, we need to change the fetchmail configuration only slightly to use procmail as our MDA. The last line then would be (for example):

mda "/usr/bin/procmail /etc/procmailrc"

This would make fetchmail pipe the messages to procmail, which processes them according to the rules which are defined in /etc/procmailrc (in this case). Oversimplified, in procmailrc, there can be many rules, each rule consisting of conditions and actions; if the conditions are met, the action is carried out. The action lines can trigger predefined behavior (such as delivering the message to an mbox in the respective user's home directory), but more interesting are action lines which are pipes themselves. An example procmailrc (stripped down to what is interesting in this context):

:0 w
* ^From:\ system@server01.example.org$
* ^Subject:\ Server01: ZFS replication: VM01: Success$
| /usr/sbin/cyrdeliver -a cyrus -q -m "INBOX/Health Reports" user1

Please don't allow the first line and the syntax to distract you. Let's forget about the first line. The second and third lines just say that the action line should be carried out if the message on the standard input comes from the indicated email address and if its subject equals the indicated subject. The most interesting line is the action line, which is the last one. It is a pipe itself, so it makes procmail pipe the message to cyrdeliver, so that cyrdeliver delivers the message to the indicated subfolder (currently not, due to the bug being discussed here, but in theory).

This mechanism is extremely powerful, easy to understand and arbitrarily extensible. I could construct a pipe with 20 stations if I was in the mood to do so, each station altering the message in another way, cyrdeliver always being the last station. I am convinced that there are several millions of installations which handle filtering and presorting that way.

Last, but not least: We shouldn't forget that Microsoft Outlook, which is undoubtedly one of the most widely used MUAs, already produces problems by itself. Nearly all MUAs in standard configuration, when adding a new account to the IMAP server, create a mailbox / folder on the IMAP server where they copy the sent messages to. Nearly all MUAs call that mailbox / folder Sent (no problem here), while the folks at Microsoft thought it would be a great idea to use a name with a space character in it: Sent Items.

While I personally didn't need cyrdeliver to deliver messages to my Sent folder so far, @jikamens has described a use case where the problem bites him. Everything is possible, and it should be noted that one of the most prevalent MUAs automatically creates an IMAP folder which is vital for its function and which contains space characters in its name. Probably you can prevent Outlook from doing so, but I know nobody who took the effort.

Regarding sieve:

There are people who have accumulated all sorts of highly complex procmail recipes during the last 20+ years, since procmail originally was the only reasonable way to filter messages (at the times where sendmail was the standard SMTP server). It would be a lot of work to rewrite those recipes in sieve; actually, I doubt that it would be possible at all.

Plus, I personally found sieve particularly hard to understand. I needed a while just to see how to make the server execute filters.

Third, sieve has an architectural drawback IMHO. I am really not specialized in this field, and my knowledge is very limited, but having to run a separate server software just to be able to upload filter recipes to an IMAP server does not sound right to me. I'd like to keep the attack surface and resource consumption as small as possible, and I therefore do not want to start another server program just for that purpose (admitting that I can't say anything about sieve's resource consumption, because I never have tested it thoroughly).

The whole sieve system is very complicated in my personal opinion. Although I'm admiring the realization at the technical level, and although I really can't say anything bad about it, I personally don't get along with it, and I believe that filtering by the methods shown above will always be easier as well as more flexible; I even could include sed, awk and other weird tools in my filter pipes.

Regarding other solutions:

Obviously, there is an amazing number of people who are desperate enough to use Postfix in conjunction with fetchmail to filter and deliver messages to Cyrus imapd. That is, they use fetchmail to fetch the messages from somewhere, then let fetchmail pump the messages into a local Postfix server, and let that Postfix server deliver the messages to Cyrus imapd (probably via lmtp). There is a myriad of tutorials out there which outline the necessary steps.

I never have understood what makes people do that. IMHO, it is totally weird to run a fully-blown SMTP server, using it for nothing else than local delivery of messages to an IMAP server, where the messages already have arrived on the same machine. The only reason which comes to my mind: Perhaps they have already tried the way I have outlined above and have run into the problem which is discussed here, or similar problems, and perhaps they could solve it by using another lmtp software, for example the one which comes with Postfix. In other words, perhaps Postfix can deliver messages to Cyrus imapd to a mailbox or subfolder with space characters in their name. After all, the bug which causes the problem is in Cyrus' lmtpd.c and lmtpengine.c (as far as I can tell); Postfix of course uses its own programs for delivery which might not suffer from that bug. But this is just wild speculation.

Summary

We really would like this problem to be solved, because we need cyrdeliver as the last stage in a pipe which receives, filters and otherwise processes incoming messages; in such context, cyrdeliver must be able to deliver the messages into mailboxes and folders with arbitrary names, including names which have space characters and Unicode characters in them (the latter seems to work, though, with code points > 127).

Eventually we could use other mdas (instead of cyrdeliver) which can deliver messages via lmtp. To be honest, I didn't try yet, mainly because I did not have the time to research whether appropriate programs exist (appropriate here means: programs which are not part of a fully blown SMTP stack and which don't have a zillion dependencies in the package manager of my favorite distribution). Furthermore, I really believe that having Cyrus' lmtp fixed would be by far better than tinkering around with third-party programs.

Binarus commented 2 years ago

@elliefm

Since there was no activity since my last post: Could you please tell us whether you (or your team) have decided to work on this issue, and if not, when this decision will be made?

Everybody will understand if you have other priorities and say "We won't fix" (after all, in that case, there are other IMAP servers out there).

It is clear that a bug fix will take its time. But we hope that it is at least possible for you to decide quickly whether you intend to fix this issue at all, and we were very grateful if you would clearly tell us your decision and wouldn't let us wait several weeks for that decision. We have to act upon the situation over here within the next few days. That means that within the next days we'll have to decide if we switch to a different IMAP server software or if we wait for your fix. Of course, we can only decide that question when we know if you are planing a fix.

Once again, having to wait for a fix wouldn't be a show stopper for us (unless the waiting time would be longer than a few months), but we would like know soon whether you'll fix the bug at all.

Thank you very much, and best regards,

Binarus

P.S. I've just found a further bug in deliver which is probably related. I'll add the cross reference here once I have filed it.

Binarus commented 2 years ago

I now have filed the other issue: #3903

Although the issue described above is probably closely related to #3903, and although a bugfix for one would also mitigate the other, I'd like to keep them separated, because the issue described above is somehow urgent, while #3903 can be easily worked around.

Thank you very much, and best regards,

Binarus

Binarus commented 2 years ago

I believe that I have narrowed down the bug in the meantime:

On a system where Cyrus imapd and the accompanying programs in version 3.0.8 are installed, I put cyrdeliver in version 2.5.10 and the libraries it depends on in an isolated directory and executed it. Trying to let it deliver to a mailbox or subfolder with space characters in its name failed the same way as described above: "Bad protocol" error, no message delivery.

Then I repeated the test with cyrdeliver 2.4.17 and the libraries it depends on. The outcome was exactly the same.

That means that it is very likely that lmtpd is the problem. The following facts lead to this conclusion:

That leaves only lmtpd which can cause the problem.

Perhaps I'll also try the old lmtpd in version 2.4.17 and report back.

Regards,

Binarus

Binarus commented 2 years ago

Perhaps I'll also try the old lmtpd in version 2.4.17 and report back.

Well, I couldn't make lmtpd from the 2.4.17 distribution work with imapd version 3.0.8. I won't experiment further in this field because it could become a bit dangerous. I am afraid that old lmtpd versions could crash newer imapd versions' databases.

Regards,

Binarus

Binarus commented 2 years ago

@elliefm Another week has passed without any statement from your side. Could you please tell us whether this is a "Won't fix" or if somebody will work on it (some day)?

Regards,

Binarus

jikamens commented 2 years ago

@Binarus please chill. This is an open-source project. Nobody owes you anything here, and you're being kind of pushy. If this has been broken since 3.0 and you

jikamens commented 2 years ago

... and I are the only people to complain about it, then the developers are completely justified in not prioritizing it as a critical issue.

Binarus commented 2 years ago

@jikamens Thanks for your comment. I have of course understood that this is an open source project and that nobody owes us anything. I am also not insolent enough to try to make somebody provide a fix quickly.

Perhaps I haven't made myself clear enough, so let's try again: I didn't yell "We need that fix ASAP". I even didn't ask when the fix will be provided. Instead, I just asked whether at all the developers will spend time on trying to provide a fix, and I still believe that it is not the best style to leave this simple question unanswered for such a long time.

It can't be that hard to decide whether this is a "Won't fix" or a "Will fix". If it is a "Won't fix", this is perfectly acceptable of course, and although we would need to switch over to something else then, we still would be very grateful for the great work Cyrus imapd is, and for what it gave us in the past. If it is a "Will fix in some time", this is even better, and we'll wait for the fix then, sticking with Cyrus imapd, and explaining people that they just need to be patient until the fix is published.

But without that decision, the situation is bad. Switching over to something else means non-negligible costs for organizations and removes some features which are unique to Cyrus, and if the Cyrus team provides a fix later, the man power and money will have been wasted, which would be quite stupid. On the other hand, waiting for a fix which never will be provided would also be stupid and would upset users.

In that sense:

... and I are the only people to complain about it, then the developers are completely justified in not prioritizing it as a critical issue.

You are right. But that doesn't mean that they shouldn't tell us about their decision to not fix it.

[ Aside: I doubt that you and me are the only people affected by this (see one of my previous posts); I guess that this issue is monitored by a lot of people. I am not that deep in Github that I could know how to get the stats, though ... ]

Best regards,

Binarus