IMAP parser does not allow invalid UTF-8 in literal strings

link2xt commented 4 years ago

For example:

diff --git a/imap-proto/src/parser/rfc3501.rs b/imap-proto/src/parser/rfc3501.rs
index 56e67d0..a81f7fa 100644
--- a/imap-proto/src/parser/rfc3501.rs
+++ b/imap-proto/src/parser/rfc3501.rs
@@ -694,7 +694,7 @@ mod tests {

     #[test]
     fn test_addresses() {
-        match crate::parser::rfc3501::address(b"(\"John Klensin\" NIL \"KLENSIN\" \"MIT.EDU\") ") {
+        match crate::parser::rfc3501::address(b"(\"John \x88Klensin\" NIL \"KLENSIN\" \"MIT.EDU\") ") {
             Ok((_, _address)) => {},
             rsp @ _ => panic!("unexpected response {:?}", rsp)
         }

It makes cargo test test_addresses fail.

nstring_utf8 parser calls string_utf8 which calls str::from_utf8. If it fails, the whole parser fails.

Using "lossy" variants would not allow the parser to return str. One way is to return binary slices and let the user cleanup the input. Another is to return String.

I think switching from str to byte slices is preferred, as the user of the library may be interested in actual bytes, even if they don't form valid UTF-8 string.

djc commented 4 years ago

As far as I can tell, this is invalid per RFC 3501:

address         = "(" addr-name SP addr-adl SP addr-mailbox SP
                  addr-host ")"
addr-name       = nstring
                    ; If non-NIL, holds phrase from [RFC-2822]
                    ; mailbox after removing [RFC-2822] quoting
nstring         = string / nil
string          = quoted / literal
quoted          = DQUOTE *QUOTED-CHAR DQUOTE
QUOTED-CHAR     = <any TEXT-CHAR except quoted-specials> /
                  "\" quoted-specials
quoted-specials = DQUOTE / "\"
TEXT-CHAR       = <any CHAR except CR and LF>

From RFC 2234, section 6.1 (Core rules):

CHAR           =  %x01-7F
                               ; any 7-bit US-ASCII character,
                                  excluding NUL

What server are you talking to here?

link2xt commented 4 years ago

@djc You are right, I oversimplified the testcase.

The problem was caused by literal string. Here is a better example:

diff --git a/imap-proto/src/parser/rfc3501.rs b/imap-proto/src/parser/rfc3501.rs
index 56e67d0..aff6a37 100644
--- a/imap-proto/src/parser/rfc3501.rs
+++ b/imap-proto/src/parser/rfc3501.rs
@@ -694,7 +694,7 @@ mod tests {

     #[test]
     fn test_addresses() {
-        match crate::parser::rfc3501::address(b"(\"John Klensin\" NIL \"KLENSIN\" \"MIT.EDU\") ") {
+        match crate::parser::rfc3501::address(b"({4}\r\nJo\x88n NIL \"KLENSIN\" \"MIT.EDU\") ") {
             Ok((_, _address)) => {},
             rsp @ _ => panic!("unexpected response {:?}", rsp)
         }

According to RFC 3501, literals contain CHAR8, which have extended range: %x01-ff.

hpk42 commented 4 years ago

The parsing error happens eg with a click-hoster's imap providing this welcome message: * OK [CAPABILITY IMAP4rev1 LITERAL+ SASL-IR LOGIN-REFERRALS ID ENABLE IDLE XLIST AUTH=PLAIN AUTH=LOGIN] kasserver.com mailserver ready.. I am guessing it's a dovecot install that this mass-provider is running.

djc / tokio-imap

IMAP parser does not allow invalid UTF-8 in literal strings #62