duesee / imap-codec

Rock-solid and complete codec for IMAP
Apache License 2.0
35 stars 13 forks source link

feat: Implement `METADATA` #432

Closed link2xt closed 4 months ago

link2xt commented 5 months ago

There is currently no support for METADATA extension.

Encoding SETMETADATA command and decoding METADATA responses returned for GETMETADATA commands does not seem to be possible currently.

duesee commented 5 months ago

SORT/THREAD is next. Then CONDSTORE/QRESYNC.

METADATA doesn't look too complicated. But the devil may be in the detail. Are you in a position to draft something up? :-)

If not, I can take a look in a few weeks.

duesee commented 5 months ago
capability        =/ "METADATA" / "METADATA-SERVER"
command-auth      =/ setmetadata / getmetadata
; empty string for mailbox implies server annotation.
setmetadata       = "SETMETADATA" SP mailbox SP entry-values

entry-values      = "(" entry-value *(SP entry-value) ")"

entry-value       = entry SP value

; Slash-separated path to entry. MUST NOT contain "*" or "%".
; TODO(damian): astring allows "*" and "%". We could use a `Wrapper(AString)`?
entry             = astring

value             = nstring / literal8
; empty string for mailbox implies server annotation.
getmetadata       = "GETMETADATA" [SP getmetadata-options] SP mailbox SP entries

getmetadata-options = "(" getmetadata-option *(SP getmetadata-option) ")"

; tagged-ext-label and tagged-ext-val are defined in [RFC4466].
; TODO(damian):
; See discussion in https://github.com/duesee/imap-codec/pull/417
; TLDR;
; I think we don't want to support a "generic" syntax. Only implement `maxsize-opt` and `scope-opt` from below?
getmetadata-option = tagged-ext-label [SP tagged-ext-val]

; Used as a getmetadata-option
maxsize-opt       = "MAXSIZE" SP number

; Used as a getmetadata-option
scope-opt         =  "DEPTH" SP ("0" / "1" / "infinity")

entries           = entry / "(" entry *(SP entry) ")"
response-payload  =/ metadata-resp

; empty string for mailbox implies server annotation.
metadata-resp     = "METADATA" SP mailbox SP (entry-values / entry-list)

entry-list        = entry *(SP entry)
resp-text-code =/ "METADATA" SP (
                    "LONGENTRIES" SP number / 
                    "MAXSIZE" SP number /
                    "TOOMANY" /
                    "NOPRIVATE"
                  )

Quirks

link2xt commented 5 months ago

I am currently fixing metadata parsing in imap-proto: https://github.com/djc/tokio-imap/pull/159 But long-term considering switching to imap-codec (mainly because imap-proto fails to parse the rest of the stream if slightly broken response is received so introducing usage of new commands is very dangerous, hope that imap-codec handles it better) so this would be a blocker then.

Offtopic: is it true that imap-codec tolerates garbage responses, e.g. received line * GARBAGE (FOO "bar") will be skipped? imap-proto gets stuck returning infinite stream of errors and the only way is to reset connection, but then after reconnecting and repeating the same commands we get into the same state. Currently the best workaround with imap-proto is to ratelimit reconnections to at least avoid 100% CPU usage loop.

duesee commented 5 months ago

imap-codec doesn't do anything with connections. It only cares about parsing an (possibly incomplete) bytes input. Connections are handled by imap-flow. Soon, we'll have a higher-level library based on imap-flow that will be similar to async-imap (but based on a different operational model, libraries, etc.) Spoiler: IDLE will work better, it's cancellation-safe, you can use select!, etc.

Short answer: no.

I'm not sure how tolerating garbage in IMAP could work. IMAP doesn't really have "tokens" we can detect. Simon, the go-imap developer tried to tokenize the IMAP stream in v1 and gave up in v2. (I hope I paraphrase him correctly.)

The best one could do is to detect lines and literals. But even then, this is asking for issues.

Example:

* GARBAGE (FOOO) {100}\r\n
Have I told you how cool ...
* 1 EXISTS
* OK [UIDVALIDITY 1337] ...
... is?

If we discard the first line, the next lines could be interpreted as server data. It's not predictable. I would prefer to only discard/ignore data when we are absolutely sure we know where to safely chime in again in the byte stream.

Do you have a real-world example of a server sending garbage? I don't mean unknown codes or something that can be detected. But things similar to your GARBAGE response.

link2xt commented 5 months ago

Do you have a real-world example of a server sending garbage?

Most recent example is mail.163.com sends * STATUS "INBOX" () response which is explicitly prohibited by grammar which says status-att-list is non-empty. If you build grammar according to the standard, you will fail to parse it as a STATUS response. In this case imap-proto returned parsing error, but I would prefer this line to be skipped and pretend that server simply responded with A100 OK without untagged response. This would be a server error for STATUS command, but at least I would not have to reset the connection and can fallback to run other commands or assume the worst (e.g. if I asked for UIDNEXT I can assume that it changed if I cannot get it via STATUS).

Another example was outlook server adding whitespace at the end of the line.

duesee commented 5 months ago

Thoughts:

Most recent example is mail.163.com sends * STATUS "INBOX" () response which is explicitly prohibited by grammar which says status-att-list is non-empty.

That's unfortunate. The workaround is non-trivial because it would either require allowing an empty list (which would spread the incompatibility) or dropping the message (which would require additional logic).

Was this already reported to mail.163.com? Can't they just stop sending this?

Another example was outlook server adding whitespace at the end of the line.

Was this already reported to Microsoft? Workaround is easy and can be done w/o the risk to reproduce the fault (or changing semantics).

Mental note: It's the third time I see an open-source project implementing a workaround for exactly this issue, which, basically, should require Microsoft deleting a single character.

duesee commented 4 months ago

Hey @link2xt! Do you have an example how you use the metadata extension? I'm having a bit of a hard time figuring out how to present the entries to users.

Currently, it's just an astring. But it could be a PathBuf, Vec<Thing>, enum Entry { Private(...), ... etc.

The RFC is not totally clear to me. Any suggestions or experience?

link2xt commented 4 months ago

We currently only get /shared/admin and /shared/comment. Generally string as a key is fine, I can separate it by / if needed.

duesee commented 4 months ago

Small update @link2xt: I talked with Arnt Gulbrandsen and asked him about ignoring/dropping bad messages. In practice, it may be safe to consume a line, see if it ends with {...}, if so, consume a literal, consume a line, see if it ends with {...}, if so, ... w/o doing IMAP parsing at all.

Funnily, I used this (seemingly hacky) approach already when implementing a server¹. But I changed it due to the danger of ambiguities (and possible injection attacks). There are cases where {...} at the end of a line doesn't necessarily mean "literal", e.g., when { and } is used in a Text. But the insight is that these cases can be reliably detected. It seems this is one of the many "undocumented truths" about IMAP and extensions stick to it.

TLDR;

It's possible to realiably split an IMAP stream into fragments w/o doing IMAP parsing. Thus, imap-codec could become more forgiving about garbage while still ensuring that garbage doesn't interfer with good messages.

¹ https://github.com/Email-Analysis-Toolkit/fake-mail-server