djc / imap-proto

IMAP protocol parser and datastructures in Rust

Parse `FETCH ... BODYSTRUCTURE` response #15

Open sanmai-NL opened 6 years ago

sanmai-NL commented 6 years ago

See RFC 3501, section 6.4.5 (BODYSTRUCTURE), and the MIME document series beginning with RFC 2045 (MIME-IMB).

For motivation, see RFC 2683, section 3.2.1.4.

It would be great if some mentoring (or, ideally, documentation) could be provided for this.

djc commented 6 years ago

I'm happy to provide some mentoring. The best place to start is probably to just look at https://github.com/djc/imap-proto/commit/5b9dc9b9d020e5fdd01f064b8f4f2f408d8de3b7 and copy that approach for the needs of BODYSTRUCTURE; then ask me questions if anything is unclear.

I could maybe write some documentation, but as I already stated in https://github.com/djc/tokio-imap/issues/2, it would help if you could state more concretely what you're looking for. In my mind, the parser code in imap-proto just applies nom macros to the formal syntax from the RFCs, ideally as directly as possible, which is pretty straightforward.

sanmai-NL commented 6 years ago

I’m on it. 🙂

A remotely related question to cement my understanding. In RFC 3501, FLAGS (store command data item) takes a list, unlike FLAGS (fetch item), judging by the fetch-att production. This seems incongruous with the parser code in src/parser.rs#L452-L456, which seems to imply that a list follows when "FLAGS" is used as a fetch item. Could you explain?

djc commented 6 years ago

Not sure I fully understand your question, but in general there's no parsing code for commands, only for server responses. Does that explain what you are seeing?
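To illustrate the distinction: in RFC 3501 the client-side fetch-att production is a bare `"FLAGS"`, while the server-side msg-att-dynamic production is `"FLAGS" SP "(" [flag-fetch *(SP flag-fetch)] ")"`, so a response parser does expect a list. The following is a minimal sketch in plain Rust, not the nom-based code from src/parser.rs; the function name `msg_att_flags` is hypothetical, and the sketch matches `FLAGS` case-sensitively for brevity.

```rust
/// Parses the server-response form `FLAGS (flag flag ...)` and returns the
/// flags plus the unparsed remainder. Hypothetical helper, not imap-proto API.
fn msg_att_flags(input: &str) -> Option<(Vec<&str>, &str)> {
    // A response always carries a parenthesized list after "FLAGS".
    let rest = input.strip_prefix("FLAGS (")?;
    // The list ends at the first closing parenthesis.
    let end = rest.find(')')?;
    // Flags are separated by single spaces; an empty list yields no flags.
    let flags = rest[..end].split(' ').filter(|f| !f.is_empty()).collect();
    Some((flags, &rest[end + 1..]))
}
```

Given the data from a response such as `* 1 FETCH (FLAGS (\Seen \Answered) UID 5)`, calling the function on `FLAGS (\Seen \Answered) UID 5` yields the two flags and leaves ` UID 5` for the next attribute parser. A bare `FLAGS`, as sent in a command, is rejected.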

sanmai-NL commented 6 years ago

Yes, of course. Sorry.

sanmai-NL commented 6 years ago

Your msg_att_list appears to be the equivalent of msg-att, correct?

I’m interested to learn, how did you come to the decision to not follow the ABNF ‘strictly’? Are you, in principle, okay with rewriting your code to match the ABNF strictly, e.g. distinguishing msg-att-dynamic and msg-att-static?

In the msg-att-static production, this alternative

`"BODY" ["STRUCTURE"] SP body /`

covers BODYSTRUCTURE responses. This implies that the response to both a BODY and a BODYSTRUCTURE FETCH command can be handled by a single combinator.
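A minimal sketch of what such a single combinator could look like, written in plain Rust rather than with nom macros. All names here (`BodyKind`, `strip_prefix_ci`, `body_or_bodystructure_prefix`) are hypothetical and not taken from imap-proto; the sketch only handles the `"BODY" ["STRUCTURE"] SP` prefix and leaves the `body` production itself unparsed.

```rust
/// Which fetch item the response names. Hypothetical type for illustration.
#[derive(Debug, PartialEq)]
enum BodyKind {
    Body,
    BodyStructure,
}

/// Strips `tag` from the start of `input`, ignoring ASCII case, since
/// quoted strings in ABNF are case-insensitive.
fn strip_prefix_ci<'a>(input: &'a str, tag: &str) -> Option<&'a str> {
    let head = input.get(..tag.len())?;
    if head.eq_ignore_ascii_case(tag) {
        Some(&input[tag.len()..])
    } else {
        None
    }
}

/// Matches `"BODY" ["STRUCTURE"] SP` and returns the kind together with
/// the remaining input, which would hold the `body` production.
fn body_or_bodystructure_prefix(input: &str) -> Option<(BodyKind, &str)> {
    let rest = strip_prefix_ci(input, "BODY")?;
    // The optional "STRUCTURE" suffix decides which item this is.
    let (kind, rest) = match strip_prefix_ci(rest, "STRUCTURE") {
        Some(after) => (BodyKind::BodyStructure, after),
        None => (BodyKind::Body, rest),
    };
    // Both forms require a single space before the body data.
    let rest = rest.strip_prefix(' ')?;
    Some((kind, rest))
}
```

Returning the kind alongside the remainder lets a caller decide later whether extension data is acceptable for that item. Input such as `BODY[HEADER] ...` is rejected here because it belongs to the separate `"BODY" section` alternative.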

Somewhat puzzled, I searched online a bit, and found this analysis. I conclude that some responses are in the IMAP formal language, but are invalid per the protocol description. For example, RFC 3501 7.4.2:

Extension data is never returned with the BODY fetch, but can be returned with a BODYSTRUCTURE fetch.

Regrettably, here the authors are using the language ‘is never’ rather than clearer requirement key words (RFC 8174). Anyway, it seems as if the parser, if implemented based on the grammar, would accept responses that make no sense from an implementation standpoint. Could you weigh in on this?

I propose to modify the existing msg_att_body_section combinator into a msg_att_body_or_bodystructure.

I’ve opened a WIP PR #16 for you to look at, to give context for my comment, @djc. Let’s continue the discussion there.

djc commented 6 years ago

In writing the parser, there are two concerns that might somewhat compete. One is to make the parser easy to read and follow in code. The other is to make the parser resemble the formal syntax in the RFC, so that it's easy to match entry points to the standard. If you have proposals to rename some parsers to make them easier to match to the formal syntax, I'm probably okay with that. On the other hand, I might not be okay with combining or separating parsers just because the formal syntax does so, if I feel it makes the parser harder to follow. I'd be fine with adding comments to point out the disparity, though!

For this case, to me the msg-att-static and msg-att-dynamic separation doesn't make sense in the context of the parser, even if it might make sense in the context of understanding the protocol somehow. I feel like separating these would make the parser harder to follow. Feel free to prove me wrong with a PR, but I'm probably going to be disinclined to merge it.

If the parser would accept responses that make no sense from an implementation standpoint, I think my guiding principle would be what implementation is the least complex. The primary goal is for the parser to accept all commonly used IMAP vocabulary. The secondary goal is to minimize complexity.

Modifying msg_att_body_section into msg_att_body_or_bodystructure sounds okay to me.