Open sanmai-NL opened 6 years ago
I'm happy to provide some mentoring. The best place to start is probably to just look at https://github.com/djc/imap-proto/commit/5b9dc9b9d020e5fdd01f064b8f4f2f408d8de3b7 and copy that approach for the needs of `BODYSTRUCTURE`; then ask me questions if anything is unclear.
I could maybe write some documentation, but as I already stated in https://github.com/djc/tokio-imap/issues/2 it would help if you could state more concretely what you're looking for. In my mind the parser code in imap-proto is just applying `nom` macros to the formal syntax from the RFCs, ideally as directly as possible, which is pretty straightforward.
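To make "applying the formal syntax as directly as possible" concrete, here is a minimal sketch that maps the RFC 3501 production `flag-list = "(" [flag *(SP flag)] ")"` onto a single function. It is not code from imap-proto and deliberately does not use `nom`, so that it compiles with the standard library alone. The function name `flag_list` is hypothetical, and a flag is simplified here to any non-empty run of characters without spaces or parentheses, which is looser than the real `flag` production.

```rust
/// Sketch of `flag-list = "(" [flag *(SP flag)] ")"` from RFC 3501.
/// Returns the flags and the unparsed remainder, or `None` on a mismatch.
/// Assumption: a flag is any non-empty token without spaces or parentheses.
fn flag_list(input: &str) -> Option<(Vec<&str>, &str)> {
    // "("
    let body = input.strip_prefix('(')?;
    // Everything up to the matching ")" is the optional flag sequence.
    let end = body.find(')')?;
    let (inner, rest) = (&body[..end], &body[end + 1..]);
    // [ ... ] is optional, so "()" is a valid empty list.
    if inner.is_empty() {
        return Some((Vec::new(), rest));
    }
    // flag *(SP flag): flags separated by exactly one space.
    let flags: Vec<&str> = inner.split(' ').collect();
    if flags.iter().any(|f| f.is_empty() || f.contains('(')) {
        return None;
    }
    Some((flags, rest))
}
```

Calling `flag_list("(\\Seen \\Answered) UID 1")` yields the two flags plus the remainder `" UID 1"`, which is the same "parsed value plus rest of input" shape that parser combinators return.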
I’m on it. 🙂
A remotely related question to cement my understanding. In RFC 3501, `FLAGS` appears in the `fetch-att` production. This seems to be incongruous with the parser code in src/parser.rs#L452-L456, which seems to imply that a list follows when "FLAGS" is used as a fetch item. Could you explain?
Not sure I fully understand your question, but in general there's no parsing code for commands, only for server responses. Does that explain what you are seeing?
Yes, of course. Sorry.
Your `msg_att_list` appears to be the equivalent of `msg-att`, correct?
I’m interested to learn, how did you come to the decision to not follow the ABNF ‘strictly’? Are you, in principle, okay with rewriting your code to match the ABNF strictly, e.g. distinguishing `msg-att-dynamic` and `msg-att-static`?
In the `msg-att-static` production, this alternative
`"BODY" ["STRUCTURE"] SP body /`
covers `BODYSTRUCTURE` responses. This implies that the response to both a `BODY` and a `BODYSTRUCTURE` `FETCH` command can be handled by a single combinator.
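As a rough illustration of the single-combinator idea, the sketch below parses the `"BODY" ["STRUCTURE"] SP` prefix and records which of the two items was seen. It is a standard-library-only sketch, not imap-proto code. `BodyKind` and `body_prefix` are hypothetical names, the `body` parser that would consume the remainder is not shown, and keyword matching is case-sensitive here for brevity.

```rust
/// Which fetch item the `"BODY" ["STRUCTURE"] SP body` alternative named.
#[derive(Debug, PartialEq)]
enum BodyKind {
    Body,
    BodyStructure,
}

/// Parses the `"BODY" ["STRUCTURE"] SP` prefix. Returns the kind and the
/// remaining input, which a (hypothetical) `body` parser would consume next.
fn body_prefix(input: &str) -> Option<(BodyKind, &str)> {
    // "BODY"
    let rest = input.strip_prefix("BODY")?;
    // ["STRUCTURE"] is optional and decides which item this is.
    let (kind, rest) = match rest.strip_prefix("STRUCTURE") {
        Some(r) => (BodyKind::BodyStructure, r),
        None => (BodyKind::Body, rest),
    };
    // SP: exactly one space must follow, so `BODY[...]` is rejected here
    // and left to the separate `"BODY" section ...` alternative.
    let rest = rest.strip_prefix(' ')?;
    Some((kind, rest))
}
```

Keeping the kind in the return value means one parser covers both responses while the caller can still tell them apart.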
Somewhat puzzled, I searched online a bit, and found this analysis. I conclude that some responses are in the IMAP formal language, but are invalid per the protocol description. For example, RFC 3501 7.4.2:
Extension data is never returned with the BODY fetch, but can be returned with a BODYSTRUCTURE fetch.
Regrettably, here the authors are using the language ‘is never’ rather than clearer requirement key words (RFC 8174). Anyway, it seems as if the parser, if implemented based on the grammar, would accept responses that make no sense from an implementation standpoint. Could you weigh in on this?
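To illustrate the gap between grammar and protocol description, here is a hedged sketch of a check that could run after a grammar-based parse. None of these names exist in imap-proto. `FetchItem`, `ParsedBody`, and `violates_section_7_4_2` are hypothetical, and the sketch assumes the parser records whether extension data was present.

```rust
/// Which fetch item produced the parsed body (hypothetical type).
#[derive(Debug, PartialEq, Clone, Copy)]
enum FetchItem {
    Body,
    BodyStructure,
}

/// Minimal stand-in for a parsed `body` value (hypothetical type).
/// Assumption: the parser notes whether extension data was present.
struct ParsedBody {
    item: FetchItem,
    has_extension_data: bool,
}

/// True when the response is grammatical but contradicts RFC 3501 7.4.2:
/// extension data must not accompany a plain BODY fetch.
fn violates_section_7_4_2(parsed: &ParsedBody) -> bool {
    parsed.item == FetchItem::Body && parsed.has_extension_data
}
```

Whether such a check is worth its complexity, or whether the parser should simply accept what the grammar allows, is exactly the open question here.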
I propose to modify the existing `msg_att_body_section` combinator into a `msg_att_body_or_bodystructure`.
I’ve opened a WIP PR #16 for you to look at, to give context to my comment, @djc. Let’s continue the discussion there.
In writing the parser, there are two concerns that might somewhat compete. One is to make the parser easy to read and follow in code. The other is to make the parser resemble the formal syntax in the RFC, so that it's easy to match entry points to the standard. If you have proposals to rename some parsers to make them easier to match to the formal syntax, I'm probably okay with that. On the other hand, combining or separating parsers just because it's in the formal syntax, I might not be okay with if I feel it makes the parser harder to follow. I'd be fine with adding comments to point out the disparity, though!
For this case, to me the `msg-att-static` and `msg-att-dynamic` separation doesn't make sense in the context of the parser, even if it might make sense in the context of understanding the protocol somehow. I feel like separating these would make the parser harder to follow. Feel free to prove me wrong with a PR, but I'm probably going to be disinclined to merge it.
If the parser would accept responses that make no sense from an implementation standpoint, I think my guiding principle would be to pick whichever implementation is the least complex. The primary goal is for the parser to accept all commonly used IMAP vocabulary. The secondary goal is to minimize complexity.
Modifying `msg_att_body_section` into `msg_att_body_or_bodystructure` sounds okay to me.
See RFC 3501, 6.4.5, `BODYSTRUCTURE`. Also, `MIME-IMB`/MIME document series in RFC 2045. For a motivation, see RFC 2683 3.2.1.4.
It would be great if some mentoring were provided on this (or, ideally, documentation).