djc / tokio-imap

Tokio-based IMAP implementation
Apache License 2.0
123 stars 42 forks source link

Implementing a more generic interface #139

Open estriv opened 2 years ago

estriv commented 2 years ago

Introduction

While the crate does a great job communicating with standard IMAP servers it lacks the serenity to do so with unknown extensions or servers with unknown responses.

The following ideas should start a discussion on how to improve the existing code or refactor it in order to make this crate better.

As an example of what I mean by that the response of a1 FETCH 1 X-GM-LABELS which is e.g. * 1 FETCH (X-GM-LABELS ("\\Important")) has a well known structure being of kind msg_att%22%20/%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%22INTERNALDATE) but fails because X-GM-LABELS is not known to the parser.

Resolving strictness

In my opinion the problem is rooted in the strict implementation of the rfc standard and the unforgiving parsing which fails a request even if the pattern of the response is known. There should be at least a fallback which enables users of the crate to do their own parsing and this should be expected behavior and not error handling.

Commands

For commands the story seems pretty straight forward:

struct Command {
  pub name: String,
  pub args: Vec<String>
}

This can then be built using a convenience CommandBuilder that can be extended by custom traits and be used by implementing Display for Command.

I do not see a reason for not using &str or String for communication with an imap server and so this should be the basis I guess.

Responses

A response has more or less the following structure:

<tag> [OK|NO|BAD] <cmd> <details>

and can be parsed into some struct like this:

struct Response {
  tag: String,
  result: Option<ImapResult>,
  cmd: Keyword,
  details: ...
}

with the following additional types:

enum Keyword {
  Capability,
  Select,
  Examine,
  Flags,
  List,
  Search,
  // ...
  Other(String),
}

enum ImapResult {
  Ok, No, Bad
}

The type of details is omitted because this is the tricky part and there are several possibilities how to handle this:

  1. Vec<Value>
  2. Recursive Value structure
  3. ValueStore using Any, downcast_ref and structs instead of enums for every value type

A value should be represented by an enum (or it's own struct) like this, which encodes all possible data structures like quoted strings, sequences, lists and so on:

enum Value {
  ParenList(Vec<String>),
  MsgAtt(Vec<Attribute>),
  Sequences(Vec<u64>),
  Status(String, Option<String>),
  Quoted(String),
  //...
}

pub struct Attribute {
  pub name: String,
  pub value: Value,
}

Conclusion

Implementing the protocol in a way described would enable the parser to read unknown responses into generic structures which can be handled consistently to the standard responses without introducing a second way to use the crate. So the response in the introduction would have been read into something like:

Response {
  tag: "*",
  cmd: Keyword::Fetch,
  result: None,
  details: Value::MsgAtt(vec![
    Attribute {
      name: "X-GM-LABELS",
      value: Value::ParenList(vec!["\\Important"])
    },
  ])
}

Handling known and unknown extensions should "feel" the same and the data structures I checked that far should be ok to handle some kind of trial and error parsing. This should be also ok in terms of performance as the bottleneck of imap is not the definitely not the execution performance but the network performance.

I did not cover the body part. This is, at least for the beginning, just a big block of characters which holds envelope, message and attachments and could be handled by an external crate. Of course the receiving will be problematic but I do not consider this as a too hard problem.

Most of the core parsing mechanisms should be reusable but the handling of commands and responses would change. I am looking forward to comments and ideas.

djc commented 2 years ago

I'm honestly not convinced. This adds a whole bunch of complexity for the purpose of dealing with unknown extensions, and then deals with unknown extensions in an imprecise (and relatively inefficient) way so you don't really know what to expect and you just get to read out and make sense of the parsed data somehow.

Contrast this with the current approach, where once in a while someone finds something that they'd like to be supported, they file an issue and hopefully a PR, I review and merge it and publish a new release with actually well-typed support for the new extension.

estriv commented 2 years ago

Well, the elegance of the current implementation lies in the strict reconstruction of the rfc standard. The complexity is transferred to the user of the crate. For example calling "SEARCH" and then fetching the bunch of messages returned by the search takes a lot of code with some ugly parts (as long as I did not miss some basics):

  let search = tls_client
    .call(Command {
      args: b"FLAGGED".to_vec(),
      next_state: None
    })
    .try_collect::<Vec<_>>()
    .await
    .unwrap();

  let search_res = search.first().unwrap().parsed();
  let seqs = if let Response::MailboxData(imap_proto::MailboxDatum::Search(seqs)) = search_res {
    seqs.clone()
  } else {
    Vec::new()
  };

  let mut seq_iter = seqs.iter();
  let mut cmd = CommandBuilder::fetch().num(seq_iter.next().unwrap().clone());
  for seq in seq_iter {
    cmd = cmd.num(*seq);
  }
  let cmd = cmd
    .attr(Attribute::Envelope)
    .attr(Attribute::Rfc822Size);
  let fetches = tls_client
    .call(cmd)
    .try_collect::<Vec<_>>()
    .await
    .unwrap();

From my point of view more complexity in the implementation of a library or crate is rectified by a good user experience with less complexity. So for example a more convenient way to to this would be (pseudo):

let res = client.call(Command::build().cmd("SEARCH").args(&["FLAGGED"]))?;
let seqs = res.data::<Value::Sequences>().unwrap(); // return value is Option<Value::Sequence> or maybe Vec<Value::Sequence>
let res = client.call(Command::build().cmd("FETCH").args(&["(ENVELOPE RFC822.SIZE)"])).unwrap();
let envelopes = res.filter_map(Response::data::<Value::Body>).collect();

Yes, this could also be achieved by the current implementation through changing the interface. The point of the proposal is not only allowing unknown extensions but being more robust against unknown responses. The crate should fail in a way that I can fix a super special case on my side without having to create a PR request.

Some words to the "imprecise and inefficient" way to deal with unknown data. The whole point is that we suppose that all data transferred from a server follows basic structural rules. For example some basic data types: Blocks, Lists, Sequences, Ranges, Attributes, Strings and Symbols. Every extension should use these data types and every response should be read into a set of these types (or fail into some Value::Unparsed for example and let the user of the crate handle this). The point is that even with your strict implementation of the standard I have to know the standard and access data in a rather complicated way. So agreeing on that a user of the current implementation has to know the structure of a response and has to go down a rather long road to get the wanted data I do not see more inefficiency with a more loose way of handling responses. About the impreciseness I agree as long as there is no way to identify for example messages/sequences or response lines. The question is where is this needed and how can this be achieved through an elegant architecture. I did not think about this in depth but do not see a big problem with coming up with a solution there.

djc commented 2 years ago

A design goal of this library is to be an implementation of the relevant standards. It does not try to abstract over the standards because such an abstraction would be opinionated. The point of this crate is that it tries to be unopinionated and just follow the standard, more or less slavishly. That allows other libraries to build on top of this library and easily understand what is going on "under the hood". If you want to build a higher-level abstraction on top of the data types in this crate, that could be a useful project, but I don't think it belongs in this crate.