frozenlib / parse-display

Procedural macro to implement Display and FromStr using common settings.
Apache License 2.0
182 stars 14 forks source link

Parser cannot handle enum variants whose name contains spaces, within larger structs #9

Open coriolinus opened 3 years ago

coriolinus commented 3 years ago
#[derive(parse_display::FromStr)]
enum Instruction {
    #[display("turn off")]
    TurnOff,
    #[display("toggle")]
    Toggle,
}

//succeeds
#[test]
fn parses_toggle() {
    "toggle".parse::<Instruction>().unwrap();
}

//succeeds
#[test]
fn parses_turn_off() {
    "turn off".parse::<Instruction>().unwrap();
}

#[derive(parse_display::FromStr)]
#[display("{instruction} {value}")]
struct ContainsInstruction {
    instruction: Instruction,
    value: u64,
}

//succeeds
#[test]
fn parses_contains_toggle() {
    "toggle 123".parse::<ContainsInstruction>().unwrap();
}

//fails
#[test]
fn parses_contains_turn_on() {
    "turn off 123".parse::<ContainsInstruction>().unwrap();
}
frozenlib commented 3 years ago

This is by design.

Since the format is specified as #[display("{instruction} {value}")], ContainsInstruction::from_str() works as follows.

Because of this behavior, it will not work properly if the string of fields contains characters that separate the fields.

If there is only one field that may contain a space, it is possible to separate them by the last space by adding #[from_str(regex = ".*")] as follows.

#[derive(parse_display::FromStr)]
#[display("{instruction} {value}")]
struct ContainsInstruction {
    #[from_str(regex = ".*")]
    instruction: Instruction,
    value: u64,
}

For more complex cases, I think it is better to use a function that converts the beginning of the string to a value, such as a funtion that return IResult in nom crate, instead of a function that converts the entire string to a value, such as FromStr::from_str().

frozenlib commented 3 years ago

Instead of specifying a field that may contain spaces, you can also specify a field that is guaranteed not to contain spaces.

#[derive(parse_display::FromStr)]
#[display("{instruction} {value}")]
struct ContainsInstruction {
    instruction: Instruction,
    #[from_str(regex = r"[^\s]+")]
    value: u64,
}
coriolinus commented 3 years ago

Thank you for the quick response. I'm not familiar with the internals of this library: would it be practical to re-implement ContainsInstruction::parse with the following algorithm?

In other words, construct a and use internally a parser which converts the beginning of a string and returns the unused parts, but present the whole-string parsing interface to the end user.

Obviously that's a pretty big feature request, but it would solve this issue. AFAICT it should be possible for regex-based sub-parsers to work properly under this design as well.

In the meantime, would you be willing to add documentation that this is a known limitation of the library?

frozenlib commented 3 years ago

Certainly, I think the above algorithm would be able to parse the string more accurately.

If I implement it, I would like to use a standard trait rather than the internal API to be able to use this algorithm even if struct contains fields that do not use parse-display. (e.g. std::u8)

a parser which converts the beginning of a string and returns the unused parts

Do you have any idea of existing standard trait that can be used for such purposes?

frozenlib commented 3 years ago

Adding the following features is also an option.

Once Rust's specialization feature is stabilized, it may be able to omit #[from_str(regex = auto)].

This method has the following pros and cons.

Pros

Cons

coriolinus commented 3 years ago

Do you have any idea of existing standard trait that can be used for such purposes?

I don't know of any, unfortunately. AFAIK the general form would be something like

// returns `(parse result, unused portion of input)`
fn parse<'a, T: FromStr>(s: &'a str) -> (Result<T, <T as FromStr>::Err>, &'a str);

[edit] Note that in this function, the FromStr implementation has to greedily capture as many characters as it needs from the start of the string. Very likely, a different trait would suit better there.

Adding #[from_str(regex = auto)] to instruction field

This is an interesting idea! Does this feature currently exist? I didn't see it documented.

It seems to me that it would be even more useful if this annotation could be applied to enum Instruction instead of the field instruction, so that we can decorate the declaration, not every usage. This feature, assuming it causes the test cases above to pass, plus appropriate mentions in the documentation, would be sufficient in my mind to close out this issue.

frozenlib commented 3 years ago

Adding #[from_str(regex = auto)] to instruction field

This is an interesting idea! Does this feature currently exist? I didn't see it documented.

This feature is not yet implemented.

It seems to me that it would be even more useful if this annotation could be applied to enum Instruction instead of the field instruction, so that we can decorate the declaration, not every usage.

I think so too, but I can't think of a way to allow omit #[from_str(regex = auto)].

This is the same for the algorithm described in https://github.com/frozenlib/parse-display/issues/9#issuecomment-740795611.

Since proc-macro cannot determine whether the field type implements the trait or not, you need to annotate whether use FromStr or the different trait described in https://github.com/frozenlib/parse-display/issues/9#issuecomment-741748867.