dtolnay / request-for-implementation

Crates that don't exist, but should
610 stars 6 forks source link

Minimalist YAML parser with support for borrowed string slices #9

Open dtolnay opened 5 years ago

dtolnay commented 5 years ago

In https://github.com/dtolnay/serde-yaml/issues/94 I would like to move serde-yaml off of yaml-rust as a YAML backend and to a much simpler backend with support for a minimum viable subset of YAML syntax.

I think the minimal API for this would be:

pub enum Yaml<'a> {
    // strings, booleans, numbers, nulls, all treated the same
    Scalar(&'a str),

    // flow style like `[x, x, x]`
    // or block style like:
    //     - x
    //     - x
    Sequence(Vec<Yaml<'a>>),

    // flow style like `{x: X, x: X}`
    // or block style like:
    //     x: X
    //     x: X
    Mapping(Vec<Entry<'a>>),
}

pub struct Entry<'a> {
    key: Yaml<'a>,
    value: Yaml<'a>,
}

pub fn parse<'a>(input: &'a str) -> Result<Yaml<'a>>;

The long tail of seldom used YAML features need not be supported. That means no tags, anchors, aliases, countless distinct string types, ...

ghost commented 5 years ago

In python world there's strictyaml with similar goal, if you want to check what elements are needed

dtolnay commented 5 years ago

Thanks! That looks like a super nicely done library with similar motivations as this project.

jakeschurch commented 5 years ago

@dtolnay I think I would like to take a crack at this

dtolnay commented 5 years ago

Just checking in -- any progress on this @jakeschurch? Is there anything you are stuck on?

jakeschurch commented 5 years ago

Thanks for checking in @dtolnay -- so far things are going well - will def. Check in if I have any questions or problems

dtolnay commented 5 years ago

Sounds good. I noticed that you forked serde-yaml and made some big changes to its Value type. I would recommend sticking to a 100% self contained fn parse<'a>(input: &'a str) -> Result<Yaml<'a>> instead -- serde_yaml::Value is a frontend type and shouldn't need to change at all when we later swap backends.

jakeschurch commented 5 years ago

OK. Sounds like I need to read a little more into the code.

dtolnay commented 5 years ago

Less into the code. :wink:

Nothing in serde_yaml is relevant to implementing this function.

jakeschurch commented 5 years ago

Sorry! Just meant what parts of the code now hook up to yaml-rust so I can have the signatures in mind implementing the back-end

jakeschurch commented 5 years ago

hey @dtolnay just wanted to let you know that I haven't gotten around to implementing this yet cause I'm super busy with work - whatever you want to do, let me know -- sorry to keep you waiting

pickfire commented 5 years ago

I believe it would be nice to support the additional feature using feature gates. I will be trying this as it is a blocker for me to implement yaml2json.

pickfire commented 5 years ago

I added my implementation in https://github.com/pickfire/mini-yaml, very experimental. Currently can parse certain flow scalar, flow/block sequence, flow mapping and block implicit mapping. Still figuring out how to parse implicit flow mapping and maybe later need to check the spec again, maybe later can add serialization and deserialization.

dtolnay commented 5 years ago

Nice, looks like this is on the right track. When you wrap up the initial set of features it would be good to write a tool to programmatically partition the yaml-test-suite based on ones that are supported by your library and ones that are not supported, and check those into your repo as two separate directories. That will make it easy to scan through and tell whether the set of supported cases is sufficient for what someone needs, or if they require some feature that is in the unsupported set.

pickfire commented 5 years ago

@dtolnay Thanks a lot for sharing that link. By the way, do you think I should implement it in a function way instead of using an iterator like what yaml-rust does? Maybe it might help in streaming serialization/deserialization?

dtolnay commented 5 years ago

I recommend not doing stream style. Stream support would be important for a lossy deserializer, for example one that deserializes 1.00 as an f64 by default. Like if the input document is:

---
n: 1.00

then a non-streaming lossy deserializer interpreting this as yaml::Map(map! { yaml::String("n") => yaml::Number(1.0f64) }) would be problematic because the caller may ultimately need the data as struct Input { n: String } in which the correct deserialization would have n = "1.00".

Since the design in this thread is specced to produce lossless &'a str for all yaml scalars, there shouldn't need to be a stream API.

pickfire commented 4 years ago

@dtolnay Looks a lot harder than I think, I probably will not be continuing the project.

If anyone is interested can take a look at https://github.com/pickfire/mini-yaml, the implementation is a bit bad but the main issue is support for implicit block mapping and block and flow scalar. I believe supporting strictyaml would be better than yaml since it's too broad for start.

Stupremee commented 4 years ago

I will try to create a crate based on the StrictYAML syntax.

dtolnay commented 4 years ago

Sounds good. Thanks! The API design in https://github.com/dtolnay/request-for-implementation/issues/9#issue-401158316 should work just as well for StrictYAML.

Stupremee commented 4 years ago

To make it simpler to write a parser, I thought about only implementing a super simple and minimalistic version of yaml.

These would be all the possibilities:

key1: 'value'
key2: "value"
key3: value
list:
  - one
  - two
list2: [foo, bar]
object:
  some: "thing"
  foo: bar
object2: {a: one, b: two}

value can also be a number Would this be okay?

dtolnay commented 4 years ago

Yes that sounds fine. If someone needs more syntax support they can extend your library.

Stupremee commented 4 years ago

Should the library support serializing and deserializing?

pickfire commented 4 years ago

@Stupremee Serializing and deserializing could be added later but it should be designed with that in mind, taking a look at serde_json and serde_cbor would help but the design of yaml-rust is way too different from those.

dtolnay commented 4 years ago

@Stupremee⁠—Personally for this issue I really only care about one function.

fn parse<'a>(input: &'a str) -> Result<Yaml<'a>>

Once that is in a good state, it would be reasonable to add the reverse as well. That would be as simple as impl Display for Yaml<'_>.

nathanwhit commented 4 years ago

@dtolnay I noticed this has been sitting for a while, so I decided to pick this up. The repository with my work up to this point is here. Support for the bare-bones syntax you described is already implemented, and I do have a basic level of testing (~40 unit tests, but I plan to add more). I haven't published the crate yet, as I feel it needs a bit more polish and more thorough testing first, but tentatively a release doesn't seem too far off. If you have any spare time, feedback on the current state of the crate would be wonderful.

NilsIrl commented 4 years ago

to a much simpler backend with support for a minimum viable subset of YAML syntax

How would someone that needs these "non-minimalist features" do?

pickfire commented 4 years ago

@NilsIrl YAML is complicated, last time I tried implementing I was surprised how different YAML libraries got different results for different edge cases. I think StrictYAML may be a better and stricter spec and easier to follow with minimalist features.

NilsIrl commented 4 years ago

@NilsIrl YAML is complicated, last time I tried implementing I was surprised how different YAML libraries got different results for different edge cases. I think StrictYAML may be a better and stricter spec and easier to follow with minimalist features.

I understand that very well, but if this is supposed to replace https://github.com/chyh1990/yaml-rust for https://github.com/dtolnay/serde-yaml then it can be problematic.

dtolnay commented 4 years ago

@nathanwhit it looks like the feedback I filed on https://github.com/nathanwhit/minimal-yaml were mostly resolved. Any plan on what polish or testing are missing still before it would be released?

nathanwhit commented 4 years ago

@dtolnay I'm comfortable releasing it in its current state. I wouldn't be too surprised if there are some edge cases being handled improperly, but there shouldn't be any major issues at this point. I've gone ahead and published the crate here.

sanbox-irl commented 3 years ago

@dtolnay how are we doing on this?

qm3ster commented 2 years ago

I would like to note that StrictYAML actually doesn't allow "flow style" (other than in dirty_load mode, which also allows and interprets "node anchors and refs"), so the following would not be valid:

list2: [foo, bar]
object2: {a: one, b: two}

I wonder if this means that in non-dirty_load mode there are less characters that have to be avoided in strings. Will I finally be able to write

action_jq: 'if . then "ON" else "OFF" end | {state_l1: .}'

unquoted?

action_jq: if . then "ON" else "OFF" end | {state_l1: .}

Currently, neither the serde_yaml nor VS Code's YAML syntax higlighting manage to survive a non-leading opening { in an unqioted string, despite there not being any possible interpretation. But fully giving up on flow style would ublock all of []{} anywhere, and that may be a worthwhile tradeoff.