Ethiraric / yaml-rust2

A pure Rust YAML implementation.
https://docs.rs/yaml-rust2/latest/yaml_rust2/
Other
149 stars 12 forks source link

Add comment manipulation to `Yaml` objects #21

Closed robert-sjoblom closed 3 months ago

robert-sjoblom commented 6 months ago

The 1.2 spec of yaml allows comments: https://yaml.org/spec/1.2.2/#66-comments

One current issue with yaml-rust2 (and, to be fair, the original yaml-rust crate) is that it strips all comments. Thus, the following (shortened) example doesn't work:

foo.yaml

value:
  # comment here
  '002':
    - name: foo
let c = std::fs::read_to_string("foo.yaml").unwrap();
let docs = YamlLoader::load_from_str(&contents).unwrap();

// pseudocode omitted, but update '002' with another `Hash` entry

let mut output = String::new();
YamlEmitter::new(&mut output).dump(&docs[0).unwrap();

I would expect the output to contain my new entry and the comment # comment here. That's not what happens, though, since comments are lost.

I'd gladly help out with this, although I'm unsure of where to start.

Ethiraric commented 6 months ago

Hmm. While the specification allows for comments, it is clearly stated that they must not convey information. When testing with the yaml-test-suite, comments cannot emit parser events, meaning that they are unavailable to the YamlLoader. We could add an event variant (yay, another API breaking change) and ignore it in the yaml-test-suite test.

If you want to tackle this:

Then:

To be fair (and as a warning), with my knowledge of the codebase, I'd expect this to take me at least a couple days to implement.

Do you have a reference for another parser (even in another language) that implements this feature?

robert-sjoblom commented 6 months ago

To be fair (and as a warning), with my knowledge of the codebase, I'd expect this to take me at least a couple days to implement.

I'll gladly work on this, it'll take some time but it's not as if anyone else has asked for it.

Do you have a reference for another parser (even in another language) that implements this feature?

I do -- it's what we've been forced to use due to every rust yaml crate being slightly off in what we want (yaml-rust doesn't honor whitespace (blank newlines) and strips comments, another crate honors whitespace but strips comments, a third honors comments but strips indentation...): https://pypi.org/project/ruamel.yaml/

Ethiraric commented 6 months ago

They have indeed added a Comment scanner event. In Python, they can dynamically add fields (which they appear to use for comments). Doing so in Rust would incur a cost that Rustaceans will probably hardly accept.

I don't really see an easy way around that. On the one hand, I don't like duplicating the Yaml structure, but on the other hand I don't see how CommentedYaml could easily re-use Yaml. I think I'll leave up to you to find out ^^

I'll gladly work on this, it'll take some time but it's not as if anyone else has asked for it.

Thank you very much for your dedication! :heart: Also, maybe others don't ask but would use the feature.

Ethiraric commented 6 months ago

Following https://github.com/Ethiraric/yaml-rust2/issues/17#issuecomment-2032585651, you might want to turn your contributions against https://github.com/saphyr-rs/saphyr-parser and https://github.com/saphyr-rs/saphyr. They contain the same code as here except the name has changed (and saphyr has the parser as dependency). I'll see into transferring your issue once Github allows me.

tranzystorekk commented 6 months ago

I'll gladly work on this, it'll take some time but it's not as if anyone else has asked for it.

The lack of this feature seems to prevent yaml-rust2 from correctly parsing sublime-syntax files, e.g.: https://github.com/sublimehq/Packages/blob/fa6b8629c95041bf262d4c1dab95c456a0530122/Makefile/Makefile.sublime-syntax

This in turn prevents adoption in the syntect crate.

Ethiraric commented 6 months ago

What exactly is the error you encounter? The file should parse fine if it is valid yaml. Do you depend on the comments in your code?

tranzystorekk commented 6 months ago

Trying to parse that file with YamlLoader::load_from_str errors with:

called `Result::unwrap()` on an `Err` value: ScanError { mark: Marker { index: 10, line: 2, col: 0 }, info: "comments must be separated from other tokens by whitespace" }

EDIT: I might have wrongly assumed that this is related to this feature, but still probably shouldn't happen? Syntect seems to be parsing this file correctly with the old yaml-rust crate

Ethiraric commented 6 months ago

That appears to be an issue with the parser indeed. It is however unrelated to that specific Github issue which aims at adding comments to the Yaml object.

From what I saw in the yaml parser playground, the file appears to be valid. I'll try and have a look during the weekend, otherwise later week.

Ethiraric commented 6 months ago

This is fixed in 0eb7936e1e85553031dbcb99fa8837be3484c2ee.

It was specifically an issue with comments on column 0 on the line immediately following a tag directive ^^"

Ethiraric commented 5 months ago

@robert-sjoblom have you started working on the feature?

I have embarked on a heavy refactoring in the scanner in order to support borrowing in saphyr-parser.

robert-sjoblom commented 5 months ago

I have not; I've been somewhat swamped both at work and in my "spare" time (as the lack of a reply indicates). I hope that I'll get some time next week, but ultimately I suspect this'll be a slow-moving feature, what with family/work taking up most of my time.

Ethiraric commented 3 months ago

This will not be implemented in yaml-rust2, but is rather an objective for saphyr.