Ethiraric / yaml-rust2

A pure Rust YAML implementation.
https://docs.rs/yaml-rust2/latest/yaml_rust2/
Other
156 stars 12 forks source link

Extended data in tag suffix not handled #11

Closed wgeraghty closed 4 months ago

wgeraghty commented 7 months ago
%YAML 1.1
%TAG !t! tag:test,2024:
--- !t!1 &1 foo
bar: "baz"

Results in ScanError { mark: Marker { index: 53, line: 4, col: 3 }, info: "mapping values are not allowed in this context" }.

I'm not sure if this is in the spec but the yaml files I'm working with have it. My workaround for now is changing while is_tag_char(self.look_ch()) to while !is_breakz(self.look_ch()). This doesn't feel quite right but it allows me to use a custom MarkedEventReceiver to do what I need.

davvid commented 7 months ago

We use https://github.com/yaml/yaml-test-suite to validate yaml-rust2. If this is in-spec then this could be made into a test file in that repo.

If this is out of spec.. (sorry, I don't know whether it is, but it very well might be) what are your thoughts @Ethiraric ~ is adding booleans to YamlLoader to opt-in to non-standard behavior something we should entertain on a case-by-case basis?

@wgeraghty are these files from a common tool? Are the files available in a public repo anywhere?

wgeraghty commented 7 months ago

@davvid See https://github.com/Ethiraric/yaml-rust2/issues/10#issuecomment-2016711947

Ethiraric commented 7 months ago

This construct is invalid. I am on mobile and can't check right now which part of the specification this violates, but the yaml playground errors out with this specific input.

This now brings interesting and hard questions about yaml compliance. Technically, the bug report should be filed to whoever distributed the yaml. But fixing the tag issue from #10 likely would break most yaml parsers, due to most yaml parsers not being compliant.

I would like this library to still be usable as a general yaml parser, but we need to carefully think about which non-compliant constructs we want to allow and disallow. Each construct will add branches and slow the code down a tiny bit, even for those who had fully compliant yamls.

Rust features could allow a good middle ground between the 2, but more and more feature toggles exponentially increase the complexity of testing (a test may succeed with a set of features and fail with another).

I'm definitely okay with fixing #10. I guess that it's a common "mistake". This one I don't know. Is the source for this open? Could it be fixed there? Do they have a specification for their format?

wgeraghty commented 7 months ago

I have not been able to find the spec for the Unity yaml format; their parser is not open. There is a massive amount of content generated already that uses this format so they are unlikely to change it.

I wasn't sure if this change made sense to put behind a yaml_rust2::parser::Parser flag, a feature toggle, or a fork.

Ethiraric commented 7 months ago

How exactly to handle that depends on how fragmented the YAML flavours are across all tools. It requires knowledge that I currently do not have.

Your issue sparks this discussion, and I think I'll need to ask around a bit about the YAML ecosystem before coming to a conclusion.

In an ideal world I'd have a way to just get a copy of reference files from every tool and just feed them into the parser to see what is most common and what isn't, what is fixable and what isn't, and so on. I'll make my very best to do this ASAP, but it may take me a week to a month to get all that data.

If you depend on this feature immediately, I'd unfortunately suggest you patch the code for now. I think forking in the worst solution on the long term, but it might be a solution on the short term (unless your build process allows for integrating this as a submodule, using yaml-rust2 = { version = "0.7.0", path = "./dependencies/yaml-rust2" } in your Cargo.toml and applying a patch). How would that sound with you?

wgeraghty commented 7 months ago

I can modify the build process so a submodule should be good. Thanks for looking into this.

Ethiraric commented 7 months ago

You're welcome!

By the way, this appears to be a pretty Unity-specific thing to do. I tried other parsers and they all error out with your input. None of them like the space in the tag.

davvid commented 7 months ago

Here are a couple of related forks that I believe might've been created to address this use case:

Heads-up @xmxu @ilinchunjie ~ I'm pinging since I think this topic might be of interest to you.

Ethiraric commented 4 months ago

This will not be implemented in yaml-rust2. There could be a crate under saphyr-rs to parse Unity files, but we'll have to finalize the crate before.