chyh1990 / yaml-rust

A pure rust YAML implementation.
Apache License 2.0
601 stars 138 forks source link

Implement load_from_bytes #156

Open mkmik opened 4 years ago

mkmik commented 4 years ago

Closes #155

Also helps in some cases with #142, when the BOM is at the beginning of the file (common), but not in the corner case where the BOM is at the start of a document which is not the first one.

CC @glyn

mkmik commented 4 years ago

Blocked on #139

chyh1990 commented 4 years ago

since this MR introduce a new crate dependency, and users can easily implement this helper method in their code. I'm not sure wether we should provide this in yaml crate

glyn commented 4 years ago

since this MR introduce a new crate dependency, and users can easily implement this helper method in their code. I'm not sure wether we should provide this in yaml crate

I think it would be worth including this to ensure consistency of behaviour between users. Also, providing it here makes it more likely that users will consider handling an initial BOM in the first place (rather than finding/fixing a bug).

mkmik commented 4 years ago

yes; more explicitly: it's likely that if this library doesn't do it, the users of this library won't do it either and thus effectively their application won't be compliant with the YAML specification.

The reason I think that's likely, is that in my experience most people either are not aware of this detail of the spec and/or don't think UTF-16 or UTF-8 with BOMs is important at all (e.g. "I never saw some UTF-16 for years, so clearly nobody is using it").

I think adding a small crate dependency is a small price to pay for ensuring that the ecosystem built around this library follows the spec.

chyh1990 commented 4 years ago

pls rebase this MR to fix CI

mkmik commented 4 years ago

@glyn I implemented (d), PTAL

glyn commented 4 years ago

@glyn I implemented (d), PTAL

Looks reasonable. One downside of this approach now becomes apparent. In:

let mut d = Decoder::read(file)
let mut e = d.encoding_trap(mytrap)
let f = d.decode()
...

d.encoding_trap(mytrap) mutates d and so f is computed with mytrap, which is surprising if you don't pay attention to the signatures. This is unusual behaviour for the builder pattern. But it's unavoidable if we go with option (d).

mkmik commented 4 years ago

@glyn

let mut d = Decoder::read(file) let mut e = d.encoding_trap(mytrap) let f = d.decode()

I'm not a rust expert but that's not how this builder pattern works, I followed https://doc.rust-lang.org/1.0.0/style/ownership/builders.html#consuming-builders: which doesn't use references.

this means that I cannot call d.decode on a value if I already called d.encoding_trap on it.

error[E0382]: use of moved value: `d`
   --> src/yaml.rs:881:19
    |
879 |         let mut d = YamlDecoder::read(s as &[u8]);
    |             ----- move occurs because `d` has type `yaml::YamlDecoder<&[u8]>`, which does not implement the `Copy` trait
880 |         d.encoding_trap(encoding::DecoderTrap::Ignore);
    |         - value moved here
881 |         let out = d.decode().unwrap();
    |                   ^ value used here after move

using mut self initially was confusing to me too; I initially wrote it as:

  pub fn encoding_trap(self, trap: encoding::types::DecoderTrap) -> YamlDecoder<T> {
    let mut new = self;
    new.trap = trap;
    new
  }

but then I saw the official documentation example with mut self in the fn signature, and assumed it was idiomatic rust.

mkmik commented 4 years ago

CI fails because of a transient error

error: caused by: failed to make network request
error: caused by: error sending request for url (https://static.rust-lang.org/dist/channel-rust-stable.toml.sha256): error trying to connect: dns error: No such host is known. (os error 11001)
error: caused by: error trying to connect: dns error: No such host is known. (os error 11001)
error: caused by: dns error: No such host is known. (os error 11001)
error: caused by: No such host is known. (os error 11001)
Command exited with code 1

amending and force pushing to trigger a new CI run

glyn commented 4 years ago

@mkmik mut self is indeed idiomatic rust. It's good the compiler protects us from the issue to some extent. I'm still a bit concerned that encoding_trap() both mutates self and returns self. Perhaps it would be cleaner to give up on chaining and just have it mutate self but not return anything.

  pub fn encoding_trap(self, trap: encoding::types::DecoderTrap) -> YamlDecoder<T> {
    let mut new = self;
    new.trap = trap;
    new
  }

would have issues because self can't necessarily be copied. In particular, the std::io::Read value can't be copied and can only be used once.

mkmik commented 4 years ago

Sorry, I don't understand. This code works:

  pub fn encoding_trap(self, trap: encoding::types::DecoderTrap) -> YamlDecoder<T> {
    let mut new = self;
    new.trap = trap;
    new
  }

if I understand correctly, it's not copying self, but it's transferring ownership to 'new', which is mutable so I can now modify the trap field and return the modified value. The caller of encoding_trap now can't access the value it called encoding_trap on because the value's ownership has been moved.

(sorry if I'm missing the point; I'm unfamiliar with rust in particular, although I have some familiarity with type systems and compiler internals, so almost kinda sorta grasp how this things works, but mostly I have no idea about what's "idiomatic")

glyn commented 4 years ago

@mkmik I'm just getting back into Rust after a long break, so apologies for my lack of understanding. I'm not sure about:

pub fn encoding_trap(self, trap: encoding::types::DecoderTrap) -> YamlDecoder<T> {
    let mut new = self;
    new.trap = trap;
    new
  }

You're quite right that self is moved to new, but I don't particularly like that encoding_trap renders self subsequently unusable. If I find out more, I'll post here. But anyway, let's put that approach to bed either way.

Diegovsky commented 3 years ago

Is this still being worked on?

glyn commented 3 years ago

@SenseTime-Cloud @dtolnay Is anyone looking to merge PRs these days?

dtolnay commented 2 years ago

Not me — this crate is no longer used by serde_yaml.

davvid commented 7 months ago

FWIW I've merged (and various other PRs) this into my fork: https://github.com/davvid/yaml-rust/