Open mkmik opened 4 years ago
Blocked on #139
since this MR introduce a new crate dependency, and users can easily implement this helper method in their code. I'm not sure wether we should provide this in yaml crate
since this MR introduce a new crate dependency, and users can easily implement this helper method in their code. I'm not sure wether we should provide this in yaml crate
I think it would be worth including this to ensure consistency of behaviour between users. Also, providing it here makes it more likely that users will consider handling an initial BOM in the first place (rather than finding/fixing a bug).
yes; more explicitly: it's likely that if this library doesn't do it, the users of this library won't do it either and thus effectively their application won't be compliant with the YAML specification.
The reason I think that's likely, is that in my experience most people either are not aware of this detail of the spec and/or don't think UTF-16 or UTF-8 with BOMs is important at all (e.g. "I never saw some UTF-16 for years, so clearly nobody is using it").
I think adding a small crate dependency is a small price to pay for ensuring that the ecosystem built around this library follows the spec.
pls rebase this MR to fix CI
@glyn I implemented (d), PTAL
@glyn I implemented (d), PTAL
Looks reasonable. One downside of this approach now becomes apparent. In:
let mut d = Decoder::read(file)
let mut e = d.encoding_trap(mytrap)
let f = d.decode()
...
d.encoding_trap(mytrap)
mutates d
and so f
is computed with mytrap
, which is surprising if you don't pay attention to the signatures. This is unusual behaviour for the builder pattern. But it's unavoidable if we go with option (d).
@glyn
let mut d = Decoder::read(file) let mut e = d.encoding_trap(mytrap) let f = d.decode()
I'm not a rust expert but that's not how this builder pattern works, I followed https://doc.rust-lang.org/1.0.0/style/ownership/builders.html#consuming-builders: which doesn't use references.
this means that I cannot call d.decode
on a value if I already called d.encoding_trap
on it.
error[E0382]: use of moved value: `d`
--> src/yaml.rs:881:19
|
879 | let mut d = YamlDecoder::read(s as &[u8]);
| ----- move occurs because `d` has type `yaml::YamlDecoder<&[u8]>`, which does not implement the `Copy` trait
880 | d.encoding_trap(encoding::DecoderTrap::Ignore);
| - value moved here
881 | let out = d.decode().unwrap();
| ^ value used here after move
using mut self
initially was confusing to me too; I initially wrote it as:
pub fn encoding_trap(self, trap: encoding::types::DecoderTrap) -> YamlDecoder<T> {
let mut new = self;
new.trap = trap;
new
}
but then I saw the official documentation example with mut self
in the fn signature, and assumed it was idiomatic rust.
CI fails because of a transient error
error: caused by: failed to make network request
error: caused by: error sending request for url (https://static.rust-lang.org/dist/channel-rust-stable.toml.sha256): error trying to connect: dns error: No such host is known. (os error 11001)
error: caused by: error trying to connect: dns error: No such host is known. (os error 11001)
error: caused by: dns error: No such host is known. (os error 11001)
error: caused by: No such host is known. (os error 11001)
Command exited with code 1
amending and force pushing to trigger a new CI run
@mkmik mut self
is indeed idiomatic rust. It's good the compiler protects us from the issue to some extent. I'm still a bit concerned that encoding_trap()
both mutates self
and returns self
. Perhaps it would be cleaner to give up on chaining and just have it mutate self
but not return anything.
pub fn encoding_trap(self, trap: encoding::types::DecoderTrap) -> YamlDecoder<T> {
let mut new = self;
new.trap = trap;
new
}
would have issues because self can't necessarily be copied. In particular, the std::io::Read
value can't be copied and can only be used once.
Sorry, I don't understand. This code works:
pub fn encoding_trap(self, trap: encoding::types::DecoderTrap) -> YamlDecoder<T> {
let mut new = self;
new.trap = trap;
new
}
if I understand correctly, it's not copying self, but it's transferring ownership to 'new', which is mutable so I can now modify the trap field and return the modified value. The caller of encoding_trap
now can't access the value it called encoding_trap
on because the value's ownership has been moved.
(sorry if I'm missing the point; I'm unfamiliar with rust in particular, although I have some familiarity with type systems and compiler internals, so almost kinda sorta grasp how this things works, but mostly I have no idea about what's "idiomatic")
@mkmik I'm just getting back into Rust after a long break, so apologies for my lack of understanding. I'm not sure about:
pub fn encoding_trap(self, trap: encoding::types::DecoderTrap) -> YamlDecoder<T> {
let mut new = self;
new.trap = trap;
new
}
You're quite right that self
is moved to new
, but I don't particularly like that encoding_trap
renders self
subsequently unusable. If I find out more, I'll post here. But anyway, let's put that approach to bed either way.
Is this still being worked on?
@SenseTime-Cloud @dtolnay Is anyone looking to merge PRs these days?
Not me — this crate is no longer used by serde_yaml.
FWIW I've merged (and various other PRs) this into my fork: https://github.com/davvid/yaml-rust/
Closes #155
Also helps in some cases with #142, when the BOM is at the beginning of the file (common), but not in the corner case where the BOM is at the start of a document which is not the first one.
CC @glyn