acatton / serde-yaml-ng

Strongly typed YAML library for Rust, serde compatible. This is an independant continuation of serde-yaml from dtolnay.
MIT License
18 stars 4 forks source link

Why `from_reader` expect that source will live at least the same time as result? #12

Open Mingun opened 1 month ago

Mingun commented 1 month ago

The method from_reader accepts Read + 'de https://github.com/acatton/serde-yaml-ng/blob/e158e717b22961fd903ff5b5b672fcf3b1853338/src/de.rs#L87-L93 which mean that any references in type R (the type of reader) which implements Read must outlive 'de. The practical aspects of this restriction is unclear. When we read from std::io::Read data is always copied from R to the internal buffer, so it cannot be linked to the lifetime of any references in R: https://github.com/acatton/serde-yaml-ng/blob/e158e717b22961fd903ff5b5b672fcf3b1853338/src/loader.rs#L26-L32

Even if not use read_to_end, the methods in Read trait does not allow to link lifetime of output buffer with lifetimes of the reader (the R type). They are two completely unrelated lifetimes.

One practical limitation of such restriction is that this code does not compile (in this case that is not so bad, because it anyway will return deserialization error in runtime, because the data cannot be borrowed from a reader):

#[test]
fn test() {
  let data = {
    let yaml = "test".to_string();
    <&str>::deserialize(serde_yaml_ng::Deserializer::from_reader(yaml.as_bytes())).unwrap()
  };
  assert_eq!(data, "test");
}

Error:

error[E0597]: `yaml` does not live long enough
   --> src\parser\mod.rs:908:66
    |
906 |   let data = {
    |       ---- borrow later stored here
907 |     let yaml = "test".to_string();
    |         ---- binding `yaml` declared here
908 |     <&str>::deserialize(serde_yaml_ng::Deserializer::from_reader(yaml.as_bytes())).unwrap()
    |                                                                  ^^^^ borrowed value does not live long enough
909 |   };
    |   - `yaml` dropped here while still borrowed

@dtolnay, you add this implementation in 4da5934ee4d8e4ec326172664bf62ad231b819ab. Could you explain, why you use Read + 'de instead of just Read (which assumes Read + 'static) or some different lifetime Read + 'reader?

My question about this because in other fork, serde_yml @sebastienrousseau changed the reader in a such way, that this test will success while it is failed in serde_yaml_ng:

// serde_yml = "0.0.12"
#[test]
fn yml() {
  let yaml = "test".to_string();
  let data = <&str>::deserialize(serde_yml::Deserializer::from_reader(yaml.as_bytes())).unwrap();
  // Failed with this error, because first 2 bytes somehow was corrupted,
  // but in many real cases can pass. For example, if change the input to
  // (add two spaces):
  // let yaml = "  test".to_string();
  // ---- parser::yml stdout ----
  // thread 'parser::yml' panicked at src\parser\mod.rs:901:3:
  // assertion failed: `(left == right)`
  // 
  // Diff < left / right > :
  // <♀ st
  // >test
  assert_eq!(data, "test");
}

// serde_yaml_ng = "0.10.0"
#[test]
fn yaml_ng() {
  let yaml = "test".to_string();
  // ---- parser::yaml_ng stdout ----
  // thread 'parser::yaml_ng' panicked at src\parser\mod.rs:908:93:
  // called `Result::unwrap()` on an `Err` value: Error("invalid type: string \"test\", expected a borrowed string", line: 1, column: 1)
  let data = <&str>::deserialize(serde_yaml_ng::Deserializer::from_reader(yaml.as_bytes())).unwrap();
  assert_eq!(data, "test");
}

It seems to me that the change is obviously incorrect, because you borrowed from the internal buffer of the serde_yml::Deserializer, but Rust borrowing rules allows to write such code and I does not yet find a way to crash it (which is definitely should be possible). It should be possible if we will try to change yaml via the deserialized data, but in that case need that data have a &mut str type which doesn't implement Deserialize trait. So it seems that we can change the string only using unsafe conversion of its type, which couldn't be considered as a true demonstration of a problem.

Unfortunately, @sebastienrousseau make this change in one single god commit without any explanations why it was changed. (Digression: the borrowing is possible, if repr field of the Scalar would be Some (it also very unpleasant why this filed is not included in debug representation while it has so much influence on results of deserialization)).

The changes in serde_yml that makes the test above work in those lines (filling repr field): https://github.com/sebastienrousseau/serde_yml/blob/24187306a30cc972780449e843ccc7b6f82b6086/src/libyml/parser.rs#L256-L269 https://github.com/sebastienrousseau/serde_yml/blob/24187306a30cc972780449e843ccc7b6f82b6086/src/libyml/parser.rs#L308-L315 He only changes &Cow<input, str> to &input Cow<input, str> in convert_event function and the borrow checker became happy. I do not understand why test does not crashed, and if this change has some rationaly, is it possible to apply it to this fork? Actually, I dig into this rabbit hole because I want to use

#[derive(Deserialize)]
#[serde(from = "&str")]
struct Path(Vec<String>);
impl<'a> From<&'a str> for Path {
  fn from(path: &'a str) -> Self {
    Self(path.split("::").map(|s| s.to_owned()).collect())
  }
}

This code works with serde_yml and does not work with serde_yaml_ng.

acatton commented 1 month ago

This is late, and I've only skimed through this ticket, I need to look into details, however, I wanted to mention this part

@dtolnay, you add this implementation in https://github.com/acatton/serde-yaml-ng/commit/4da5934ee4d8e4ec326172664bf62ad231b819ab. Could you explain, why you use Read + 'de instead of just Read (which assumes Read + 'static) or some different lifetime Read + 'reader?

David Tolnay is the original author of https://github.com/dtolnay/serde-yaml, which he abandoned, and I forked. Please refer to the "Why?" section of the README.md.

The life of a open source library author and maintainer is hard, and obviously, David didn't want to deal with this charge of work, I don't want us to bother him, the maintenance of this library is my burden, and mine alone.

Let's not mention him, and let's not ping him, let's leave him and the original authors alone, and in peace. Let's figure this out together :) .

Anyway, thanks for your report, and I'll take a look at it once I had some rest.

Mingun commented 1 month ago

David Tolnay is the original author of https://github.com/dtolnay/serde-yaml, which he abandoned, and I forked. Please refer to the "Why?" section of the README.md.

I know that, as you might have guessed, I looked at the history, not to mention what I had previously used serde_yaml.

The life of a open source library author and maintainer is hard, and obviously, David didn't want to deal with this charge of work, I don't want us to bother him, the maintenance of this library is my burden, and mine alone.

I think, that asking questions costs nothing, especially considering what kind of question. I don't think anyone can get into his head and answer for him. Of course, he is free not to answer.

acatton commented 1 month ago

I think, that asking questions costs nothing, especially considering what kind of question. I don't think anyone can get into his head and answer for him. Of course, he is free not to answer.

I understand, if you would have told me ~10 years ago I would have agreed with you. However, now that I have a few projects that I have abandoned, I would understand how annoying and noisy it is, when your github notifications or your inbox is full of people mentioning your for help on stuff you don't want to deal with anymore.

So this is why I'm trying to mindful with David :) . But anyway, it was an request, not a demand, you're free to think otherwise :slightly_smiling_face:

acatton commented 1 month ago

Lets close the side discussion from above, and go back to the original issue of this thread.

I looked at the code, and read your analysis, Thanks you for analyzing this @Mingun . I think you're right. I would be opened to a pull request to remove the life time. I think it was a mistake, and while it might breaking backward compatibility (some people might be explicitly using these lifetimes), I think it's fine for 99% of users.

I will eventually take on this task if you nobody has taken it before me.