chyh1990 / yaml-rust

A pure rust YAML implementation.
Apache License 2.0

Parse from raw bytes #155

Open mkmik opened 4 years ago

mkmik commented 4 years ago

The YAML 1.2 spec lists all valid encodings:

On input, a YAML processor must support the UTF-8 and UTF-16 character encodings. For JSON compatibility, the UTF-32 encodings must also be supported.

If a character stream begins with a byte order mark, the character encoding will be taken to be as indicated by the byte order mark. Otherwise, the stream must begin with an ASCII character. This allows the encoding to be deduced by the pattern of null (#x00) characters.
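The detection rule quoted above can be sketched as a single match on the first bytes of the stream. This is a minimal illustration, not yaml-rust code: the Encoding enum and detect_encoding function are hypothetical names, and the byte patterns are taken from the character-encodings table in the YAML 1.2 spec.

```rust
/// Input encodings a YAML 1.2 processor must accept (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Encoding {
    Utf8,
    Utf16Le,
    Utf16Be,
    Utf32Le,
    Utf32Be,
}

/// Hypothetical helper: guesses the encoding of a YAML byte stream.
/// Returns the encoding and the length of the byte order mark (0 if none).
/// A BOM wins; otherwise the stream is assumed to start with an ASCII
/// character, so the encoding follows from where the null bytes fall.
pub fn detect_encoding(bytes: &[u8]) -> (Encoding, usize) {
    // Longer patterns come first: the UTF-32LE BOM (FF FE 00 00) starts
    // with the UTF-16LE BOM (FF FE), and 00 00 00 xx starts with 00 xx.
    match bytes {
        // Explicit byte order marks.
        [0x00, 0x00, 0xFE, 0xFF, ..] => (Encoding::Utf32Be, 4),
        [0xFF, 0xFE, 0x00, 0x00, ..] => (Encoding::Utf32Le, 4),
        [0xFE, 0xFF, ..] => (Encoding::Utf16Be, 2),
        [0xFF, 0xFE, ..] => (Encoding::Utf16Le, 2),
        [0xEF, 0xBB, 0xBF, ..] => (Encoding::Utf8, 3),
        // No BOM: null-byte pattern around the leading ASCII character.
        [0x00, 0x00, 0x00, _, ..] => (Encoding::Utf32Be, 0),
        [_, 0x00, 0x00, 0x00, ..] => (Encoding::Utf32Le, 0),
        [0x00, _, ..] => (Encoding::Utf16Be, 0),
        [_, 0x00, ..] => (Encoding::Utf16Le, 0),
        // Anything else (including an empty stream) defaults to UTF-8.
        _ => (Encoding::Utf8, 0),
    }
}
```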

IIUC, the main loader API is load_from_str, which takes a Rust str: a Unicode string whose internal representation is always UTF-8.

There are many ways such a string can be loaded from external input (files, network, etc.); some of them require users to specify which encodings to support, while others may support only UTF-8.

Callers of the yaml-rust API might not be aware of the subtleties of the YAML 1.2 spec w.r.t. allowed input encodings, and hence might decide to load the external YAML file using a UTF-8 decoder (because "everybody is using UTF-8, right?"). The resulting application will then not accept all valid YAML 1.2 input byte streams.

If the yaml-rust library offered an API that accepted the raw input stream instead of a pre-decoded string, users could delegate the gory details of encoding detection and decoding to the library.

For this to work, the API must be natural to use with files and other input streams and compete with the simplicity and terseness of:

use std::fs;
use yaml_rust::YamlLoader;
let s = fs::read_to_string(filename).unwrap();
let docs = YamlLoader::load_from_str(&s).unwrap();

I'm a Rust noob, so I won't detail a proposal here. I have no idea whether the Read trait is idiomatic in these cases.
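One possible shape for such an API is a function generic over std::io::Read that reads the whole stream, picks the encoding from the BOM or the null-byte pattern, and returns a String that the existing load_from_str can consume. This is only a sketch: read_yaml_to_string is a hypothetical name, not part of yaml-rust, and it assumes buffering the entire input is acceptable and that malformed input should be reported as io::ErrorKind::InvalidData.

```rust
use std::io::{self, Read};

/// Hypothetical helper, not part of yaml-rust: reads an entire YAML byte
/// stream and decodes it to a String, honoring a BOM or, failing that,
/// the null-byte pattern described in the YAML 1.2 spec.
pub fn read_yaml_to_string<R: Read>(mut reader: R) -> io::Result<String> {
    let mut bytes = Vec::new();
    reader.read_to_end(&mut bytes)?;

    // (bytes per code unit, little-endian?, BOM length to skip)
    let (width, le, bom) = match bytes.as_slice() {
        [0x00, 0x00, 0xFE, 0xFF, ..] => (4, false, 4),
        [0xFF, 0xFE, 0x00, 0x00, ..] => (4, true, 4),
        [0xFE, 0xFF, ..] => (2, false, 2),
        [0xFF, 0xFE, ..] => (2, true, 2),
        [0xEF, 0xBB, 0xBF, ..] => (1, false, 3),
        [0x00, 0x00, 0x00, _, ..] => (4, false, 0),
        [_, 0x00, 0x00, 0x00, ..] => (4, true, 0),
        [0x00, _, ..] => (2, false, 0),
        [_, 0x00, ..] => (2, true, 0),
        _ => (1, false, 0),
    };
    let body = &bytes[bom..];
    let invalid = |msg: &str| io::Error::new(io::ErrorKind::InvalidData, msg.to_string());

    match width {
        // UTF-8: validate and copy the bytes after the BOM.
        1 => String::from_utf8(body.to_vec())
            .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e)),
        // UTF-16: pair bytes into code units, then combine surrogate pairs.
        2 => {
            if body.len() % 2 != 0 {
                return Err(invalid("truncated UTF-16 stream"));
            }
            let units = body.chunks_exact(2).map(|c| {
                let pair = [c[0], c[1]];
                if le { u16::from_le_bytes(pair) } else { u16::from_be_bytes(pair) }
            });
            char::decode_utf16(units)
                .collect::<Result<String, _>>()
                .map_err(|_| invalid("unpaired UTF-16 surrogate"))
        }
        // UTF-32: each 4-byte unit must be a valid Unicode scalar value.
        _ => {
            if body.len() % 4 != 0 {
                return Err(invalid("truncated UTF-32 stream"));
            }
            body.chunks_exact(4)
                .map(|c| {
                    let quad = [c[0], c[1], c[2], c[3]];
                    let n = if le { u32::from_le_bytes(quad) } else { u32::from_be_bytes(quad) };
                    char::from_u32(n).ok_or_else(|| invalid("invalid UTF-32 code point"))
                })
                .collect()
        }
    }
}
```

With a helper like this, the call site could stay nearly as terse as the two-liner above: read the file with read_yaml_to_string(File::open(filename)?) and pass a reference to the resulting string to YamlLoader::load_from_str. Buffering everything keeps the sketch simple; a streaming variant would decode incrementally instead.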

The underlying parser expects an Iterator<Item = char>, so anything that can produce one should work.
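Because the parser consumes chars one at a time, decoding could also happen lazily instead of building an intermediate String first. The sketch below shows this for UTF-16LE only; utf16le_chars is a hypothetical helper, and replacing unpaired surrogates with U+FFFD (rather than reporting an error) is an assumption made so the item type can stay a plain char.

```rust
/// Hypothetical adapter: lazily turns a stream of UTF-16LE bytes into chars,
/// so decoding happens while a char-based parser pulls input.
/// Unpaired surrogates become U+FFFD; a trailing odd byte is dropped.
pub fn utf16le_chars<I>(bytes: I) -> impl Iterator<Item = char>
where
    I: IntoIterator<Item = u8>,
{
    let mut it = bytes.into_iter();
    // Pair consecutive bytes into little-endian 16-bit code units.
    let units = std::iter::from_fn(move || {
        let lo = it.next()?;
        let hi = it.next()?;
        Some(u16::from_le_bytes([lo, hi]))
    });
    // Combine surrogate pairs into chars, substituting U+FFFD on errors.
    char::decode_utf16(units).map(|r| r.unwrap_or(char::REPLACEMENT_CHARACTER))
}
```

The returned iterator has Item = char, so it fits the char-iterator input the parser expects; the same pattern extends to the other encodings by swapping the unit-assembly step.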

mkmik commented 4 years ago

Blocked on #139

XVilka commented 4 years ago

@mkmik AppVeyor is green now: https://ci.appveyor.com/project/chyh1990/yaml-rust/builds/34017061