avitex / rust-dangerous

Rust library for safely and explicitly parsing untrusted data
MIT License
51 stars 4 forks source link

Refactor and improve docs #8

Closed avitex closed 3 years ago

avitex commented 3 years ago

TODO

codecov[bot] commented 3 years ago

Codecov Report

Merging #8 into master will increase coverage by 0.28%. The diff coverage is 66.66%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master       #8      +/-   ##
==========================================
+ Coverage   70.88%   71.16%   +0.28%     
==========================================
  Files          19       19              
  Lines        1271     1325      +54     
==========================================
+ Hits          901      943      +42     
- Misses        370      382      +12     
Impacted Files Coverage Δ
src/error/traits.rs 0.00% <ø> (ø)
src/error/expected.rs 61.07% <48.48%> (-5.10%) :arrow_down:
src/input/mod.rs 62.96% <73.68%> (-0.20%) :arrow_down:
src/reader.rs 59.09% <79.16%> (+5.99%) :arrow_up:
src/input/internal.rs 55.39% <80.00%> (+6.14%) :arrow_up:
src/error/invalid.rs 78.26% <100.00%> (+5.53%) :arrow_up:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 7118d46...63da03d. Read the comment docs.

Byron commented 3 years ago

Thanks a lot for these refinements, I will update my PR with the changes (if needed) once this one is merged.

span_of(…) will work perfectly for my usecase, I believe, thanks for that. Hopefully I can put it to the test today.

Add controlling the display of input in an error case via environment variables?

This would be a solution for the usecase I presented, yet I wonder how discoverable such a feature could be. One thought of mine would be to detect if an Error is displayed under test to advertise the environment variable if binary output would be chosen. Maybe too much, and possibly confusing for people who write binary parsers.

avitex commented 3 years ago

I'm going to hold off on merging this tonight. I want to do some more thinking about RetryRequirement and no_retry. I feel like I'm adding a complex solution to a footgun here and I want to see if there is a better solution.

use dangerous::Invalid;

let input = dangerous::input(b"blah");
let result: Result<_, Invalid> = input.read_all(|r| {
  r.take(2)?.read_all::<_, _, Invalid>(|r| r.take(4))?;
  r.consume(b"ah")
});

// This currently will pass, but it really shouldn't
assert!(result.unwrap_err().to_retry_requirement().is_some());

Use the below if you wish to use these changes so far, I'm about to add a failing test:

dangerous = { git = "https://github.com/avitex/rust-dangerous", commit = "6e716cc90531a65fdc7396007a8e5ae4c4b9e075" } # fix grammar
avitex commented 3 years ago

Seeing as read_all and read_partial now require a IntoFatal constraint (which in turn requires ToRetryRequirement) I'm thinking I should consolidate the error traits a little.

avitex commented 3 years ago

This would be a solution for the usecase I presented, yet I wonder how discoverable such a feature could be. One thought of mine would be to detect if an Error is displayed under test to advertise the environment variable if binary output would be chosen. Maybe too much, and possibly confusing for people who write binary parsers.

So I can understand a bit better, you prefer the bytes-ascii format over str for git-config? Are there cases you expect git-config will have binary data, or does the str format not fit your needs? I saw bytes-ascii as more of a fallback for str when there was invalid UTF-8

Byron commented 3 years ago

tl;dr: git doesn't assume UTF-8, so gitoxide can't do that either.

It's something specific to git, which itself doesn't know or care about unicode/UTF-8 while parsing, but instead expects ascii-compatible encodings.

Thus I am stuck with not assuming UTF-8, and instead delegate decoding of strings to the caller. This currently is done everywhere, particularly when decoding objects, so fields there are represented by &BStr of the bstr crate. The same I envision for parsed values obtained with the git-config parser.

avitex commented 3 years ago

tl;dr: git doesn't assume UTF-8, so gitoxide can't do that either.

Hmmm, this makes me think I should improve rendering for bytes-ascii or provide another format (perhaps str-ascii), I think your usecase would need a better output then than just ['h' 'e' 'l' 'l' 'o' ff] more like "hello\xff"` or something, the former seems far too verbose. What are you thoughts?

Byron commented 3 years ago

True - having the former is good enough for development, but having the latter would be desirable for exposure to users.

Thus far I have been throwing bstr at it, which automatically makes everything printable, leaving unprintable characters escaped to become printable after all.