duesee / imap-codec

Rock-solid and complete codec for IMAP
Apache License 2.0
35 stars 13 forks source link

fix(misuse): Add validation to deserialization #504

Closed HenningHolmDE closed 2 weeks ago

HenningHolmDE commented 2 weeks ago

Previously, correctness through the existing validation functions was not enforced during deserialization. This is now achieved by going through TryFrom implementations or specific field deserialization functions. Thus, it should no longer be possible to create invalid instances by deserialization.

This fixes #502.

coveralls commented 2 weeks ago

Pull Request Test Coverage Report for Build 9523103389

Details


Changes Missing Coverage Covered Lines Changed/Added Lines %
imap-types/src/datetime.rs 35 36 97.22%
imap-types/src/core.rs 133 139 95.68%
<!-- Total: 201 208 96.63% -->
Files with Coverage Reduction New Missed Lines %
imap-types/src/mailbox.rs 40 69.19%
imap-types/src/datetime.rs 49 72.93%
imap-types/src/core.rs 367 70.6%
<!-- Total: 456 -->
Totals Coverage Status
Change from base Build 9522522665: -3.2%
Covered Lines: 10167
Relevant Lines: 11724

💛 - Coveralls
HenningHolmDE commented 2 weeks ago

@duesee I just read once again through your reply in the issue and noticed that I overlooked your hint to CommandContinuationRequestBasic, which has no validate method but a validation in its new method. I will take a look at that tomorrow.

(Also I will look into the strange non-diff the failing check is complaining about.)

coveralls commented 2 weeks ago

Pull Request Test Coverage Report for Build 9527640958

Details


Changes Missing Coverage Covered Lines Changed/Added Lines %
imap-types/src/mailbox.rs 31 32 96.88%
imap-types/src/core.rs 123 128 96.09%
imap-types/src/response.rs 21 32 65.63%
<!-- Total: 208 225 92.44% -->
Files with Coverage Reduction New Missed Lines %
imap-types/src/datetime.rs 29 80.29%
imap-types/src/mailbox.rs 39 69.19%
imap-types/src/response.rs 70 52.84%
imap-types/src/core.rs 367 70.6%
<!-- Total: 505 -->
Totals Coverage Status
Change from base Build 9522522665: -3.4%
Covered Lines: 10190
Relevant Lines: 11783

💛 - Coveralls
jakoschiko commented 2 weeks ago

You are using #[serde(try_from = "String")]. As far as I can tell this will create an intermediate copy and do some unnecessary unicode checks. Did you try #[serde(try_from = "&'a [u8]")]? I was not able to compile your branch with my suggestion, but it works in playground.

coveralls commented 2 weeks ago

Pull Request Test Coverage Report for Build 9544457310

Details


Changes Missing Coverage Covered Lines Changed/Added Lines %
imap-types/src/mailbox.rs 31 32 96.88%
imap-types/src/core.rs 123 128 96.09%
imap-types/src/response.rs 21 32 65.63%
<!-- Total: 208 225 92.44% -->
Files with Coverage Reduction New Missed Lines %
imap-types/src/datetime.rs 29 80.29%
imap-types/src/mailbox.rs 39 69.19%
imap-types/src/response.rs 70 52.84%
imap-types/src/core.rs 367 70.6%
<!-- Total: 505 -->
Totals Coverage Status
Change from base Build 9539427042: -3.4%
Covered Lines: 10190
Relevant Lines: 11783

💛 - Coveralls
HenningHolmDE commented 2 weeks ago

@jakoschiko Thanks for pointing this out. I will try to address the issues in your comment one by one:

You are using #[serde(try_from = "String")]. [...]

Hopefully, I only added deserialization through String in cases, where the underlying type is some type of string (e.g. Cow<'a, str>). If you found any occurence where I went through a String on the way to a byte type (e.g. Cow<'a, [u8]>), please let me know. That would definetely be incorrect.

[...] As far as I can tell this will create an intermediate copy [...]

I would agree that deserializing into String will copy the data from the deserializer. After that, moving through TryFrom into Cow should not result into creating an additional copy. Let's check this in the playground.

However, if you are suggesting that previously, data was not copied, this seems not to be the case as deserialization into a Cow always results in a Cow::Owned, thus copying the data. (see https://github.com/serde-rs/serde/issues/1852) There seem to be ways to mitigate this (e.g. using #[serde(borrow)] or introducing serde_with::BorrowCow). But when I tried this out briefly, I ended up sprinkling borrow all over the place and yet was not able to make things work without copying when used in combination with try_from. There is probably some way of doing this, but would suggest adding an issue to tackle this optimization at a later point in time. What do you think?

[...] and do some unnecessary unicode checks. [...]

Where would you expect additional unicode checks? On the way to the first Rust string type in the conversion, there will always be a unicode check to ensure Rust's UTF-8 guarantees. However, when converting between string types, there should be no additional checks (e.g. when moving the String into a Cow<'a, str>).

[...] Did you try #[serde(try_from = "&'a [u8]")]? I was not able to compile your branch with my suggestion, but it works in playground.

I tried using &'a str in combination with #[serde(borrow)], but as stated above, unfortunately, I was not able to easily get rid of any copying this way in the actual code.

HenningHolmDE commented 2 weeks ago

@duesee I'm a bit lost with the Coveralls failure.

In the report, there seem to be arbitrarily sprinkled coverage misses in empty lines and comments. I first thought that might have something to do with mixing in old coverage information into the report, but even with the changes to the coverage report generation we merged today, this is still the case.

Also, when I look at the coverage reports generated locally, the PR seems to only improve things in the files critizied by the tool and the report generated by grcov makes actual sense.

For the base report of main, everything seems to be fine and comparable to the local report.

Any ideas on this?

jakoschiko commented 2 weeks ago

Hopefully, I only added deserialization through String in cases, where the underlying type is some type of string (e.g. Cow<'a, str>). If you found any occurence where I went through a String on the way to a byte type (e.g. Cow<'a, [u8]>), please let me know. That would definetely be incorrect.

My bad. I thought that types like Atom were wrapping &[u8], not &str.

However, if you are suggesting that previously, data was not copied, this seems not to be the case as deserialization into a Cow always results in a Cow::Owned, thus copying the data.

The previous serialization was also non-optimal (probably by accident). It should be possible to have an implementation that does not copy the bytes. E.g. the TryFrom implementation for Atom is using Cow::Borrowed:

impl<'a> TryFrom<&'a str> for Atom<'a> {
    type Error = ValidationError;

    fn try_from(value: &'a str) -> Result<Self, Self::Error> {
        Self::validate(value)?;

        Ok(Self(Cow::Borrowed(value)))
    }
}

There is probably some way of doing this, but would suggest adding an issue to tackle this optimization at a later point in time. What do you think?

Sure, if it's non-trivial to fix it, let's do it later.

Where would you expect additional unicode checks? On the way to the first Rust string type in the conversion, there will always be a unicode check to ensure Rust's UTF-8 guarantees.

Again, my bad. I'm a little bit surprised that @duesee decided to use &str instead of &[u8]. I far as I know we don't need to uphold Rust's UTF-8 guarantees here.

I tried using &'a str in combination with #[serde(borrow)], but as stated above, unfortunately, I was not able to easily get rid of any copying this way in the actual code.

This confuses me. Because of the TryFrom implementation I mentioned above I think this should be possible without much effort.

HenningHolmDE commented 2 weeks ago

I'm a little bit surprised that @duesee decided to use &str instead of &[u8]. I far as I know we don't need to uphold Rust's UTF-8 guarantees here.

When writing the previous comment, I gave this some more thought and I'm actually also not sure if assuming this is correct. I have created #506 for discussing this further.

I tried using &'a str in combination with #[serde(borrow)], but as stated above, unfortunately, I was not able to easily get rid of any copying this way in the actual code.

This confuses me. Because of the TryFrom implementation I mentioned above I think this should be possible without much effort.

Unfortunately, your playground example simplifies things a bit by not using Cow and therefor not requiring #[serde(borrow)]. However, I'm not sure what I observed a few days ago, but trying it out now in the playground, the combination indeed seems to work. Anyways, there is still the point of having to add #[serde(borrow)] all the way up the type tree for which I'm not sure of the consequences. So I would rather not throw this improvement into this PR as well and have created #507 to keep track of the topic.

duesee commented 2 weeks ago

Regarding the coverage: Maybe, maaaybe, we screwed up the GitHub Actions cache when porting everything to just a few months ago. I had this feeling already that something is flaky... If your local coverage is sane, I would suggest not blocking on this. I'll open an issue to take a closer look.

HenningHolmDE commented 2 weeks ago

Regarding the coverage: Maybe, maaaybe, we screwed up the GitHub Actions cache when porting everything to just a few months ago. I had this feeling already that something is flaky... If your local coverage is sane, I would suggest not blocking on this. I'll open an issue to take a closer look.

511 should fix that. I am happy to wait for that one to be merged first and see this PR not reducing the coverage after rebasing.

coveralls commented 2 weeks ago

Pull Request Test Coverage Report for Build 9568958583

Details


Totals Coverage Status
Change from base Build 9568868462: 0.4%
Covered Lines: 10190
Relevant Lines: 10995

💛 - Coveralls
duesee commented 2 weeks ago

Awesome work! Thanks! :-)