chinedufn / psd

A Rust API for parsing and working with PSD files.
https://chinedufn.github.io/psd
Apache License 2.0
265 stars 40 forks source link

Don't panic when reading an out of bounds PsdCursor range #7

Closed paolobarbolini closed 4 years ago

chinedufn commented 4 years ago

Hmm - it looks like this would add some overhead to a pretty hot code path. I'm not sure how much - we don't have any benchmarks currently.

What lead to this PR? i.e. What's the motivation behind this change?

Ideally if a PSD file is valid we don't ever try to overread - but there could be a bug(s).

Thanks!

paolobarbolini commented 4 years ago

I think this crate should be as strong as possible to user provided input for it to be used on a server.

I'm using this on a server to extract thumbnails from user uploaded files, and I encountered this problem after erroneously passing a png to this crate. The same issue could have occurred if the user uploaded a corrupt PSD file or one where part of it got trimmed off for some reason.

I'm not sure what the overhead of this would be, but I think it wouldn't be as bad as you think. Rust is already checking to make sure we are not reading memory we shouldn't have access to, and those methods were already returning a Result so callers to it are already matching on the error.

chinedufn commented 4 years ago

Yeah I could totally see the impact being negligible - but I'd heavily prefer to have benchmarks to verify performance against instead of assuming. Other than that yup I'm totally with you!

I'm surprised that you didn't run into this error when passing in a PNG. Any idea why you wouldn't have?

https://github.com/chinedufn/psd/blob/4c0d9d9bd5f262e92c02deee2c0713c0db04e1a1/src/sections/file_header_section.rs#L81-L85


So one thing that comes to mind for me is that the PSD spec gives us a lot of information about how many bytes we should expect to see.

Here's just one of many examples:

https://github.com/chinedufn/psd/blob/4c0d9d9bd5f262e92c02deee2c0713c0db04e1a1/src/sections/layer_and_mask_information_section/mod.rs#L64-L71

So my first thought here is that we should design the API to not even be able to get to the point where reading bytes from the cursor can crash. We should be returning much more contextual errors before then. Such as "Hey! This section of color info should have 30,000 bytes because the header said so, yet it has ..."

The same issue could have occurred if the user uploaded a corrupt PSD file or one where part of it got trimmed off for some reason

That's a great example. One thing I'm thinking is that at the end of the day a PSD has five sections

https://github.com/chinedufn/psd/blob/4c0d9d9bd5f262e92c02deee2c0713c0db04e1a1/src/sections/mod.rs#L11-L35

Each of which has specs on what to expect in it.

So we should be able to catch and return more contextual errors higher in the chain in all five of these sections based on the PSD specification.

I think that any changes to the parsing should contain a test file and test case (even if it's an intentionally broken one) so that we can handle the kinds of issues you're mentioning in the best way while still having the flexibility to refactor the code without fear.

OR unit tests for the individual section parser sections might be good too...

i.e. https://github.com/chinedufn/psd/tree/master/tests/fixtures

and https://github.com/chinedufn/psd/blob/4c0d9d9bd5f262e92c02deee2c0713c0db04e1a1/tests/channels.rs#L8

But dipping all the way to the bottom and returning cursor errors seems to me like the wrong place to add our resiliency. Since that API is private I think we can ensure correctness without compromising at all on speed by checking remaining bytes only when necessary higher up in the parser.


These are just my overall thoughts. Since we haven't worked together before I want to stress that I'm very open to your thoughts and certainly super open to changing my thoughts based on what you think!

So yeah - please let me know what ya think about my perspective on this!

paolobarbolini commented 4 years ago

I totally agree with you. I plan to work on this in the next weeks and start submitting PRs with specific fixes to cases where a file caused it to crash.

ATM I'm not sure why this didn't catch the PNG, the file is totally valid. I'll work on this first since it will stop non PSD files reaching the rest of the API. https://github.com/chinedufn/psd/blob/4c0d9d9bd5f262e92c02deee2c0713c0db04e1a1/src/sections/file_header_section.rs#L81-L85

chinedufn commented 4 years ago

Sounds awesome! If there's anything I can do in terms of pointing you towards different things in the codebase let me know. Excited for your PR(s)!!