Closed rdewaele closed 4 years ago
Hm, interesting. I'm surprised/ashamed that this isn't covered by the test suite, but I guess that doesn't really catch string buffer issues since it's all file based.
I'll look into it, thanks.
Oh, I see. There are test cases for this and a bunch of similar things, but they don't check the output, only that serdi bails with an error.
Okay, your fix is good given what was currently there (thanks!), but I looked a bit deeper into it and realized the character and EOF handling was pretty deeply wrong (the mix of types and the "sometimes null is EOF sometimes not" smells are symptoms of this which have been bothering me for a while).
So, I rewrote a bunch of that stuff so that EOF can be handled more sanely in general. https://github.com/drobilla/serd/commit/f7ffff1e75634909da60ea63a7c52f1a001220b8 should fix the same problem, also in some other places.
That said, there's still no test for reading string input out of bounds (I'm not sure how to do that without adding code to serdi
specifically for it), so let me know if you're still seeing that.
Thank you for looking at & solving this so swiftly, and providing a more fundamental solution. I'll rework my (internal) conan package to make a gitref-based release and let you know how this is working out for us. (Not that I expect further issues.)
Great, thanks / you're welcome. As it happens I spent a bunch of time recently almost completely automating the release process, so I can kick out another one if that makes life easy (though testing beforehand is of course appreciated).
Not that I expect further issues
Hey, you never know. Serd serves as a constant reminder to me that no matter how thoroughly you test something, there's always bugs somewhere :) It's written mainly for the high-throughput file parsing scenario so reading an exact string like this with invalid input could still be flaky (relatively speaking). I have some ideas how to tackle that more comprehensively though.
I have updated the serd version we use to a76b091. Valgrind likes all the tests a lot better now. I did notice a change in behavior when passing the empty string (""
) to serd_reader_read_string
:
I still have to verify the suspected cause of the latter, as we could simply work around these changes by not passing the empty string to serd in the first place, saving some compute time in the process. :-)
edit: these behaviors do not occur with serdi
I consider this proposed change more of a proof-of-concept fix, as I am not very familiar with the serd codebase. Its main purpose is to illustrate the problem by showing one way of preventing it from happening.
When parsing an unterminated string literal in a turtle/trig/turtle file, serd will perform invalid reads beyond the string buffer end. For example with the following input file:
The output is something as follows:
In the context I am using serd, I am passing a nul-terminated string of the exact size of the input to
serd_reader_read_string
. In this use-case theread_head
surpasses the length of the string buffer and performs invalid reads. With serdi, it does not seem to perform invalid reads, instead producing the presumably undesired output of many nul characters.