Closed DaveLak closed 5 months ago
On a related note, do the maintainers of this project have any interest in bringing the fuzzing test code into this repo? I searched the repository for prior discussions about it but didn't come across anything.
I'm happy to:
Thanks a lot for this wonderfully complete bug report, it's much appreciated.
Personally, I think the config-parser implementation here is probably very broken and non-conforming, and the fuzzer will have a very good time with it. A problem I see is that reproductions are usually hidden behind a login, so it requires extra work to make these available here so the community has a chance to fix them. However, I am open to give it a shot and see how it goes.
I definitely welcome you to help with maintaining the fuzzer integration, and would appreciate if that would also entail the maintenance of the issues it finds such that all information for reproduction would be contained within them so it's not only maintainers who can fix them.
Thanks for the prompt reply 🙂
Silly me for not realizing earlier, but I see gitoxide
has been integrated for a while with OSS-Fuzz issues mirrored in GH issues and some good results. Given that context, and this project's maintenance mode status I definitely hear where your coming from.
I think it probably is worth opening up a discussion thread to continue the fuzzing convo for the sake of keeping this issue on topic as I do have some follow-up questions for you related to what you think a fix for this might look like (if any). I went ahead and replied to you here: https://github.com/gitpython-developers/GitPython/discussions/1889#discussion-6444106
I started to take a closer look at this and now have a better idea of what's going on here as well as a proof-of-concept fix, but I wanted to get your thoughts on this @Byron, before I put it into PR. CC @EliahKagan in case you have thoughts here as well.
Initially, my instinct was to refer to the git-scm.com docs on Git config syntax and use that to inform the fix here, but (ignoring the fact that there isn't much guidance on value formats relevant to this issue) this portion of @Byron's comment above caught my attention and made re-think things a bit:
I think the config-parser implementation here is probably very broken and non-conforming
Specifically, the "non-conforming" part caught my eye. If GitConfigParser
's behavior is already known (and to some degree expected) to be non-conforming to canonical Git's behavior, then I don't think I agree with my initial assessment of the "Expected Behavior" that I described in the PR description (i.e., this should throw an exception) regardless of how canonical Git would handle it. Introducing a "fix" that throws an exception seems unnecessarily disruptive to users that may have otherwise working code that was written around this "non-conforming implementation". Preserving the existing behavior and just making it safer around exception cases seems like a better approach.
So instead, I think a more considerate and generally simpler fix would be to maintain the existing parsing behavior, and just make its checks a little more robust to eliminate the risk of the unhandled IndexError
described in this issue.
The diff below shows the change I am considering which would:
if
check in the string_decode
function to safely handle empty strings and prevent index errors when accessing string elements (this resolves the root cause of this issue.)_read
method to more robustly handle quoted values, ensuring correct handling even when quotes are not closed properly or when the line continues beyond a quoted segment.Here's the diff:
diff --git a/git/config.py b/git/config.py
index 3ce9b123..91b69531 100644
--- a/git/config.py
+++ b/git/config.py
@@ -452,7 +452,7 @@ class GitConfigParser(cp.RawConfigParser, metaclass=MetaParserBuilder):
e = None # None, or an exception.
def string_decode(v: str) -> str:
- if v[-1] == "\\":
+ if v and v[-1] == "\\":
v = v[:-1]
# END cut trailing escapes to prevent decode error
@@ -508,6 +508,8 @@ class GitConfigParser(cp.RawConfigParser, metaclass=MetaParserBuilder):
if len(optval) > 1 and optval[0] == '"' and optval[-1] != '"':
is_multi_line = True
optval = string_decode(optval[1:])
+ elif len(optval) > 1 and optval[0] == '"' and optval[-1] == '"':
+ optval = string_decode(optval[1:-1]) # safely decode fully quoted value
# END handle multi-line
# Preserves multiple values for duplicate optnames.
cursect.add(optname, optval)
@@ -524,8 +526,10 @@ class GitConfigParser(cp.RawConfigParser, metaclass=MetaParserBuilder):
is_multi_line = False
line = line[:-1]
# END handle quotations
+ if line: # Ensure line is not empty before decoding
+ line = string_decode(line)
optval = cursect.getlast(optname)
- cursect.setlast(optname, optval + string_decode(line))
+ cursect.setlast(optname, optval + line)
# END parse section or option
# END while reading
What do you think? Does this sound like a reasonable approach here, or is there some context I might be missing?
If this does sound like something worth introducing, let me know and:
fuzz_config.py
test so it's covered there as well.So instead, I think a more considerate and generally simpler fix would be to maintain the existing parsing behavior, and just make its checks a little more robust to eliminate the risk of the unhandled
IndexError
described in this issue.
The above hits home, and perfectly describes how I imagine handling this. It's an improvement, without going overboard and try to fix something that is systematically broken - a conforming parser can't be made based on an INI parser.
If this does sound like something worth introducing, let me know and:
- I'll add a Unit test.
- Remove the workaround for this error in the
fuzz_config.py
test so it's covered there as well.- Open a PR.
This sounds like a plan! Thanks again.
PR with the fix: #1908
Only the changes to string_decode
were necessary, so that's all I changed.
Hi, I noticed the fuzzing tests that OSS-Fuzz runs on this project are broken and while I was working on fixing them I believe I came across a minor bug:
The Bug
An
Uncaught Python exception:
IndexError: string index out of range
can be triggered if when trying to call.read()
onGitConfigParser
if it was initialized with a malformed config file.Current Behavior
It's easiest to demonstrate, so please consider this example:
Assuming that code is in
/path/to/example/config_indexerror_reproduction.py
thenproduces something akin to:
My Reproduction Environment Details
The reproduction code above was tested on: ```sh Python Version: 3.12.1 (main, Feb 5 2024, 16:23:00) [Clang 15.0.0 (clang-1500.1.0.2.5)] OS Information: macOS-14.4.1-x86_64-i386-64bit Installed Packages: Package Version --------- ------- gitdb 4.0.11 GitPython 3.1.42 pip 24.0 smmap 5.0.1 ``` And the fuzzer environment was: ``` Python Version: 3.8.3 (default, Mar 17 2024, 03:21:27) [Clang 15.0.0 (https://github.com/llvm/llvm-project.git bf7f8d6fa6f460bf0a16ffe OS Information: Linux-6.6.16-linuxkit-x86_64-with-glibc2.2.5 Installed Packages: Package Version ------------------------- ------- altgraph 0.17.4 atheris 2.3.0 coverage 6.3.2 importlib_metadata 7.0.2 gitdb 4.0.11 GitPython 3.1.42 pip 24.0 smmap 5.0.1 packaging 24.0 pyinstaller 5.0.1 setuptools 41.0.1 six 1.15.0 zipp 3.18.1 ```So, if I'm reading the source correct, it seems like the combination of some header section (
[-]
above) followed by a key/value assignment that has a value consisting of a double quoted string with a new line inside it confuses the check here which strips the"
on the new line: https://github.com/gitpython-developers/GitPython/blob/c7675d2cedcd737f20359a4a786e213510452413/git/config.py#L521-L528and passes an empty string to
string_decode
which isn't expecting that when it indexes into it's arg: https://github.com/gitpython-developers/GitPython/blob/c7675d2cedcd737f20359a4a786e213510452413/git/config.py#L454-L455Expected Behavior
I'd expect an explicitly raised
ParsingError
similar to how it's handled a little further up: https://github.com/gitpython-developers/GitPython/blob/c7675d2cedcd737f20359a4a786e213510452413/git/config.py#L514-L518