gobwas / glob

Go glob
MIT License
948 stars 63 forks source link

Enters infinite loop when parsing expressions with non-ASCII characters #15

Closed calmh closed 8 years ago

calmh commented 8 years ago

At least the asterisk+non-ASCII combo seems to trigger this. For example, the diff

diff --git a/vendor/github.com/gobwas/glob/glob_test.go b/vendor/github.com/gobwas/glob/glob_test.go
index 6fe73a6..706e2dd 100644
--- a/vendor/github.com/gobwas/glob/glob_test.go
+++ b/vendor/github.com/gobwas/glob/glob_test.go
@@ -64,6 +64,7 @@ func TestGlob(t *testing.T) {
        for _, test := range []test{
                glob(true, "* ?at * eyes", "my cat has very bright eyes"),

+               glob(true, "*ä", "åä"),
                glob(true, "abc", "abc"),
                glob(true, "a*c", "abc"),
                glob(true, "a*c", "a12345c"),

will cause the test suite to never complete and attempt to allocate infinite memory in the process.

calmh commented 8 years ago

The problem appears to be that the stateFn does a read(), sees a char_any, does a lookahead() which in turn does read() and unread(), and then finally does an unread() before returning the state lexAny. There is a nested read(), read(), unread(), unread() sequence here but those functions can't be nested like that since unread() depends on l.width being set by the preceding read(). l.width becomes 2 at the second read() due to the wide character, so the second unread() rewinds all the way to -1 and we start the entire pattern over again.

It seems l.width needs to be a stack, or nested calls to read/unread can never be made, or some other mechanism must be used to track it.

This is above my paygrade to fix. :)

gobwas commented 8 years ago

Hey @calmh!

Thanks you for such deep digging the problem and reporting this. I've reimplemented lexer in a little bit simpler (as seems to me) form.

It is in a newlexer branch. It would be cool if you could easy check the glob from this branch. Anyway I have added test case from the diff above and it passes the tests.

In few days I will merge it in master, if there no bugs or some questions.

🍻

calmh commented 8 years ago

Thanks, looks good to me. I'll adopt that branch for the time being and report if there is anything weird.

gobwas commented 8 years ago

Thanks!

gobwas commented 8 years ago

Hi @calmh! Just finished refactoring. It is in master now. If you fill uncomfortable with the new version, I've tagged previous (with just refactoring of lexer) with 0.1.0. And current is 0.2.0.

Thanks!