permutation method for uri matching fails for uris with many query params

subakva commented 15 years ago

I ran into an issue tonight with version 1.2.6 of fakeweb when testing a request with 12 query params. One of my features was hanging, and I traced it back to the new method for uri matching in registry.rb. This code generates a list of possible matching URIs by generating the permutations of all the query params. This probably works very well when there are only a few params, but beyond that, the number of possibilities approaches infinity a little too quickly for my taste.

I'm not sure what the affect of rolling back the change would have on Regexp support or performance, but perhaps sorting the keys or doing a key-by-key check would be better for larger arrays.

subakva commented 15 years ago

ddollar has a fork that removes this permutation feature.

chrisk commented 15 years ago

Ah, good catch. That'd be a little much with 12 query params.

The problem I had with the old implementation is that, unlike String-based registrations, we can't assume that the regex has a "query string" part that can be split out and sorted by key for easy iterative matching. For example, a regex might look something like /example\.com\?[z\d]{2}=1&(?:opt1=a|opt2=b)$/, and match the URI http://example.com?9z=1&opt2=b.

We were getting around this in 1.2.5 by forcing the user to "sort" the params in the regex, such that when the request query has its params sorted, it'd match. The problem is that for a regex like the one above, the lexical order of the keys isn't necessarily clear: maybe that first part is 00, or maybe it's zz, which means it could sort on either side of the second opt key. Furthermore, it seemed to violate a lot of users' Principles of Least Surprise that they'd have to sort things in their regexes... I got a ton of messages on GitHub about that.

Making things even more complicated, most users tend to think of a request's query params as unordered, such that http://example.com/?a=1&b=2 and http://example.com/?b=2&a=1 are considered functionally equivalent. (Probably because most web apps work that way -- also, Net::HTTP's optional use of a hash to pass query params means that the order in which servers receive them can be unpredictable in the first place, since hashes are unordered in Ruby 1.8.)

So, a couple ideas:

maybe there are heuristics and/or less expensive techniques we can use to reduce the "search space" from big numbers like 12! to something more reasonable?
it might be better to avoid the string gymnastics altogether by having users specify query params separately from the rest of the URI during registration, perhaps as a hash. We could allow regexes to optionally be used for both the keys and values, but I guess this would force the common a=1&b=2&... structure.

I'll think about those some more I guess :) In the meantime, if anybody has a patch/fork that both helps with this bug and satisfies the test case with the regex and request I mentioned above, definitely send it along! Otherwise, I'd probably recommend just sticking with 1.2.5, since 1.2.6 was a pretty minor release otherwise. Here's the changelog.

blaine commented 14 years ago

I made a fork that fixes this for me, but didn't look at the issues here until after I was done. :-)

I'd make the argument that regular expressions should match exactly, rather than being pseudo-regexes. Probably the right solution here is to build a URIRegex class that acts like a regex except when it's asked to match a URI, in which case it intelligently deconstructs itself and the URI and does a comparison that way; this would mean that the expensive permutation logic would only get called when absolutely necessary.

mbleigh commented 14 years ago

I would vote for making query parameters registered via hashes instead of as part of the URI...I would find that to be much more readable anyway and since it would solve this problem as well it seems to me to be a win-win.

chrisk commented 14 years ago

Hey, Blaine's in the house. :)

Sorry for sitting on this one for so long, folks. Too busy at work lately. I just pushed a failing test for this, and I'll figure out a short-term resolution next so we can release a fixed 1.2.7.

Blaine, what do you mean by "regular expressions should match exactly, rather than being pseudo-regexes"? You mean the regexes should have to be anchored or something?

blaine commented 14 years ago

chrisk, just that in general I expect a regex like /a=1&b=2/ to match "somestuff?a=1&b=2&other=y", but not "somestuff?other=y&b=2&a=1", without some kind of "out-of-order" modifier.

For example, /abc/ doesn't match "ABC", but /abc/i does. Ideally there would be a 'Regex' class that matched on the existence of query parameters; that could take a simpler approach than combinatorics by just [intelligently] parsing the 'URI Regex' as a URI, and pulling out the query parameters.

For example: URIRegex.new('/path/to?a=1&b=2') would match '/path/to?b=2&a=1' or '/path/to?a=1&b=2', but %r{/path/to?a=1&b=2} only matches '/path/to?a=1&b=2'

thoughts?

chrisk commented 14 years ago

Revert to sorting the request's query params before matching against registered regexps, instead of trying every possible combination. Closed by 93b3f670c1ff7e89427aca24288373f64c271558.

The previous version made matching regexps easier, but the number of permutations required grows too quickly: it's O(n!) for n params, which is so many that FakeWeb would seem to hang for n > 6 or so.

chrisk commented 14 years ago

OK, I reverted the feature in the above commit. I'll try to get out a 1.2.7 release soon.

I like the idea of using a hash to match params... Either way, I think I'll open another ticket for the enhancement, and we can figure out what to do there.

chrisk commented 14 years ago

OK, the new issue is #7.

chrisk / fakeweb

permutation method for uri matching fails for uris with many query params #5