ctdk / goiardi

A Chef server written in Go, able to run entirely in memory, with optional persistence either by saving the in-memory data to disk or by using MySQL or Postgres as the data storage backend. Docs: http://goiardi.readthedocs.io/en/latest/index.html
http://goiardi.gl
Apache License 2.0

unescape query terms when searching FileIndex #49

Closed ickymettle closed 7 years ago

ickymettle commented 7 years ago

This patch fixes a little buglet I just stumbled across. Query terms containing escaped special characters are searched for literally, without any unescaping.

This patch unescapes query terms prior to search.

ctdk commented 7 years ago

I will definitely check this out. I recall that the whole business with the escaped special characters is a little weird, to duplicate the behavior Chef server has with its Solr search. At one point at least the fully escaped items were stored just like that (so "foo\[bar\]" would be exactly that in the trie).

However, it's entirely possible that this got messed up when the Postgres search was added, and chef-pedant failed to catch it. I need to tread a little carefully there, just to make sure the fix doesn't break search some other way (or to fix another part to make it more correct all around).

Thanks, either way, for spotting this. Hopefully I'll have it turned around soon.

ctdk commented 7 years ago

One question for you @ickymettle: how did you trigger this as a problem? I ask because with the in-memory trie search, the various keys and values are actually stored in their escaped form so unescaping the queries breaks the searches. (I did this way back when to save the effort of unescaping the Solr queries after disassembling them.)

I'm not opposed to doing the right thing and changing the way they're stored in the trie so that those characters aren't escaped, but before I do that I figured I should know exactly how you triggered the buglet mentioned.
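To make the storage behavior described above concrete, here is a minimal sketch in Go. It is not goiardi's actual code: a plain map stands in for the trie, and both `escapeTerm` and the `specialChars` set are hypothetical names chosen for illustration (the character set is only loosely modeled on Lucene/Solr query syntax). It shows why, when values are indexed in escaped form, an escaped query matches and an unescaped one does not.

```go
package main

import (
	"fmt"
	"strings"
)

// specialChars is an ASSUMED set of characters to escape, loosely based on
// Lucene/Solr query syntax. It is not taken from goiardi.
const specialChars = `+-!(){}[]^"~*?:\/`

// escapeTerm is a hypothetical helper that prefixes each special character
// with a backslash, so `foo[bar]` becomes `foo\[bar\]`.
func escapeTerm(s string) string {
	var b strings.Builder
	for _, r := range s {
		if strings.ContainsRune(specialChars, r) {
			b.WriteByte('\\')
		}
		b.WriteRune(r)
	}
	return b.String()
}

func main() {
	// A map stands in for the in-memory trie; values are stored escaped.
	index := map[string]bool{}
	index[escapeTerm("foo[bar]")] = true

	fmt.Println(index[`foo\[bar\]`]) // escaped query: true
	fmt.Println(index["foo[bar]"])   // unescaped query: false
}
```

Under this assumption, unescaping only the query side would turn a working lookup into a miss, so the storage and query sides have to agree on one form.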

ickymettle commented 7 years ago

Sure thing. In my particular case we have a data bag that contains a bunch of items. Each item contains a MAC address, and some code searches the data bag for a particular MAC address.

Eg:

{ 
  "id": "a_thing",
  "mac": "11:22:33:44:55:66",
  "foo": "bar"
}

If you create a data bag on a regular Chef server (let's call it widgets) and load that item, then run a knife search query such as:

knife search widgets 'mac:11\:22\:33\:44\:55\:66'

It will correctly return the a_thing item. Doing the same against goiardi fails. I caught this while building some integration test backends that use goiardi.

From my digging through the goiardi code, the MAC field of the data bag item is stored internally as provided in the data bag (unescaped). When the plain-text query is executed as an exactTerm match, the query term still contains the literal \ characters, so 11:22:33:44:55:66 doesn't match 11\:22\:33\:44\:55\:66 and the query fails.
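The mismatch described above can be sketched in a few lines of Go. This is only an illustration under stated assumptions, not the patch itself: `unescapeTerm` is a hypothetical helper name, and it assumes the simple rule that a backslash escapes whichever character follows it.

```go
package main

import (
	"fmt"
	"strings"
)

// unescapeTerm is a hypothetical helper (not goiardi's actual function).
// It drops each backslash that escapes a following character, so
// `11\:22` becomes `11:22`. A trailing lone backslash is kept as-is.
func unescapeTerm(term string) string {
	var b strings.Builder
	escaped := false
	for _, r := range term {
		if escaped {
			// Previous rune was a backslash: emit this rune literally.
			b.WriteRune(r)
			escaped = false
			continue
		}
		if r == '\\' {
			escaped = true
			continue
		}
		b.WriteRune(r)
	}
	if escaped {
		b.WriteRune('\\')
	}
	return b.String()
}

func main() {
	stored := "11:22:33:44:55:66"      // value as stored from the data bag
	query := `11\:22\:33\:44\:55\:66` // term as it arrives from knife search

	fmt.Println(stored == query)               // false: literal backslashes remain
	fmt.Println(stored == unescapeTerm(query)) // true: matches once unescaped
}
```

Whether the unescaping belongs in the query path or the index should instead store escaped values is the design question for the maintainer; the sketch only shows that the two sides must use the same form.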

I'm not too familiar with the trie code or the parser, so my potential fix may be masking another problem in the Solr query disassembly.

Hope that helps, thanks for following up.

ctdk commented 7 years ago

Ah, interesting. I think I'm picturing what the problem probably is, so one way or another I should be able to get it fixed soon. It's a bit curious that the chef-pedant tests didn't catch it, but this wouldn't be the first time that's happened.

ctdk commented 7 years ago

Thanks @ickymettle, I added a little bit more to this, merged it into master, and cut a new release. Don't forget to rebuild your index with knife index rebuild.