Jarn / collective.solr


Editing search_pattern through the control panel results in potential UnicodeDecodeErrors on search #10

Closed mjpieters closed 12 years ago

mjpieters commented 12 years ago

The search_pattern parameter is used to turn a Plone search form query into a SOLR query; the (byte string, encoded to utf8) search is interpolated into this parameter to be sent to SOLR.

This works fine if the search_pattern parameter is set through portal_setup; the parameter is then stored as a Python byte string (encoded, not a unicode string). However, when you edit the parameter through the control panel form, the value is stored as a unicode string.

Once this has happened, all non-ASCII searches fail with a UnicodeDecodeError: to interpolate the search into the unicode search_pattern parameter, Python first tries to decode the utf-8 search byte string to unicode using the ASCII codec.
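The failure can be sketched outside of Plone. This is only an illustration, not the collective.solr code: the pattern below is a made-up stand-in for search_pattern, and py2_style_interpolate is a hypothetical helper that performs Python 2's implicit ASCII decode explicitly, so the snippet behaves the same on Python 3.

```python
def py2_style_interpolate(pattern, value):
    """Hypothetical helper: interpolate value into a unicode pattern the
    way Python 2 does when the value is a byte string."""
    if isinstance(value, bytes):
        # Python 2 performs this ASCII decode implicitly when a unicode
        # pattern is combined with a byte string.
        value = value.decode("ascii")
    return pattern % {"value": value}


# Assumed, simplified pattern; the real search_pattern is longer.
pattern = u"SearchableText:%(value)s"

ascii_query = u"plone".encode("utf-8")
non_ascii_query = u"caf\xe9".encode("utf-8")  # contains byte 0xc3

# Pure-ASCII searches keep working, which is why the problem is easy to miss.
ascii_result = py2_style_interpolate(pattern, ascii_query)

# Non-ASCII searches raise as soon as the pattern is a unicode string.
try:
    py2_style_interpolate(pattern, non_ascii_query)
    error = None
except UnicodeDecodeError as exc:
    error = exc
```

With a byte-string pattern (the portal_setup case) Python 2 simply concatenates bytes, so no decode is attempted and nothing fails.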

The error takes place in collective.solr.mangler:

# ... start of TB elided ...
  Module collective.indexing.monkey, line 84, in searchResults
  Module collective.solr.monkey, line 31, in searchResults
  Module collective.solr.dispatcher, line 42, in __call__
  Module collective.solr.dispatcher, line 86, in solrSearchResults
  Module collective.solr.mangler, line 90, in mangleQuery
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 17: ordinal not in range(128)

We need to ensure that the query and the parameter are both byte strings or both unicode strings, not a mix of the two.
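One way to do that is to normalise both sides to unicode before interpolating. The following is a minimal sketch under that assumption, not the fix that was actually committed; to_unicode and build_query are hypothetical names, and the pattern is a simplified stand-in for search_pattern.

```python
def to_unicode(value, encoding="utf-8"):
    """Return value as a unicode string, decoding byte strings as utf-8."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def build_query(pattern, value):
    """Hypothetical helper: interpolate value into pattern with both
    operands coerced to unicode first, so no implicit ASCII decode occurs."""
    return to_unicode(pattern) % {"value": to_unicode(value)}


# Works whether the pattern came from portal_setup (bytes) or from the
# control panel form (unicode).
from_setup = build_query(b"SearchableText:%(value)s", u"caf\xe9".encode("utf-8"))
from_form = build_query(u"SearchableText:%(value)s", u"caf\xe9".encode("utf-8"))
```

The result is a unicode string in both cases and would still need to be encoded to utf-8 before being sent to SOLR. Encoding the pattern to a byte string instead would be the symmetric alternative.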

mjpieters commented 12 years ago

It looks as if this was already fixed for 3.0b5 in https://github.com/Jarn/collective.solr/commit/ddced6f9cb875032f500ef093aed4c1820658cbe