basho / basho_docs

Basho Products Documentation
http://docs.basho.com

Document required encoding of query parameters of search #2515

Open lucafavatella opened 7 years ago

lucafavatella commented 7 years ago

Solr

A note in the documented changes of Solr 4.1.0, regarding portability of Solr across Web containers, points out that "Query strings passed in via the URL need to be properly-%-escaped, UTF-8 encoded bytes, otherwise Solr refuses to handle the request". A note in the documented changes of Solr 4.5.0 mentions that the encoding of query parameters can be set with the ie parameter (e.g. ie=iso-8859-1), that the encoding of a POST request body can be set with the charset of the Content-Type header (e.g. application/x-www-form-urlencoded; charset=iso-8859-1), and that UTF-8 is the default encoding. As of Solr 4.10.4, UTF-8 is still the default encoding for both query parameters and POST request bodies.
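To illustrate what "properly-%-escaped, UTF-8 encoded bytes" means in practice, here is a minimal Python sketch. The field name name_s and the query value are made up for the example; only the ie parameter and the two encodings come from the Solr notes above.

```python
from urllib.parse import quote

# Hypothetical query; the field name "name_s" is only an illustration.
query = "name_s:café"

# Percent-encode the UTF-8 bytes of the query (Solr's default expectation).
utf8_q = quote(query, safe="", encoding="utf-8")

# Percent-encode the ISO-8859-1 bytes instead. Per the Solr 4.5.0 note,
# a client sending this form would also have to pass ie=iso-8859-1.
latin1_q = quote(query, safe="", encoding="iso-8859-1")

print(utf8_q)    # name_s%3Acaf%C3%A9
print(latin1_q)  # name_s%3Acaf%E9
```

The same character ends up as two bytes (%C3%A9) in UTF-8 and as one byte (%E9) in ISO-8859-1, so the receiver has to know which encoding the client used.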

Riak Search

The version of yokozuna in Riak KV 2.2.3 is 2.1.10, which integrates Solr 4.10.4 (see also https://github.com/basho/yokozuna/pull/709/commits/7f0d464b9190ee6db115aa4bfcd38f6407791e4a), whose documentation is available online.

Yokozuna 2.1.10 depends on riak_kv 2.1.7, which via riak_api 2.1.6 depends on basho/webmachine 1.10.8-basho1 (containing e.g. module wrq), which in turn depends on mochiweb v2.9.0p2 (containing e.g. module mochiweb_util).

When receiving a search request, yokozuna calls the search function, which extracts the query (percent-decoded, but not decoded any further, e.g. as Unicode), appends some parameters related to distributed search, percent-encodes the parameters (again without any further encoding, e.g. as Unicode), and contacts Solr via a POST request with the Content-Type header set to application/x-www-form-urlencoded.

As this Content-Type header specifies no charset, Solr interprets the POST body as UTF-8.
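The consequence for clients can be sketched in Python as follows. This is not yokozuna or Solr code: relay and decodes_as_utf8 are hypothetical stand-ins for the percent-decode/percent-encode step and for a receiver that reads a charset-less form body as UTF-8. How Solr actually reacts to bytes that are not valid UTF-8 is not covered here.

```python
from urllib.parse import quote, unquote_to_bytes


def relay(q_param: str) -> str:
    # Hypothetical stand-in for the relay step: percent-decode to raw bytes,
    # then percent-encode those same bytes, with no charset conversion.
    return quote(unquote_to_bytes(q_param), safe="")


def decodes_as_utf8(form_value: str) -> bool:
    # Hypothetical stand-in for a receiver that reads a charset-less
    # form body as UTF-8.
    try:
        unquote_to_bytes(form_value).decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


utf8_in = "caf%C3%A9"   # "café" percent-encoded from UTF-8 bytes
latin1_in = "caf%E9"    # "café" percent-encoded from ISO-8859-1 bytes

print(relay(utf8_in), decodes_as_utf8(relay(utf8_in)))      # caf%C3%A9 True
print(relay(latin1_in), decodes_as_utf8(relay(latin1_in)))  # caf%E9 False
```

Under these assumptions the bytes pass through the relay unchanged, so only a query that was percent-encoded from UTF-8 bytes remains readable as UTF-8 on the receiving side.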