ayleph / mediagoblin-basicsearch

Search plugin for GNU MediaGoblin
GNU Affero General Public License v3.0

Server error when search query includes non-ASCII characters #3

Open ayleph opened 8 years ago

ayleph commented 8 years ago
Error - <type 'exceptions.UnicodeEncodeError'>: 'ascii' codec can't encode character u'\xf1' in position 4: ordinal not in range(128)
URL: https://goblinrefuge.com/mediagoblin/search/?query=espa%C3%B1a
File '/path/to/mediagoblin/lib/python2.7/site-packages/Paste-1.7.5.1-py2.7.egg/paste/exceptions/errormiddleware.py', line 144 in __call__
  app_iter = self.application(environ, sr_checker)
File '/path/to/mediagoblin/mediagoblin/app.py', line 342 in __call__
  return self.call_backend(environ, start_response)
File '/path/to/mediagoblin/lib/python2.7/site-packages/Werkzeug-0.10.1-py2.7.egg/werkzeug/wsgi.py', line 591 in __call__
  return self.app(environ, start_response)
File '/path/to/mediagoblin/mediagoblin/app.py', line 276 in call_backend
  return self._finish_call_backend(request, environ, start_response)
File '/path/to/mediagoblin/mediagoblin/app.py', line 318 in _finish_call_backend
  response = controller(request)
File '/path/to/mediagoblin/mediagoblin/decorators.py', line 170 in wrapper
  return controller(request, page=page, *args, **kwargs)
File '/path/to/mediagoblin/mediagoblin/plugins/basicsearch/views.py', line 69 in search_results_view
  'form': form})
File '/path/to/mediagoblin/mediagoblin/tools/response.py', line 36 in render_to_response
  render_template(request, template, context),
File '/path/to/mediagoblin/mediagoblin/tools/template.py', line 144 in render_template
  rendered = template.render(context)
File '/path/to/mediagoblin/lib/python2.7/site-packages/Jinja2-2.7.3-py2.7.egg/jinja2/environment.py', line 969 in render
  return self.environment.handle_exception(exc_info, True)
File '/path/to/mediagoblin/lib/python2.7/site-packages/Jinja2-2.7.3-py2.7.egg/jinja2/environment.py', line 742 in handle_exception
  reraise(exc_type, exc_value, tb)
File '/path/to/mediagoblin/mediagoblin/plugins/basicsearch/templates/mediagoblin/plugins/basicsearch/results.html', line 21 in top-level template code 
  {% from "mediagoblin/utils/object_gallery.html" import object_gallery %}
File '/var/lib/mediagoblin/templates/mediagoblin/base.html', line 67 in top-level template code 
  {% block mediagoblin_body %}
File '/var/lib/mediagoblin/templates/mediagoblin/base.html', line 207 in block "mediagoblin_body"
  {% block mediagoblin_content %}
File '/path/to/mediagoblin/mediagoblin/plugins/basicsearch/templates/mediagoblin/plugins/basicsearch/results.html', line 33 in block "mediagoblin_content"
  {{ object_gallery(request, media_entries, pagination) }}
File '/path/to/mediagoblin/mediagoblin/templates/mediagoblin/utils/object_gallery.html', line 68 in template
  {{ render_pagination(request, pagination) }}
File '/path/to/mediagoblin/mediagoblin/templates/mediagoblin/utils/pagination.html', line 42 in template
  {% set next_url = pagination.get_page_url_explicit(
File '/path/to/mediagoblin/mediagoblin/tools/pagination.py', line 110 in get_page_url_explicit
  base_url, urllib.urlencode(new_get_params))
File '/usr/lib64/python2.7/urllib.py', line 1347 in urlencode
  v = quote_plus(str(v))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf1' in position 4: ordinal not in range(128)
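
The last frame of the traceback shows where this comes from: Python 2's `urllib.urlencode` calls `str(v)` on each value, and `str()` on a unicode object encodes it with the ASCII codec. The sketch below reproduces the same failure with an explicit ASCII encode and shows that encoding to UTF-8 first yields the percent-encoded form seen in the URL above. It is written for Python 3 (where `urlencode` lives in `urllib.parse`), so it illustrates the mechanism rather than the exact Python 2 call.

```python
from urllib.parse import urlencode

query = "españa"

# Python 2 ran str(v) on each value, which implicitly encodes unicode
# text as ASCII. The explicit equivalent fails in the same way.
try:
    query.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# Encoding to UTF-8 bytes before urlencode avoids the ASCII step entirely.
encoded = urlencode({"query": query.encode("utf-8")})

print(ascii_failed)
print(encoded)
```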
vanyasem commented 6 years ago

Can't reproduce this in 2018. Consider closing the issue. Running MediaGoblin 0.9.0 with Python 2.7.12 on Ubuntu Server 16.04.

[Screenshot: 2018-01-24-171209_504x721_scrot]

vanyasem commented 6 years ago

Somehow it stopped working now -_- I now get the exception as well, using the same search queries as a week ago. Will investigate further.

vanyasem commented 6 years ago

Somehow the query I send doesn't match the one the server receives.

It gets received as б…бƒаЙ

[Screenshot: 2018-01-28-010814_1465x147_scrot]
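
Garbled text of this kind typically appears when UTF-8 bytes are decoded with a single-byte codec somewhere along the way. The thread does not establish which codec was involved here, so the sketch below uses Latin-1 purely as an assumed stand-in to show the effect and that it is reversible.

```python
sent = "тест"

# The text goes over the wire as UTF-8 bytes (two bytes per Cyrillic letter).
raw = sent.encode("utf-8")

# Decoding those bytes with the wrong codec produces mojibake.
# Latin-1 is an assumption for illustration, not the confirmed cause.
received = raw.decode("latin-1")

# Reversing the wrong decode and applying the right one restores the text.
recovered = received.encode("latin-1").decode("utf-8")

print(len(sent), len(received))
print(recovered == sent)
```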

vanyasem commented 6 years ago

I am completely lost now, as it fails to decode some of my non-ASCII inputs, while other non-ASCII strings work just fine.

vanyasem commented 6 years ago

Searching for те doesn't work, while searching for тест does.

vanyasem commented 6 years ago

I kinda fixed it with a nasty patch

vanyasem commented 6 years ago

I patched $MEDIAGOBLIN_ROOT/mediagoblin/tools/pagination.py

My diff:

97a98,121
>     def encode_obj(self, in_obj):
>         def encode_list(in_list):
>             out_list = []
>             for el in in_list:
>                 out_list.append(self.encode_obj(el))
>             return out_list
>
>         def encode_dict(in_dict):
>             out_dict = {}
>             for k, v in in_dict.iteritems():
>                 out_dict[k] = self.encode_obj(v)
>             return out_dict
> 
>         if isinstance(in_obj, unicode):
>             return in_obj.encode('utf-8')
>         elif isinstance(in_obj, list):
>             return encode_list(in_obj)
>         elif isinstance(in_obj, tuple):
>             return tuple(encode_list(in_obj))
>         elif isinstance(in_obj, dict):
>             return encode_dict(in_obj)
> 
>         return in_obj
> 
108c132
<             base_url, urllib.parse.urlencode(new_get_params))
---
>             base_url, urllib.parse.urlencode(self.encode_obj(new_get_params)))
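
For readers who want to try the idea outside MediaGoblin, here is a minimal standalone sketch of the same recursive encoding. It is a Python 3 adaptation, not the patch itself: `str` stands in for Python 2's `unicode`, `items()` for `iteritems()`, and the function is a plain module-level function rather than a method. As in the patch, dictionary keys are left untouched. The `params` dictionary is a made-up example.

```python
from urllib.parse import urlencode


def encode_obj(obj):
    """Recursively encode text values to UTF-8 bytes; leave other values as-is."""
    if isinstance(obj, str):
        return obj.encode("utf-8")
    if isinstance(obj, list):
        return [encode_obj(el) for el in obj]
    if isinstance(obj, tuple):
        return tuple(encode_obj(el) for el in obj)
    if isinstance(obj, dict):
        # Keys are passed through unchanged, matching the patch above.
        return {k: encode_obj(v) for k, v in obj.items()}
    return obj


# Hypothetical GET parameters, similar in shape to what a paginator might hold.
params = {"query": "тест", "page": 2}
url_query = urlencode(encode_obj(params))
print(url_query)
```

On Python 3 this step is not needed, because `urlencode` already encodes text as UTF-8 by default. The sketch only mirrors what the patch does for Python 2.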
Maytrovato commented 6 years ago

Thank you so much @vanyasem it worked for me.

Just to mention it: to make the query work both with and without accents, I had to add the following to $MEDIAGOBLIN_ROOT/mediagoblin/plugins/basicsearch/views.py. Before that, you need to install the unaccent extension on Postgres (here's the link).

from sqlalchemy.sql.functions import ReturnTypeFromArgs

class unaccent(ReturnTypeFromArgs):
    pass

...
        else:
            for term in terms:
                # Search entries without accents
                media_entry_statements.append(unaccent(MediaEntry.title).ilike(term))
                media_entry_statements.append(unaccent(MediaEntry.description).ilike(term))
                media_tag_statements.append(unaccent(MediaTag.name).ilike(term))
                # Search entries with accents
                media_entry_statements.append(MediaEntry.title.ilike(term))
                media_entry_statements.append(MediaEntry.description.ilike(term))
                media_tag_statements.append(MediaTag.name.ilike(term))

        matches = MediaEntry.query.filter(
...
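
To see what the two groups of conditions accomplish, the following pure-Python sketch approximates them without a database. `strip_accents` and `matches` are hypothetical helpers, and Unicode normalization is used as a stand-in for Postgres's `unaccent` (which works from its own rules file, so results can differ for some characters). Substring matching is likewise only an assumption about how the search terms are wrapped for `ilike`.

```python
import unicodedata


def strip_accents(text):
    """Approximate unaccent(): decompose characters and drop combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def matches(term, title):
    """Case-insensitive match against the unaccented title OR the literal title."""
    needle = term.lower()
    return needle in strip_accents(title).lower() or needle in title.lower()


print(strip_accents("España"))
print(matches("espana", "Viaje a España"))  # matched via the unaccented title
print(matches("españa", "Viaje a España"))  # matched via the literal title
print(matches("españa", "Viaje a Espana"))  # the term itself is not unaccented
```

The last case shows a limit of the snippet above: only the column is unaccented, so an accented search term does not find an unaccented title.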