ckan / ckanext-harvest

Remote harvesting extension for CKAN

Broken unicode support for dataset search #502

Closed jan-imrich closed 1 year ago

jan-imrich commented 2 years ago

We're using CKAN 2.8.10 (legacy Python 2.7) in Docker with the latest release of ckanext-harvest. Searching for datasets by tag, for example /dataset?tags=abcčd, yields a server error:

Error - <type 'exceptions.UnicodeEncodeError'>: 'ascii' codec can't encode character u'\u010d' in position 9: ordinal not in range(128)
URL: http://.../dataset?tags=abc%C4%8Dd
File '/usr/lib/ckan/venv/local/lib/python2.7/site-packages/weberror/errormiddleware.py', line 171 in __call__
  app_iter = self.application(environ, sr_checker)
File '/usr/lib/ckan/venv/local/lib/python2.7/site-packages/webob/dec.py', line 147 in __call__
  resp = self.call_func(req, *args, **self.kwargs)
File '/usr/lib/ckan/venv/local/lib/python2.7/site-packages/webob/dec.py', line 208 in call_func
  return self.func(req, *args, **kwargs)
File '/usr/lib/ckan/venv/local/lib/python2.7/site-packages/fanstatic/publisher.py', line 234 in __call__
  return request.get_response(self.app)
File '/usr/lib/ckan/venv/local/lib/python2.7/site-packages/webob/request.py', line 1053 in get_response
  application, catch_exc_info=False)
File '/usr/lib/ckan/venv/local/lib/python2.7/site-packages/webob/request.py', line 1022 in call_application
  app_iter = application(self.environ, start_response)
File '/usr/lib/ckan/venv/local/lib/python2.7/site-packages/webob/dec.py', line 147 in __call__
  resp = self.call_func(req, *args, **self.kwargs)
File '/usr/lib/ckan/venv/local/lib/python2.7/site-packages/webob/dec.py', line 208 in call_func
  return self.func(req, *args, **kwargs)
File '/usr/lib/ckan/venv/local/lib/python2.7/site-packages/fanstatic/injector.py', line 54 in __call__
  response = request.get_response(self.app)
File '/usr/lib/ckan/venv/local/lib/python2.7/site-packages/webob/request.py', line 1053 in get_response
  application, catch_exc_info=False)
File '/usr/lib/ckan/venv/local/lib/python2.7/site-packages/webob/request.py', line 1022 in call_application
  app_iter = application(self.environ, start_response)
File '/usr/lib/ckan/venv/src/ckan/ckan/config/middleware/pylons_app.py', line 264 in inner
  result = application(environ, start_response)
File '/usr/lib/ckan/venv/local/lib/python2.7/site-packages/beaker/middleware.py', line 73 in __call__
  return self.app(environ, start_response)
File '/usr/lib/ckan/venv/local/lib/python2.7/site-packages/beaker/middleware.py', line 156 in __call__
  return self.wrap_app(environ, session_start_response)
File '/usr/lib/ckan/venv/local/lib/python2.7/site-packages/routes/middleware.py', line 131 in __call__
  response = self.app(environ, start_response)
File '/usr/lib/ckan/venv/src/ckan/ckan/config/middleware/common_middleware.py', line 33 in __call__
  return self.app(environ, start_response)
File '/usr/lib/ckan/venv/src/ckan/ckan/config/middleware/common_middleware.py', line 59 in __call__
  return self.app(environ, start_response)
File '/usr/lib/ckan/venv/local/lib/python2.7/site-packages/pylons/wsgiapp.py', line 125 in __call__
  response = self.dispatch(controller, environ, start_response)
File '/usr/lib/ckan/venv/local/lib/python2.7/site-packages/pylons/wsgiapp.py', line 324 in dispatch
  return controller(environ, start_response)
File '/usr/lib/ckan/venv/src/ckan/ckan/lib/base.py', line 242 in __call__
  res = WSGIController.__call__(self, environ, start_response)
File '/usr/lib/ckan/venv/local/lib/python2.7/site-packages/pylons/controllers/core.py', line 221 in __call__
  response = self._dispatch_call()
File '/usr/lib/ckan/venv/local/lib/python2.7/site-packages/pylons/controllers/core.py', line 172 in _dispatch_call
  response = self._inspect_call(func)
File '/usr/lib/ckan/venv/local/lib/python2.7/site-packages/pylons/controllers/core.py', line 107 in _inspect_call
  result = self._perform_call(func, args)
File '/usr/lib/ckan/venv/local/lib/python2.7/site-packages/pylons/controllers/core.py', line 60 in _perform_call
  return func(**args)
File '/usr/lib/ckan/venv/src/ckan/ckan/controllers/package.py', line 287 in search
  query = get_action('package_search')(context, data_dict)
File '/usr/lib/ckan/venv/src/ckan/ckan/logic/__init__.py', line 498 in wrapped
  result = _action(context, data_dict, **kw)
File '/usr/lib/ckan/venv/src/ckan/ckan/logic/action/get.py', line 1844 in package_search
  data_dict = item.before_search(data_dict)
File '/usr/lib/ckan/venv/src/ckanext-harvest/ckanext/harvest/plugin/__init__.py', line 75 in before_search
  return self.before_dataset_search(search_params)
File '/usr/lib/ckan/venv/src/ckanext-harvest/ckanext/harvest/plugin/__init__.py', line 113 in before_dataset_search
  fq = "{0} -dataset_type:harvest".format(search_params.get("fq", ""))
UnicodeEncodeError: 'ascii' codec can't encode character u'\u010d' in position 9: ordinal not in range(128)

I believe the change introduced in this commit is the cause of the issue on legacy Python: Merge pull request https://github.com/ckan/ckanext-harvest/pull/492 from salsadigitalauorg/add-support-for-CKAN-2.10
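A possible explanation, going only by the traceback: on Python 2 the template "{0} -dataset_type:harvest" is a byte string, so calling .format() on it with a unicode fq value forces an implicit ASCII encode, which fails on č. Position 9 in the error would match an fq of the form tags:"abcčd", though that exact value is an assumption. The sketch below shows one way the line could be made safe on both Python 2 and Python 3 by using a unicode template. The function name append_harvest_filter is hypothetical and this is not the extension's actual code or the fix that was merged.

```python
# Hypothetical helper mirroring the failing line from the traceback.
# It is a sketch for illustration, not code from ckanext-harvest.

def append_harvest_filter(search_params):
    """Return a copy of search_params whose fq excludes harvest sources."""
    fq = search_params.get("fq", u"")
    new_params = dict(search_params)
    # The u"" prefix keeps the template a unicode string on Python 2, so a
    # non-ASCII fq is formatted without going through the ascii codec.
    # On Python 3 the prefix is accepted and changes nothing.
    new_params["fq"] = u"{0} -dataset_type:harvest".format(fq)
    return new_params


# Assumed fq value, chosen so that the non-ASCII character sits at index 9.
params = append_harvest_filter({"fq": u'tags:"abc\u010dd"'})
```

With a unicode template, the result stays unicode on Python 2 and the non-ASCII tag passes through to the search query unchanged.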