DataBrewery / cubes

[NOT MAINTAINED] Light-weight Python OLAP framework for multi-dimensional data analysis
http://cubes.databrewery.org

Unicode error when including Chinese characters in cut on slicer server #332

Open devvmh opened 8 years ago

devvmh commented 8 years ago

Summary

Unicode errors! I'm including an error traceback, a git patch that solves the error, and then a subsequent traceback for a second error I couldn't solve. I appreciate any help getting this working!

Details

I'm using cubesviewer to build a number of views of data from a rails application. A lot of the dimensions include Chinese characters.

In particular, I have a cube called "plans" that can be filtered on utm_source (among other dimensions). When I use cubesviewer to select only plans whose utm_source values include Chinese characters, I get an error.

I'm using cubes 1.1, but I believe the problem also exists in the stable releases. Here's the relevant line from requirements.txt: git+https://github.com/DataBrewery/cubes.git@6d83f04acb34eeb0b904dd1af0e97eda6621bf6c

I'm filtering on the "imported_at" date property of the cube and drilling down on "utm_source" (to build a cubesviewer pie chart). The relevant filter looks for utm_source values equal to one of "seo", "不知道", or "合作类". I've used these values because they're short; the bug also appears for any combination that includes Unicode characters. Here is the traceback of the error from the slicer server logs.

2016-01-07 11:56:29,090 ERROR Exception on /cube/plans/aggregate [GET]
Traceback (most recent call last):
  File "/Users/devin/olap-cubes/venv/lib/python2.7/site-packages/flask/app.py", line 1817, in wsgi_app
    response = self.full_dispatch_request()
  File "/Users/devin/olap-cubes/venv/lib/python2.7/site-packages/flask/app.py", line 1477, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/Users/devin/olap-cubes/venv/lib/python2.7/site-packages/flask/app.py", line 1381, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/Users/devin/olap-cubes/venv/lib/python2.7/site-packages/flask/app.py", line 1475, in full_dispatch_request
    rv = self.dispatch_request()
  File "/Users/devin/olap-cubes/venv/lib/python2.7/site-packages/flask/app.py", line 1461, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/Users/devin/olap-cubes/venv/lib/python2.7/site-packages/cubes/server/decorators.py", line 118, in wrapper
    return f(*args, **kwargs)
  File "/Users/devin/olap-cubes/venv/lib/python2.7/site-packages/cubes/server/decorators.py", line 167, in wrapper
    retval = f(*args, **kwargs)
  File "/Users/devin/olap-cubes/venv/lib/python2.7/site-packages/cubes/server/blueprint.py", line 351, in aggregate
    order=g.order)
  File "/Users/devin/olap-cubes/venv/lib/python2.7/site-packages/cubes/browser.py", line 155, in aggregate
    **options)
  File "/Users/devin/olap-cubes/venv/lib/python2.7/site-packages/cubes/sql/browser.py", line 401, in provide_aggregate
    for_summary=True)
  File "/Users/devin/olap-cubes/venv/lib/python2.7/site-packages/cubes/sql/browser.py", line 542, in aggregation_statement
    (",".join([str(cut) for cut in cell.cuts]),
UnicodeEncodeError: 'ascii' codec can't encode characters in position 23-25: ordinal not in range(128)
127.0.0.1 - - [07/Jan/2016 11:56:29] "GET /cube/plans/aggregate?drilldown=utm_source%40default%3Autm_source&cut=utm_source%40default%3Aseo%3B%E4%B8%8D%E7%9F%A5%E9%81%93%3B%E5%90%88%E4%BD%9C%E7%B1%BB%7Cimported_at%40ymd%3A2015%2C1%2C7- HTTP/1.1" 500 -

And the urldecoded url is

/cube/plans/aggregate?drilldown=utm_source@default:utm_source&cut=utm_source@default:seo;不知道;合作类|imported_at@ymd:2015,1,7-
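
For reference, the decoding and the failure position can be reproduced with the standard library alone. The sketch below is written for Python 3 and does not use cubes. It forces an ASCII encode by hand, which is an assumption standing in for the implicit encode that Python 2's str() performs, and it splits on "|" as a simplification of how cuts are separated in the URL.

```python
from urllib.parse import unquote

# The raw `cut` parameter copied from the request line in the log above.
raw_cut = (
    "utm_source%40default%3Aseo%3B%E4%B8%8D%E7%9F%A5%E9%81%93"
    "%3B%E5%90%88%E4%BD%9C%E7%B1%BB%7Cimported_at%40ymd%3A2015%2C1%2C7-"
)

# unquote() decodes percent-escapes as UTF-8 by default.
decoded = unquote(raw_cut)

# Simplification: treat "|" as the separator between cuts.
first_cut = decoded.split("|")[0]


def ascii_failure_span(text):
    """Return (start, end) of the first run ASCII cannot encode, or None."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError as exc:
        return exc.start, exc.end  # end is exclusive
    return None


span = ascii_failure_span(first_cut)
print(first_cut)
print(span)
```

The span starts at index 23, which is where 不知道 begins in the first cut, so it lines up with the "position 23-25" reported in the traceback.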

Though I don't think this is the correct solution, I am able to resolve this particular bug (and a related one) by applying these changes to cubes/sql/browser.py:

diff --git a/sql/browser.py b/sql/browser.py
index bea8676..63d80be 100644
--- a/sql/browser.py
+++ b/sql/browser.py
@@ -539,8 +539,8 @@ class SQLBrowser(AggregationBrowser):

         self.logger.debug("prepare aggregation statement. cell: '%s' "
                           "drilldown: '%s' for summary: %s" %
-                          (",".join([str(cut) for cut in cell.cuts]),
-                          drilldown, for_summary))
+                          (",".join([unicode(cut).encode('utf-8') for cut in cell.cuts]),
+                          unicode(drilldown).encode('utf-8'), for_summary))

         # TODO: it is verylikely that the _create_context is not getting all
         # attributes, for example those that aggregate depends on

However, this simply led me to another bug I couldn't fix. Here's the second slicer server traceback:

2016-01-07 12:04:58,499 ERROR Exception on /cube/plans/aggregate [GET]
Traceback (most recent call last):
  File "/Users/devin/olap-cubes/venv/lib/python2.7/site-packages/flask/app.py", line 1817, in wsgi_app
    response = self.full_dispatch_request()
  File "/Users/devin/olap-cubes/venv/lib/python2.7/site-packages/flask/app.py", line 1477, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/Users/devin/olap-cubes/venv/lib/python2.7/site-packages/flask/app.py", line 1381, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/Users/devin/olap-cubes/venv/lib/python2.7/site-packages/flask/app.py", line 1475, in full_dispatch_request
    rv = self.dispatch_request()
  File "/Users/devin/olap-cubes/venv/lib/python2.7/site-packages/flask/app.py", line 1461, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/Users/devin/olap-cubes/venv/lib/python2.7/site-packages/cubes/server/decorators.py", line 118, in wrapper
    return f(*args, **kwargs)
  File "/Users/devin/olap-cubes/venv/lib/python2.7/site-packages/cubes/server/decorators.py", line 167, in wrapper
    retval = f(*args, **kwargs)
  File "/usr/local/Cellar/python/2.7.10_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/Users/devin/olap-cubes/venv/lib/python2.7/site-packages/cubes/server/logging.py", line 83, in log_time
    self.log(method, browser, cell, identity, elapsed, **other)
  File "/Users/devin/olap-cubes/venv/lib/python2.7/site-packages/cubes/server/logging.py", line 97, in log
    record = self._stringify_record(record)
  File "/Users/devin/olap-cubes/venv/lib/python2.7/site-packages/cubes/server/logging.py", line 114, in _stringify_record
    record["cell"] = compat.text_type(cell) if cell is not None else None
  File "/Users/devin/olap-cubes/venv/lib/python2.7/site-packages/cubes/cells.py", line 457, in __str__
    return string_from_cuts(self.cuts)
  File "/Users/devin/olap-cubes/venv/lib/python2.7/site-packages/cubes/cells.py", line 683, in string_from_cuts
    strings = [compat.to_unicode(cut) for cut in cuts]
  File "/Users/devin/olap-cubes/venv/lib/python2.7/site-packages/cubes/compat.py", line 58, in to_unicode
    s = str(s)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 23-25: ordinal not in range(128)
127.0.0.1 - - [07/Jan/2016 12:04:58] "GET /cube/plans/aggregate?drilldown=utm_source%40default%3Autm_source&cut=utm_source%40default%3Aseo%3B%E4%B8%8D%E7%9F%A5%E9%81%93%3B%E5%90%88%E4%BD%9C%E7%B1%BB%7Cimported_at%40ymd%3A2015%2C1%2C7- HTTP/1.1" 500 -
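
Both tracebacks end in a str() call on something whose text form contains non-ASCII characters. A minimal sketch of a conversion helper that never goes through ASCII is shown below. It is written for Python 3, where the text type is str; on Python 2 the text type is unicode, so the branches would need adapting. The names to_text and FakeCut are hypothetical and are not part of cubes, and this has not been tested against the cubes code base.

```python
def to_text(value, encoding="utf-8"):
    """Convert value to text without an implicit ASCII round-trip."""
    if isinstance(value, str):
        return value
    if isinstance(value, (bytes, bytearray)):
        # Byte strings are decoded explicitly with a known encoding.
        return bytes(value).decode(encoding)
    # Arbitrary objects: rely on their own text representation.
    return str(value)


class FakeCut:
    """Hypothetical stand-in for a cut object; not the cubes class."""

    def __init__(self, dimension, values):
        self.dimension = dimension
        self.values = values

    def __str__(self):
        return "%s:%s" % (self.dimension, ";".join(self.values))


cuts = [FakeCut("utm_source@default", ["seo", "不知道", "合作类"])]
joined = ",".join(to_text(cut) for cut in cuts)
print(joined)
```

The point of the design is that every branch either already holds text or decodes bytes with an explicit encoding, so no default codec is ever consulted.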

fezeev commented 8 years ago

I faced exactly the same problem with Russian Unicode characters. I carefully checked the source of cubes/compat.py and cubes/cells.py and noticed that they have completely different code paths for Python 2.7 and 3.0+. I was also on Python 2.7, so I decided to try Python 3.4, and it helped. Creating a virtualenv with Python 3.4, installing cubes 1.1 in it with all prerequisites, and running 'source env/bin/activate' before starting the cubes server completely resolved my problem.

devvmh commented 8 years ago

Thanks @fezeev! I'll give that a try next week!

Stiivi commented 8 years ago

Can you please try now with the master?

devvmh commented 8 years ago

I can confirm that the same bug exists, with a different error message, on the latest master:

  File "/Users/devin/olap-cubes/venv/lib/python2.7/site-packages/cubes/sql/browser.py", line 402, in provide_aggregate
    for_summary=True)
  File "/Users/devin/olap-cubes/venv/lib/python2.7/site-packages/cubes/sql/browser.py", line 542, in aggregation_statement
    (",".join([compat.to_unicode(cut) for cut in cell.cuts]),
  File "/Users/devin/olap-cubes/venv/lib/python2.7/site-packages/cubes/compat.py", line 58, in to_unicode
    s = str(s)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 23-25: ordinal not in range(128)
127.0.0.1 - - [06/Jun/2016 10:25:56] "GET /cube/plans/aggregate?drilldown=utm_source%40default%3Autm_source&cut=utm_source%40default%3Aseo%3B%E4%B8%8D%E7%9F%A5%E9%81%93%3B%E5%90%88%E4%BD%9C%E7%B1%BB%7Cimported_at%40ymd%3A2014%2C6%2C6- HTTP/1.1" 500 -

Relevant requirements.txt line is

git+https://github.com/DataBrewery/cubes.git@e0181b977d1ffbc6bd81335f0b6355f26b860f13

I am still using Python 2.7.10.

Thanks for working on this!

EDIT: I've done a bit more digging. Commenting out line 58 of compat.py (s = str(s)) gives me an interesting error:

  File "/Users/devin/olap-cubes/venv/lib/python2.7/site-packages/cubes/sql/browser.py", line 542, in aggregation_statement
    (",".join([compat.to_unicode(cut) for cut in cell.cuts]),
  File "/Users/devin/olap-cubes/venv/lib/python2.7/site-packages/cubes/compat.py", line 61, in to_unicode
    return unicode(s, enc)
TypeError: coercing to Unicode: need string or buffer, SetCut found

It appears that cell.cuts sometimes returns strings and sometimes returns SetCut objects. My guess is that cuts whose values contain Unicode characters from the URL come back as SetCut objects, since those are the ones that cause the problem.
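
The TypeError is also consistent with the two-argument conversion simply being handed an object instead of a byte string, since that form only decodes bytes. A Python 3 analogue is sketched below; FakeSetCut and decode_or_report are hypothetical names for illustration, not the cubes classes, and this does not confirm what cell.cuts actually contains.

```python
class FakeSetCut:
    """Hypothetical stand-in for a cut object; not the cubes class."""

    def __str__(self):
        return "utm_source@default:seo;不知道"


def decode_or_report(value, encoding="utf-8"):
    """Try the two-argument str() form, which only accepts bytes-like input."""
    try:
        return str(value, encoding)
    except TypeError as exc:
        return "TypeError: %s" % exc


from_bytes = decode_or_report("不知道".encode("utf-8"))
from_object = decode_or_report(FakeSetCut())
print(from_bytes)
print(from_object)
```

Byte strings decode cleanly, while the object is rejected before its __str__ method is ever called.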

Richsan commented 7 years ago

I was having a similar problem with the u'\xe3' character. I changed line 58 of compat.py from s = str(s) to s = str(s.__str__().encode('utf-8','ignore')), did the same for line 52 of formatters.py, and it works fine for me.
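
A short note on what the 'ignore' handler does in that workaround, sketched in Python 3 with only built-in string methods. With the utf-8 codec, ordinary characters such as '\xe3' or Chinese text are always encodable, so 'ignore' drops nothing; with the ascii codec the same handler silently discards them. Either way the result of encode() is a byte string, not text.

```python
text = "\xe3"  # the character 'ã' from the comment above

# UTF-8 can represent the character, so 'ignore' has nothing to drop.
utf8_bytes = text.encode("utf-8", "ignore")

# ASCII cannot represent it, so 'ignore' silently removes it.
ascii_bytes = text.encode("ascii", "ignore")

# Chinese text survives a UTF-8 round-trip unchanged.
round_trip = "不知道".encode("utf-8", "ignore").decode("utf-8")

print(utf8_bytes, ascii_bytes, round_trip)
```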

alexanderfefelov commented 5 years ago

Thanks @fezeev.