adswerve / universal-analytics-python

Universal Analytics Python module
BSD 3-Clause "New" or "Revised" License

Tracker doesn't fully support Unicode data #3

Closed juliomalegria closed 10 years ago

juliomalegria commented 10 years ago

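If you call send() with unicode data, a UnicodeEncodeError is raised because the code forces the data to be str (the k[0](v) call in payload(), shown in the traceback below). On Python 2, converting a unicode value to str implicitly encodes it as ASCII, which fails on the first non-ASCII character.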

Traceback of the exception:

In [4]: tracker.send('event', u'câtēgøry', u'åctîõn', u'låbęl', u'válüē')
---------------------------------------------------------------------------
UnicodeEncodeError                        Traceback (most recent call last)
<ipython-input-4-d27179d76103> in <module>()
----> 1 tracker.send('event', u'câtēgøry', u'åctîõn', u'låbęl', u'válüē')

VIRTUAL_ENV/lib/python2.7/site-packages/UniversalAnalytics/Tracker.pyc in send(self, hittype, *args, **data)
    271 
    272 
--> 273         data = dict(self.payload(data))
    274 
    275         if self.hash_client_id:

VIRTUAL_ENV/lib/python2.7/site-packages/UniversalAnalytics/Tracker.pyc in payload(self, data)
    183     def payload(self, data):
    184         for v, k in self.payload_map(data):
--> 185             yield k[1], k[0](v)
    186 
    187 

UnicodeEncodeError: 'ascii' codec can't encode character u'\xe5' in position 1: ordinal not in range(128)
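
The mechanism behind this first error can be reproduced outside the tracker. What follows is a minimal sketch, not the library's code: coerce_like_py2_str is a hypothetical helper that spells out the implicit ASCII encode Python 2 performs when a unicode value is converted to str, so the snippet also runs on Python 3. The label value is written with escapes and stands for u'låbęl'.

```python
# -*- coding: utf-8 -*-

def coerce_like_py2_str(value):
    """Hypothetical helper: make explicit the ASCII encode that
    Python 2 performs implicitly for str(unicode_value)."""
    return value.encode('ascii')


label = u'l\xe5b\u0119l'  # u'låbęl', written with escapes

try:
    coerce_like_py2_str(label)
    encode_error = None
except UnicodeEncodeError as exc:
    # The first non-ASCII character sits at index 1, as in the traceback.
    encode_error = exc
```

The captured exception reports the same codec ('ascii') and the same position (1) as the traceback above.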

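And if you encode the data as UTF-8 first, send() raises a UnicodeDecodeError instead, because the code attempts to encode the data (again) as UTF-8 in fixUTF8(). On Python 2, calling encode() on a byte string first decodes it implicitly as ASCII, and that decode fails on the UTF-8 bytes.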

Traceback of the second exception:

In [5]: tracker.send('event', u'câtēgøry'.encode('utf8'), u'åctîõn'.encode('utf8'), u'låbęl'.encode('utf8'), u'válüē'.encode('utf8'))
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-5-105d72bea5a4> in <module>()
----> 1 tracker.send('event', u'câtēgøry'.encode('utf8'), u'åctîõn'.encode('utf8'), u'låbęl'.encode('utf8'), u'válüē'.encode('utf8'))

VIRTUAL_ENV/lib/python2.7/site-packages/UniversalAnalytics/Tracker.pyc in send(self, hittype, *args, **data)
    277 
    278         # Transmit the hit to Google...
--> 279         self.http.send(data)
    280 
    281 

VIRTUAL_ENV/lib/python2.7/site-packages/UniversalAnalytics/Tracker.pyc in send(self, data)
    128         request = Request(
    129                 self.endpoint,
--> 130                 data = urlencode(self.fixUTF8(data)),
    131                 headers = {
    132                     'User-Agent': self.user_agent

VIRTUAL_ENV/lib/python2.7/site-packages/UniversalAnalytics/Tracker.pyc in fixUTF8(cls, data)
     91         for key in data:
     92             if isinstance(data[ key ], basestring):
---> 93                 data[ key ] = data[ key ].encode('utf-8')
     94         return data
     95 

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1: ordinal not in range(128)
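
The second failure, and one possible way around both, can be sketched as follows. This is an illustration under stated assumptions, not the patch that was later submitted: encode_like_py2 and to_utf8 are hypothetical names. The first spells out the implicit ASCII decode that Python 2 performs before encoding a byte string; the second encodes text exactly once and passes already-encoded bytes through untouched.

```python
# -*- coding: utf-8 -*-
import sys

# The text type is named differently on Python 2 and Python 3.
if sys.version_info[0] >= 3:
    text_type = str
else:
    text_type = unicode  # noqa: F821 (this branch only runs on Python 2)


def encode_like_py2(byte_string):
    """Hypothetical helper: what Python 2 does for some_str.encode('utf-8'),
    namely an implicit ASCII decode followed by the requested encode."""
    return byte_string.decode('ascii').encode('utf-8')


def to_utf8(value):
    """Hypothetical helper: encode text once, leave everything else alone."""
    if isinstance(value, text_type):
        return value.encode('utf-8')
    return value


label = u'l\xe5b\u0119l'              # u'låbęl', written with escapes
already_encoded = label.encode('utf-8')

try:
    encode_like_py2(already_encoded)
    double_encode_error = None
except UnicodeDecodeError as exc:
    # Byte 0xc3 at index 1 is the first byte of the encoded u'\xe5'.
    double_encode_error = exc

# Text and pre-encoded bytes both end up as the same UTF-8 bytes.
normalized_text = to_utf8(label)
normalized_bytes = to_utf8(already_encoded)
```

Checking the type before encoding means callers can pass either unicode or UTF-8 byte strings. It does assume that any byte string handed in is already valid UTF-8.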

samba commented 10 years ago

Thanks for the report @juliomalegria; I'll look into this as time permits. Do you have a proposed solution for it?

juliomalegria commented 10 years ago

@samba I'll implement one and open a pull request as soon as I have it ready.

samba commented 10 years ago

Considering this closed, as your patch seems sound. (I haven't directly tested it yet.)