ODM2 / WOFpy

A server-side implementation of CUAHSI's Water One Flow service stack in Python.
http://odm2.github.io/WOFpy/

handle unicode characters #148

Closed miguelcleon closed 7 years ago

miguelcleon commented 7 years ago

My ODM2 database contains unicode characters that cause errors in wofpy.

This is the call that errors out http://dev-odm2admin.cuahsi.org/wofpy/odm2timeseries/rest/1_1/GetSites

The error:

<ns0:Fault xmlns:ns0="http://schemas.xmlsoap.org/soap/envelope/">
<faultcode>soap11env:Server</faultcode>
<faultstring>
'ascii' codec can't encode character u'\u2212' in position 33: ordinal not in range(128)
</faultstring>
<faultactor/>
</ns0:Fault>

u'\u2212' is the Unicode minus sign (http://www.fileformat.info/info/unicode/char/2212/index.htm), which I may just be using as a dash in a site name.
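
For readers who want to check what the character in the fault string actually is, here is a minimal sketch using only the standard library. It is not part of WOFpy; the variable names are illustrative. It looks up the character's Unicode name and its UTF-8 byte sequence, and confirms that it lies outside the ASCII range.

```python
import unicodedata

ch = u"\u2212"  # the character named in the fault string

name = unicodedata.name(ch)      # official Unicode character name
utf8_bytes = ch.encode("utf-8")  # byte sequence used when stored as UTF-8
is_ascii = ord(ch) < 128         # ASCII only covers code points 0..127

print(name)              # MINUS SIGN
print(repr(utf8_bytes))
print(is_ascii)          # False
```

Because the code point is above 127, any implicit conversion through the ASCII codec will reject it.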

lsetiawan commented 7 years ago

I found a similar problem when going through the EnviroDIY endpoint: /postgresqlodm2timeseries/rest/1_1/GetValues?location=postgresqlodm2timeseries:srgd_desk&variable=postgresqlodm2timeseries:EnviroDIY_Mayfly_Temp

<ns0:Fault xmlns:ns0="http://schemas.xmlsoap.org/soap/envelope/">
<faultcode>soap11env:Server</faultcode>
<faultstring>
'ascii' codec can't decode byte 0xe2 in position 37: ordinal not in range(128)
</faultstring>
<faultactor/>
</ns0:Fault>

Addition from @emiliom: This error can be accessed live on the USU-hosted EnviroDIY wofpy endpoint, with this request: http://odm2wofpy.uwrl.usu.edu:8080/odm2timeseries/rest/1_1/GetValues?location=odm2timeseries:srgd_desk&variable=odm2timeseries:EnviroDIY_Mayfly_Temp

emiliom commented 7 years ago

@lsetiawan, can you describe briefly what the EnviroDIY error was that you're addressing with PR #155? What kind of values were failing, and where?

lsetiawan commented 7 years ago

I am not sure which values were failing. The traceback wasn't very helpful, but it pointed to something in the StringIO library.

Traceback (most recent call last):
  File "/home/lsetiawan/miniconda/envs/wofpy/lib/python2.7/site-packages/spyne/application.py", line 151, in process_request
    ctx.out_object = self.call_wrapper(ctx)
  File "/home/lsetiawan/miniconda/envs/wofpy/lib/python2.7/site-packages/spyne/application.py", line 235, in call_wrapper
    retval = ctx.descriptor.service_class.call_wrapper(ctx)
  File "/home/lsetiawan/miniconda/envs/wofpy/lib/python2.7/site-packages/spyne/service.py", line 209, in call_wrapper
    return ctx.function(ctx, *args)
  File "/mnt/hgfs/Landung_2TB/Work/WOFpy/WOFpy/wof/apps/spyned_1_1.py", line 208, in GetValues
    authToken
  File "/mnt/hgfs/Landung_2TB/Work/WOFpy/WOFpy/wof/apps/spyned_1_1.py", line 185, in GetValuesObject
    return outStream.getvalue()
  File "/home/lsetiawan/miniconda/envs/wofpy/lib/python2.7/StringIO.py", line 271, in getvalue
    self.buf += ''.join(self.buflist)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 37: ordinal not in range(128)

ocefpaf commented 7 years ago

@lsetiawan ideally we should use codecs/io and other Python 3 tools (ported to 2.7) and treat everything as bytes. This is not an easy task, but it will make the code more robust and ready for future upgrades.

Ideally, we should start by adding tests that fail, then we can implement the conversions as needed.
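
A test-first approach along these lines could look like the sketch below. Everything in it is an assumption made for illustration: `render_site_name` is a hypothetical stand-in for a WOFpy response writer, and the site name is invented. Against the real writer the test would be expected to fail until the unicode handling is fixed; the stand-in already encodes explicitly so that the sketch runs on its own.

```python
import io
import unittest


def render_site_name(site_name):
    """Hypothetical stand-in for a response writer (not WOFpy code)."""
    out = io.BytesIO()
    out.write(b"<siteName>")
    out.write(site_name.encode("utf-8"))  # explicit text -> bytes conversion
    out.write(b"</siteName>")
    return out.getvalue()


class NonAsciiSiteNameTest(unittest.TestCase):
    def test_minus_sign_in_site_name(self):
        # Invented site name containing U+2212 (MINUS SIGN).
        xml = render_site_name(u"Site A \u2212 upstream")
        self.assertIn(u"\u2212".encode("utf-8"), xml)


# Run the test case programmatically so the script does not exit.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(NonAsciiSiteNameTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Pointing the same test at the real serialization function, with a fixture row that contains a non-ASCII character, would pin down the failure before any conversion code is changed.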

lsetiawan commented 7 years ago

I was too excited that it worked with EnviroDIY, but when I tested the new code with @miguelcleon's database, the error was still there.

I will look into @ocefpaf's suggestion.

lsetiawan commented 7 years ago

I can't seem to reproduce the failure that @miguelcleon reported in https://github.com/ODM2/WOFpy/issues/148#issue-236536065. I tried inserting the exact wording that contains the suspected dash, and it still doesn't fail.

miguelcleon commented 7 years ago

@lsetiawan so it's failing for my db but you can't generate the same error from the EnviroDIY db? It does seem to be the case that this is a valid unicode character '\u2212' http://www.fileformat.info/info/unicode/char/2212/index.htm

emiliom commented 7 years ago

@lsetiawan, are you sure the error you reported for EnviroDIY is the same kind of error @miguelcleon reported in his database? (see "decode byte" vs "encode character")

EnviroDIY error:

<faultstring>
'ascii' codec can't decode byte 0xe2 in position 37: ordinal not in range(128)
</faultstring>

LCZO error:

<faultstring>
'ascii' codec can't encode character u'\u2212' in position 33: ordinal not in range(128)
</faultstring>

Also, can you describe exactly where you added the unicode dash character? Is it in the same table and field as in Miguel's database?
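
The difference between the two fault strings can be reproduced in isolation. The sketch below is illustrative only (the sample string is invented) and applies the ASCII codec explicitly, which is what Python 2 does implicitly when text and byte strings are mixed. "encode character" arises when text is converted to bytes; "decode byte" arises when bytes are converted to text. The byte 0xe2 is the lead byte of many three-byte UTF-8 sequences, including the one for U+2212.

```python
text = u"site \u2212 name"   # text (unicode) containing U+2212
data = text.encode("utf-8")  # the same content as UTF-8 bytes

try:
    text.encode("ascii")     # text -> bytes with the ASCII codec
except UnicodeEncodeError as exc:
    encode_error = exc       # "can't encode character ..."

try:
    data.decode("ascii")     # bytes -> text with the ASCII codec
except UnicodeDecodeError as exc:
    decode_error = exc       # "can't decode byte 0xe2 ..."

print(type(encode_error).__name__, encode_error.start)
print(type(decode_error).__name__, decode_error.start)
```

The same stored character can therefore surface as either error, depending on which direction the implicit conversion runs.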

lsetiawan commented 7 years ago

LCZO error

The encoding error that @miguelcleon encounters is caused by https://github.com/ODM2/WOFpy/blob/master/wof/WaterML_1_1.py#L4887.

outfile.write(str(self.valueOf_).encode(ExternalEncoding))

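self.valueOf_ is a unicode object. In Python 2, calling str() on a unicode object implicitly encodes it with the ASCII codec, which fails for any non-ASCII character, so the .encode(ExternalEncoding) call is never reached. To fix this error, unicode objects should be converted to bytes directly with .encode().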

simple test

In [13]: str(u'test\u2212')
---------------------------------------------------------------------------
UnicodeEncodeError                        Traceback (most recent call last)
<ipython-input-13-a2b3bb938372> in <module>()
----> 1 str(u'test\u2212')

UnicodeEncodeError: 'ascii' codec can't encode character u'\u2212' in position 4: ordinal not in range(128)

In [14]: u'test\u2212'.encode('utf-8')
Out[14]: 'test\xe2\x88\x92'
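
One way to apply this consistently is a small conversion helper, sketched below. The name `to_bytes` and its behaviour for non-string values are assumptions for illustration, not the change that was made in WaterML_1_1.py. It is written to run on both Python 2.7 and Python 3.

```python
import sys

# The text type is `unicode` on Python 2 and `str` on Python 3.
text_type = str if sys.version_info[0] >= 3 else unicode  # noqa: F821


def to_bytes(value, encoding="utf-8"):
    """Convert a value to bytes without going through the ASCII codec."""
    if isinstance(value, bytes):
        return value                 # already bytes, pass through unchanged
    if not isinstance(value, text_type):
        value = text_type(value)     # numbers etc. become text first
    return value.encode(encoding)    # explicit, named encoding


print(repr(to_bytes(u"test\u2212")))
print(repr(to_bytes(12.5)))
```

Passing the encoding explicitly keeps the choice visible at the call site instead of depending on the interpreter's default codec.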


lsetiawan commented 7 years ago

Now, @miguelcleon's encoding error has become a decoding error, similar to the EnviroDIY one.

Traceback (most recent call last):
  File "/home/lsetiawan/miniconda/envs/wofpy/lib/python2.7/site-packages/spyne/application.py", line 151, in process_request
    ctx.out_object = self.call_wrapper(ctx)
  File "/home/lsetiawan/miniconda/envs/wofpy/lib/python2.7/site-packages/spyne/application.py", line 235, in call_wrapper
    retval = ctx.descriptor.service_class.call_wrapper(ctx)
  File "/home/lsetiawan/miniconda/envs/wofpy/lib/python2.7/site-packages/spyne/service.py", line 209, in call_wrapper
    return ctx.function(ctx, *args)
  File "/mnt/hgfs/Landung_2TB/Work/WOFpy/WOFpy/wof/apps/spyned_1_1.py", line 62, in GetSites
    siteResult = WOFService.GetSitesObject(ctx, site, authToken)
  File "/mnt/hgfs/Landung_2TB/Work/WOFpy/WOFpy/wof/apps/spyned_1_1.py", line 40, in GetSitesObject
    return outStream.getvalue()
  File "/home/lsetiawan/miniconda/envs/wofpy/lib/python2.7/StringIO.py", line 271, in getvalue
    self.buf += ''.join(self.buflist)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 3: ordinal not in range(128)

lsetiawan commented 7 years ago

To handle both cases, I am using what @ocefpaf suggested: io.BytesIO() in https://github.com/ODM2/WOFpy/blob/master/wof/apps/spyned_1_1.py#L38

outStream = io.BytesIO()

And this seems to solve the problem for both errors, since the strings are treated as bytes.
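
Some background on why a bytes-only buffer helps: Python 2's StringIO.StringIO accepts both unicode and 8-bit strings, and its documentation warns that mixing the two raises a UnicodeError in getvalue() when the 8-bit strings contain non-ASCII bytes. io.BytesIO accepts bytes only. The sketch below is a simplified illustration with invented element names and sample text, not the actual code in spyned_1_1.py.

```python
import io

out_stream = io.BytesIO()
site_name = u"test\u2212"  # sample text with a non-ASCII character

# Every write is bytes: literals are byte strings, text is encoded first.
out_stream.write(b"<siteName>")
out_stream.write(site_name.encode("utf-8"))
out_stream.write(b"</siteName>")
payload = out_stream.getvalue()  # bytes, with no implicit ASCII conversion

# Un-encoded text is rejected at write time instead of failing later.
try:
    out_stream.write(u"\u2212")
except TypeError:
    rejected_text = True
else:
    rejected_text = False

print(repr(payload))
print(rejected_text)
```

A side effect of this design is that a missing .encode() shows up as a TypeError at the offending write, which is easier to trace than a failure inside getvalue().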

I will create a PR with those changes.

miguelcleon commented 7 years ago

@lsetiawan sounds great! I'll give it a try once it's up.

emiliom commented 7 years ago

Great progress!! And @lsetiawan, thanks for the additional information you provided above.

emiliom commented 7 years ago

This issue was fully fixed, right? Looks like the only outstanding thing was for @miguelcleon to confirm that things were now fine on his end?

lsetiawan commented 7 years ago

@emiliom As far as the specific issue of unicode handling goes, @miguelcleon said that it works here. But then he encountered an SSL error, which is another issue.

emiliom commented 7 years ago

Thanks, @lsetiawan. I'm closing this issue, then.

sreeder commented 7 years ago

@emiliom, @lsetiawan I have been getting this issue today. I reinstalled the newest version of wofpy and it fixed one of the unicode errors, but it failed on a second one: 'ascii' codec can't encode character u'\u2122' in position 18: ordinal not in range(128). It occurs on a GetValues call. The character is the trademark symbol, located in the organization definition field. WOFpy is connected to a postgres database.
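
When the fault string gives only a character and a position, it can help to scan the candidate fields directly. The helper below is a hypothetical debugging aid, not part of WOFpy; the field names and values are invented, and it assumes the values are already text (unicode) rather than raw bytes.

```python
def find_non_ascii(fields):
    """Return (field, position, character) for each non-ASCII character."""
    hits = []
    for field, value in sorted(fields.items()):
        for pos, ch in enumerate(value):
            if ord(ch) > 127:  # outside the ASCII range
                hits.append((field, pos, ch))
    return hits


# Invented record with a trademark symbol (U+2122) in one field.
record = {
    "organization_name": u"Example Org",
    "organization_definition": u"Example Sensors\u2122 Inc.",
}

hits = find_non_ascii(record)
for field, pos, ch in hits:
    print("%s: position %d, U+%04X" % (field, pos, ord(ch)))
```

Running a scan like this over the rows behind a failing request narrows down which table and field carry the character.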

emiliom commented 7 years ago

Thanks for letting us know, @sreeder. Do you have a public WOFpy endpoint with a specific location and variable that we can use for testing?

@lsetiawan, please look into this tomorrow. Hopefully it'll be an easy fix, given the previous work you've done on unicode handling.

sreeder commented 7 years ago

@emiliom Yes, I do. It is on our envirodiy website and the exact link is here: envirodiy

emiliom commented 7 years ago

> @emiliom Yes, I do. It is on our envirodiy website and the exact link is here: envirodiy

Perfect, thanks! That'll be very useful.

ocefpaf commented 7 years ago

Sorry for the noise but I could not resist:

http://qainsight.net/wp-content/uploads/dasbin/WindowsLiveWriter/IHeartUnicode_11D56/image_4.png

emiliom commented 7 years ago

PR #187 was replaced by #188; thanks @ocefpaf and @lsetiawan! Progress is being made there ...

miguelcleon commented 7 years ago

I am noticing a unicode error in my Apache error log:


[Wed Oct 11 16:42:01.410084 2017] [wsgi:error] [pid 27902:tid 140228831500032] ERROR:spyne.application:Fault(Server: "'NoneType' object has no attribute 'export'")
[Wed Oct 11 16:42:01.410111 2017] [wsgi:error] [pid 27902:tid 140228831500032] Traceback (most recent call last):
[Wed Oct 11 16:42:01.410114 2017] [wsgi:error] [pid 27902:tid 140228831500032]   File "/home/azureadmin/miniconda2/envs/wofpy/lib/python2.7/site-packages/spyne/application.py", line 151, in process_request
[Wed Oct 11 16:42:01.410116 2017] [wsgi:error] [pid 27902:tid 140228831500032]     ctx.out_object = self.call_wrapper(ctx)
[Wed Oct 11 16:42:01.410119 2017] [wsgi:error] [pid 27902:tid 140228831500032]   File "/home/azureadmin/miniconda2/envs/wofpy/lib/python2.7/site-packages/spyne/application.py", line 235, in call_wrapper
[Wed Oct 11 16:42:01.410121 2017] [wsgi:error] [pid 27902:tid 140228831500032]     retval = ctx.descriptor.service_class.call_wrapper(ctx)
[Wed Oct 11 16:42:01.410123 2017] [wsgi:error] [pid 27902:tid 140228831500032]   File "/home/azureadmin/miniconda2/envs/wofpy/lib/python2.7/site-packages/spyne/service.py", line 209, in call_wrapper
[Wed Oct 11 16:42:01.410126 2017] [wsgi:error] [pid 27902:tid 140228831500032]     return ctx.function(ctx, *args)
[Wed Oct 11 16:42:01.410128 2017] [wsgi:error] [pid 27902:tid 140228831500032]   File "/home/azureadmin/miniconda2/envs/wofpy/lib/python2.7/site-packages/wof/apps/spyned_1_1.py", line 100, in GetSiteInfo
[Wed Oct 11 16:42:01.410130 2017] [wsgi:error] [pid 27902:tid 140228831500032]     siteinfoResult = WOFService.GetSiteInfoObject(ctx, site, authToken)
[Wed Oct 11 16:42:01.410141 2017] [wsgi:error] [pid 27902:tid 140228831500032]   File "/home/azureadmin/miniconda2/envs/wofpy/lib/python2.7/site-packages/wof/apps/spyned_1_1.py", line 96, in GetSiteInfoObject
[Wed Oct 11 16:42:01.410143 2017] [wsgi:error] [pid 27902:tid 140228831500032]     raise Fault(faultstring=str(inst))
[Wed Oct 11 16:42:01.410145 2017] [wsgi:error] [pid 27902:tid 140228831500032] Fault: Fault(Server: "'NoneType' object has no attribute 'export'")
[Wed Oct 11 16:42:36.455256 2017] [wsgi:error] [pid 27902:tid 140228789536512] ERROR:spyne.application:Fault(Server: "'ascii' codec can't encode character u'\\\\u2013' in position 263: ordinal not in range(128)")
[Wed Oct 11 16:42:36.455295 2017] [wsgi:error] [pid 27902:tid 140228789536512] Traceback (most recent call last):
[Wed Oct 11 16:42:36.455298 2017] [wsgi:error] [pid 27902:tid 140228789536512]   File "/home/azureadmin/miniconda2/envs/wofpy/lib/python2.7/site-packages/spyne/application.py", line 151, in process_request
[Wed Oct 11 16:42:36.455301 2017] [wsgi:error] [pid 27902:tid 140228789536512]     ctx.out_object = self.call_wrapper(ctx)
[Wed Oct 11 16:42:36.455303 2017] [wsgi:error] [pid 27902:tid 140228789536512]   File "/home/azureadmin/miniconda2/envs/wofpy/lib/python2.7/site-packages/spyne/application.py", line 235, in call_wrapper
[Wed Oct 11 16:42:36.455306 2017] [wsgi:error] [pid 27902:tid 140228789536512]     retval = ctx.descriptor.service_class.call_wrapper(ctx)
[Wed Oct 11 16:42:36.455308 2017] [wsgi:error] [pid 27902:tid 140228789536512]   File "/home/azureadmin/miniconda2/envs/wofpy/lib/python2.7/site-packages/spyne/service.py", line 209, in call_wrapper
[Wed Oct 11 16:42:36.455310 2017] [wsgi:error] [pid 27902:tid 140228789536512]     return ctx.function(ctx, *args)
[Wed Oct 11 16:42:36.455312 2017] [wsgi:error] [pid 27902:tid 140228789536512]   File "/home/azureadmin/miniconda2/envs/wofpy/lib/python2.7/site-packages/wof/apps/spyned_1_1.py", line 172, in GetVariables
[Wed Oct 11 16:42:36.455315 2017] [wsgi:error] [pid 27902:tid 140228789536512]     varsResult = WOFService.GetVariablesObject(ctx, authToken)
[Wed Oct 11 16:42:36.455323 2017] [wsgi:error] [pid 27902:tid 140228789536512]   File "/home/azureadmin/miniconda2/envs/wofpy/lib/python2.7/site-packages/wof/apps/spyned_1_1.py", line 168, in GetVariablesObject
[Wed Oct 11 16:42:36.455325 2017] [wsgi:error] [pid 27902:tid 140228789536512]     raise Fault(faultstring=str(inst))
[Wed Oct 11 16:42:36.455328 2017] [wsgi:error] [pid 27902:tid 140228789536512] Fault: Fault(Server: "'ascii' codec can't encode character u'\\\\u2013' in position 263: ordinal not in range(128)")

lsetiawan commented 7 years ago

Thanks @miguelcleon, we are working on this. Are those unicode characters in your backup database?

miguelcleon commented 7 years ago

@lsetiawan yes it is.

miguelcleon commented 7 years ago

I'm still getting errors but I don't think they are related to unicode now :smile:

this works: https://dev-odm2admin.cuahsi.org/wofpy/odm2timeseries/rest/1_1/GetSites

this works: https://dev-odm2admin.cuahsi.org/wofpy/odm2timeseries/rest/1_1/GetSiteInfo?site=odm2timeseries:Rio%20Icacos%20Trib-IO

but SOAP is not working; I will follow up in another issue.

lsetiawan commented 7 years ago

@miguelcleon Great! One down, two more problems to go! :man_dancing:

emiliom commented 7 years ago

@miguelcleon and @lsetiawan Can we close this issue?

miguelcleon commented 7 years ago

Yes