Error storing utf-8 unicode values

GoogleCodeExporter commented 9 years ago

What steps will reproduce the problem?
1. Access the admin interface
2. Edit or create a new record with the content of a CharField containing Latin characters like 'ã', 'ç'.

What is the expected output? What do you see instead?
I expected the change to be accepted. 

Instead we see:
Traceback (most recent call last):
  File "/home/extranet/python/lib/python2.6/site-packages/django/core/servers/basehttp.py", line 278, in run
    self.result = application(self.environ, self.start_response)
  File "/home/extranet/python/lib/python2.6/site-packages/django/core/servers/basehttp.py", line 635, in __call__
    return self.application(environ, start_response)
  File "/home/extranet/python/lib/python2.6/site-packages/django/core/handlers/wsgi.py", line 239, in __call__
    response = self.get_response(request)
  File "/home/extranet/python/lib/python2.6/site-packages/django/core/handlers/base.py", line 127, in get_response
    receivers = signals.got_request_exception.send(sender=self.__class__, request=request)
  File "/home/extranet/python/lib/python2.6/site-packages/django/dispatch/dispatcher.py", line 148, in send
    response = receiver(signal=self, sender=sender, **named)
  File "/home/extranet/python/lib/python2.6/site-packages/django/db/__init__.py", line 60, in _rollback_on_exception
    transaction.rollback_unless_managed()
  File "/home/extranet/python/lib/python2.6/site-packages/django/db/transaction.py", line 157, in rollback_unless_managed
    connection._rollback()
  File "/home/extranet/python/lib/python2.6/site-packages/django/db/backends/__init__.py", line 38, in _rollback
    return self.connection.rollback()
Error: ('HY000', 'The driver did not supply an error!')

What version of the product are you using? On what operating system?
I'm using:
- Ubuntu Linux 8.04
- UnixODBC: 2.2.11-16build1
- TDSODBC: 0.63-3.2ubuntu1
- Django: trunk
- django-pyodbc: trunk
- Python: Python 2.5.2

Please provide any additional information below.

Tried both Python 2.5.2 (Ubuntu native, UCS4) and Python 2.6.1 (compiled for UCS2). Same issue.

# cat /etc/odbcinst.ini
[FreeTDS]
Description     = TDS driver (Sybase/MS SQL)
Driver          = /usr/lib/odbc/libtdsodbc.so
Setup           = /usr/lib/odbc/libtdsS.so
CPTimeout       =
CPReuse         =
client_charset  = UTF8
tds_version     = 8.0

# cat /etc/freetds/freetds.conf
[global]
        tds version = 4.2
        text size = 64512

[SQLSERVER]
        host = x.x.x.x
        port = 1433
        client charset = UTF8
        tds version = 8.0

Original issue reported on code.google.com by paulo.sc...@gmail.com on 28 Jan 2009 at 11:39

GoogleCodeExporter commented 9 years ago

I've just set up an environment using what I have at hand:

Ubuntu 8.10
    freetds's tdsodbc 0.82-3ubuntu1 (as shipped with the distribution)
    unixodbc 2.2.11-16build2 (as shipped with the distribution)
    pyodbc 2.4.1
Your unixODBC and FreeTDS configuration files
SQL Server 2000

(BTW, it seems you are mixing some settings from odbc.ini into odbcinst.ini.)

and tested saving an example model containing 'áéíóúñÁÉÍÓÚÑÇç', which worked without problems. This confirms what I suspected:

0.63 is too old a version of FreeTDS to be useful (it was released almost four years ago), and I wouldn't be surprised if its support for UTF-8 as a client-side encoding was rather immature back then.

For instance, compare the wording used in the 0.63 user guide (nonwestern.htm file):

"
Important FreeTDS is not fully compatible with multi-byte character sets such
          as UCS-2. You must use an ASCII-extension charset (e.g., UTF-8,
          ISO-8859-*)[1]. Extreme care should be taken with testing
          applications using these encodings. Specifically, many applications
          do not expect the number of characters returned to exceed the column
          size (in bytes). On the other hand, support of UTF-8 and UCS-2 is a
          high priority for the developers. Patches and bug reports in this
          area are especially welcome.
"

With the same paragraph in the equivalent 0.82 document (localization.htm):

"
Important FreeTDS is not fully compatible with multi-byte character sets such
          as UCS-2. You must use an ASCII-extension charset (e.g., UTF-8,
          ISO-8859-*)[2]. Great care should be taken testing applications using
          these encodings. Specifically, many applications do not expect the
          number of characters returned to exceed the column size (in bytes).
"

So you might want to try compiling and testing a newer FreeTDS version.
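The caveat in both quotes, that many applications do not expect the number of characters returned to exceed the column size in bytes, is easier to see with a byte count. A minimal Python 3 sketch (not part of FreeTDS or django-pyodbc) comparing the character length of the test string used above with its UTF-8 byte length:

```python
# Compare the number of characters with the number of UTF-8 bytes
# for a string made only of accented Latin letters.
sample = "áéíóúñÁÉÍÓÚÑÇç"

char_count = len(sample)                   # code points
utf8_count = len(sample.encode("utf-8"))   # bytes handed to the client

print(char_count, utf8_count)
```

Each of these letters takes two bytes in UTF-8, so a client that sizes its buffer by character count (14) would be too small for the 28 bytes it actually receives.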

Additionally, I've found that even 0.82 isn't bug-free enough, and since FreeTDS trunk is currently rather unstable, I use the "official patched 0.82 version" from http://freetds.sourceforge.net/. I'm using the Django test suite as a way to measure this, and the difference in the number of failures between 0.82 and the "official patched 0.82 version" is huge (in the first case many of them are caused by FreeTDS errors; in the latter case the remaining test suite failures are all attributable to django-pyodbc itself).

This has even allowed me to do all my django-pyodbc development on Linux and only periodically validate things on win32 (using MS ODBC drivers).

I will leave the ticket open; please report back your experience and conclusions if possible.

Original comment by cra...@gmail.com on 2 Feb 2009 at 8:48

GoogleCodeExporter commented 9 years ago

The pyodbc documentation says that Unicode handling for the MSSQL TDS ODBC driver is problematic because Python stores Unicode strings in UCS-4 while the driver returns them in UCS-2, and the pyodbc layer does not translate UCS-2<=>UCS-4.

The pyodbc docs suggest compiling Python with UCS-2 flags; when I tried that, I got other error messages.
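For reference, the width difference described here: UCS-2 stores each character as a 2-byte code unit, UCS-4 as a 4-byte one. A minimal Python 3 sketch using the standard `utf-16-le` and `utf-32-le` codecs as stand-ins for UCS-2 and UCS-4 (they coincide for characters in the Basic Multilingual Plane, which covers every letter in this thread). It only illustrates the widths; it does not show what pyodbc or the driver actually do:

```python
import sys

name = "Mečová"

# All characters here are in the Basic Multilingual Plane (code point <= 0xFFFF),
# where UTF-16 uses one 2-byte unit per character, i.e. it coincides with UCS-2.
assert all(ord(ch) <= 0xFFFF for ch in name)

ucs2 = name.encode("utf-16-le")   # 2 bytes per character
ucs4 = name.encode("utf-32-le")   # 4 bytes per character
print(len(name), len(ucs2), len(ucs4))

# Translating between the two widths is lossless for BMP text.
assert ucs4.decode("utf-32-le").encode("utf-16-le") == ucs2

# Python 2 "narrow" (UCS-2) builds reported 0xFFFF here and "wide" (UCS-4)
# builds 0x10FFFF; Python 3.3+ always reports 0x10FFFF.
print(hex(sys.maxunicode))
```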

WORKAROUND FOR READ-ONLY MODELS: My application uses data from MSSQL in a read-only fashion, so I managed to create some views using the PostgreSQL module dblink_tds. This module works as expected with Unicode, but joins using dblink_tds views are somewhat inefficient (each view retrieves all rows from MSSQL before joining). For reasonably sized databases with shallow nested data models this may be acceptable.

Original comment by paulo.sc...@gmail.com on 4 Feb 2009 at 8:37

GoogleCodeExporter commented 9 years ago

In which version of the pyodbc documentation did you read that? (BTW, what version of pyodbc are you using?)

I can't find anything like that in the current (2.1.4) pyodbc source code or documentation.

And I think that's actually wrong. The TDS protocol uses UCS-2 over the wire, but FreeTDS converts it to/from the client charset you specify in freetds.conf using the iconv library, so client applications don't deal with UCS-2 data at all.
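The conversion described here can be emulated with Python codecs to show why the client side never sees UCS-2. This is an illustration only, in Python 3: FreeTDS does the real work in C through iconv, the function names are hypothetical, and it assumes little-endian UCS-2 on the wire (byte-identical to UTF-16LE for BMP text):

```python
# Illustration only: emulates with Python codecs the conversion FreeTDS
# performs through iconv. Function names are hypothetical.

WIRE_ENCODING = "utf-16-le"  # assumed: little-endian UCS-2 used by TDS for Unicode data

def wire_to_client(wire_bytes, client_charset="utf-8"):
    """Convert UCS-2 bytes received from the server into client-charset bytes."""
    return wire_bytes.decode(WIRE_ENCODING).encode(client_charset)

def client_to_wire(client_bytes, client_charset="utf-8"):
    """Convert client-charset bytes into UCS-2 bytes to send to the server."""
    return client_bytes.decode(client_charset).encode(WIRE_ENCODING)

wire = "Mečová".encode(WIRE_ENCODING)   # 2 bytes per character
client = wire_to_client(wire)            # UTF-8 bytes; no UCS-2 left

# The round trip is lossless when the client charset can represent every character.
assert client_to_wire(client) == wire

# With a single-byte client charset such as ISO-8859-1, 'č' has no mapping.
try:
    wire_to_client(wire, "iso-8859-1")
except UnicodeEncodeError as exc:
    print("cannot convert:", exc.object[exc.start:exc.end])
```

The last part is one reason the client charset has to be something like UTF-8 that covers every character the application may store.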

A long-term aim of FreeTDS is to offer a Unicode interface to client apps, and that (hopefully with similar advances on the pyodbc and django-pyodbc fronts) could be a good match for Django's Unicode support when talking to DB backends (i.e. no encoding/decoding would be needed in django-pyodbc).

Meanwhile, we need FreeTDS to talk to us using UTF-8 (so the "client charset = UTF-8" freetds.conf setting is needed, and we hardcode the UTF-8 encoding of Unicode data handed to us by Django and the UTF-8 decoding of data we get from the DB).
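A minimal Python 3 sketch of the encode-on-the-way-in, decode-on-the-way-out approach described here. The names `DB_CHARSET`, `encode_params` and `decode_row` are hypothetical and this is not django-pyodbc's actual code; in the Python 2 code of the time, text values were `unicode` rather than `str`:

```python
# Hypothetical helpers sketching the approach; NOT django-pyodbc's real code.

DB_CHARSET = "utf-8"  # must match "client charset" in freetds.conf

def encode_params(params):
    """Encode text parameters to UTF-8 bytes before handing them to the driver."""
    return [p.encode(DB_CHARSET) if isinstance(p, str) else p for p in params]

def decode_row(row):
    """Decode byte values fetched from the driver back into text."""
    return tuple(v.decode(DB_CHARSET) if isinstance(v, bytes) else v for v in row)

params = encode_params(["Mečová", 42, None])
row = decode_row(tuple(params))  # simulates reading the same values back
```

Keeping the charset in a single constant makes the coupling with the freetds.conf setting explicit, although nothing in this sketch checks that the two actually agree.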

I will close this ticket a week from now.

Original comment by cra...@gmail.com on 5 Feb 2009 at 8:36

GoogleCodeExporter commented 9 years ago

Original comment by cra...@gmail.com on 14 Feb 2009 at 2:23

Changed state: Invalid

GoogleCodeExporter commented 9 years ago

I have hit the same problem, but with only some characters.

For example, the following causes a system crash: Mečová (specifically č), but when I try áéíóúñÁÉÍÓÚÑÇç it works just fine.
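One observation that may help narrow this down: every character in the string that works is also present in the Windows-1252 code page, while 'č' is not (it is in Windows-1250). That suggests, but does not prove, a conversion to a single-byte Western code page somewhere along the path, for example a `varchar` column or a code-page-1252 collation; this is an assumption, not something established here. A small Python 3 check, with `unrepresentable` as a hypothetical helper name:

```python
def unrepresentable(text, codepage="cp1252"):
    """Return the characters of `text` that cannot be encoded in `codepage`."""
    missing = []
    for ch in text:
        try:
            ch.encode(codepage)
        except UnicodeEncodeError:
            missing.append(ch)
    return missing

print(unrepresentable("áéíóúñÁÉÍÓÚÑÇç"))
print(unrepresentable("Mečová"))
print(unrepresentable("Mečová", "cp1250"))
```

If the failing inputs always contain characters reported here and the working ones never do, checking the column types and the server or database collation would be a reasonable next step.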

I have 0.82 of FreeTDS, and I can replicate on both a Mac (installed through
MacPorts) and an Ubuntu (standard 8.10 package) system.

When I try it on Windows, it works perfectly.

Any suggestions are very welcome, as I'm really trying to avoid using Windows for web servers in a live deployment.

Original comment by matt.j.s...@gmail.com on 12 Mar 2009 at 6:06

GoogleCodeExporter commented 9 years ago

Is there a workaround for this?

Original comment by djmar...@gmail.com on 13 Dec 2010 at 7:25

GoogleCodeExporter commented 9 years ago

We have tested with more recent builds of FreeTDS and unixODBC (the latest Ubuntu 10.04 packages) and the problem seems to have disappeared. Our problems must have been with the underlying drivers.

Original comment by matt.j.s...@gmail.com on 13 Dec 2010 at 7:37

GoogleCodeExporter commented 9 years ago

Does 'Mečová' work for you now? On my Ubuntu 10.04 with MSSQL 2005 it still doesn't.

Original comment by djmar...@gmail.com on 13 Dec 2010 at 7:46

GoogleCodeExporter commented 9 years ago

Would it be possible to drop the characters that are causing problems before they hit the driver, as a workaround?
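A rough Python 3 sketch of such a workaround. It assumes, hypothetically, that the characters to avoid are those outside Windows-1252 (`cp1252`); `fold_to_codepage` is an illustrative name, not part of django-pyodbc. Rather than dropping accented letters outright, it keeps their base letter where Unicode decomposition provides one. It is lossy: 'č' would be stored as 'c', so the database no longer holds what the user entered.

```python
import unicodedata

def _encodable(ch, codepage):
    """True if `ch` can be encoded in `codepage`."""
    try:
        ch.encode(codepage)
        return True
    except UnicodeEncodeError:
        return False

def fold_to_codepage(text, codepage="cp1252"):
    """Keep characters `codepage` can encode; replace others with their base letter.

    Non-encodable characters are decomposed (NFKD) and stripped of combining
    marks; anything still not encodable is dropped.
    """
    out = []
    for ch in text:
        if _encodable(ch, codepage):
            out.append(ch)
            continue
        for part in unicodedata.normalize("NFKD", ch):
            if unicodedata.combining(part):
                continue  # drop accents such as the caron in 'č'
            if _encodable(part, codepage):
                out.append(part)
            # otherwise: no usable base letter, drop it
    return "".join(out)
```

For example, `fold_to_codepage("Mečová")` gives `"Mecová"`, while characters with no Latin base letter are removed entirely.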

Original comment by djmar...@gmail.com on 14 Dec 2010 at 8:46

google-code-export / django-pyodbc

Error storing utf-8 unicode values #41