WojciechMula / aspell-python

Python wrapper for aspell (C extension and python version)
BSD 3-Clause "New" or "Revised" License
81 stars 20 forks source link

Speller.suggest() segfaults under Python 3 #1

Closed Troon closed 10 years ago

Troon commented 10 years ago

System is an Ubuntu 14.04 Linode, with the latest zip file built as per instructions (python3 setup.py build && python3 setup.py install for Python 3). All dependencies are satisfied, I believe (aspell, aspell-en, libaspell-dev, libaspell15).

Python 2:

>>> import aspell
>>> s = aspell.Speller()
>>> s.check('test')
1
>>> s.suggest('test')
['test', 'testy', 'teat', 'tests', 'rest', 'yest', 'Tet', 'EST', 'est', 'Tess', 'text', 'Best', 'TESL', 'West', 'Zest', 'best', 'fest', 'jest', 'lest', 'nest', 'pest', 'tent', 'vest', 'west', 'zest', "Te's", "test's", "Tet's"]

Python 3:

>>> import aspell
>>> s = aspell.Speller()
>>> s.check('test')
True
>>> s.suggest('test')
Segmentation fault

Any suggestions? Note the different result type for Speller.check()...

WojciechMula commented 10 years ago

Please try to run pyaspell.py directly, i.e.: python3 pyaspell.py.

Is you machine running 32-bit or 64-bit system?

Troon commented 10 years ago

Interesting: that works fine. The check() and suggest() tests work as expected.

root@vps:/usr/src/aspell-python-master# python3 pyaspell.py
True
True
["b'when'", "b'wen'", "b'wean'", "b'ween'"]
["b'ween'", "b'when'", "b'wen'", "b'wean'"]
(etc)

It also works when I import pyaspell.py:

root@vps:/usr/src/aspell-python-master# python3
Python 3.4.0 (default, Apr 11 2014, 13:05:11)
[GCC 4.8.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyaspell
>>> s = pyaspell.Aspell()
>>> s.check('test')
True
>>> s.suggest('test')
["b'test'", "b'testy'", "b'teat'", "b'tests'", "b'rest'", "b'yest'", "b'Tet'", "b'EST'", "b'est'", "b'Tess'", "b'text'", "b'Best'", "b'TESL'", "b'West'", "b'Zest'", "b'best'", "b'fest'", "b'jest'", "b'lest'", "b'nest'", "b'pest'", "b'tent'", "b'vest'", "b'west'", "b'zest'", 'b"Te\'s"', 'b"test\'s"', 'b"Tet\'s"']

It's a 64-bit VPS running Xen with two Xeon E5-2680 cores. OS is Ubuntu Server 14.04. There's not much else on the box: it's been cleaned out to see if moving my project to Python 3 is feasible. The build process generates a couple of warnings that look like they might be key to this:

root@vps:/usr/src/aspell-python-master# python3 setup.py build
running build
running build_ext
building 'aspell' extension
creating build
creating build/temp.linux-x86_64-3.4
x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/python3.4m -c aspell.c -o build/temp.linux-x86_64-3.4/aspell.o
aspell.c: In function ‘m_suggest’:
aspell.c:465:2: warning: passing argument 5 of ‘get_arg_string’ from incompatible pointer type [enabled by default]
  buf = get_arg_string(self, args, 0, &word, &length);
  ^
aspell.c:393:18: note: expected ‘Py_ssize_t *’ but argument is of type ‘int *’
 static PyObject* get_arg_string(
                  ^
aspell.c: In function ‘m_addtoSession’:
aspell.c:526:2: warning: passing argument 5 of ‘get_arg_string’ from incompatible pointer type [enabled by default]
  buf = get_arg_string(self, args, 0, &word, &length);
  ^
aspell.c:393:18: note: expected ‘Py_ssize_t *’ but argument is of type ‘int *’
 static PyObject* get_arg_string(
                  ^
aspell.c: In function ‘m_addReplacement’:
aspell.c:556:2: warning: passing argument 5 of ‘get_arg_string’ from incompatible pointer type [enabled by default]
  Mbuf = get_arg_string(self, args, 0, &mis, &ml);
  ^
aspell.c:393:18: note: expected ‘Py_ssize_t *’ but argument is of type ‘int *’
 static PyObject* get_arg_string(
                  ^
aspell.c:562:2: warning: passing argument 5 of ‘get_arg_string’ from incompatible pointer type [enabled by default]
  Cbuf = get_arg_string(self, args, 1, &cor, &cl);
  ^
aspell.c:393:18: note: expected ‘Py_ssize_t *’ but argument is of type ‘int *’
 static PyObject* get_arg_string(
                  ^
aspell.c: In function ‘PyInit_aspell’:
aspell.c:734:12: warning: variable ‘dict’ set but not used [-Wunused-but-set-variable]
  PyObject *dict;
            ^
creating build/lib.linux-x86_64-3.4
x86_64-linux-gnu-gcc -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 build/temp.linux-x86_64-3.4/aspell.o -L/usr/local/lib/ -laspell -o build/lib.linux-x86_64-3.4/aspell.cpython-34m.so

Hope that helps.

WojciechMula commented 10 years ago

Thanks for the details, it seems there are some issues on 64-bit architecture. Unfortunately I have no access to 64-bit machine, and thus can't track down the causes of problem.

So the only solution I could suggest is to use the python module. It's api is the same as a C extension.

Troon commented 10 years ago

OK, thanks. It's not a problem for me right now, as there are several other blockers to moving to Python 3 — but it's good to know there's a solution for this issue. See your personal email.

Troon commented 10 years ago

Fixed it myself. The fix is too simple for a pull request, just change four ints to Py_ssize_t. It then compiles and works fine here.

--- /usr/src/aspell-python-master/aspell.c      2013-09-14 12:18:23.000000000 +0100
+++ aspell.c    2014-05-20 10:32:57.673569112 +0100
@@ -458,7 +458,7 @@
 /* method:suggest ************************************************************/
 static PyObject* m_suggest(PyObject* self, PyObject* args) {
        char* word;
-       int   length;
+       Py_ssize_t   length;
        PyObject* buf;
        PyObject* list;

@@ -520,7 +520,7 @@
 /* method:addtoSession ********************************************************/
 static PyObject* m_addtoSession(PyObject* self, PyObject* args) {
        char *word;
-       int   length;
+       Py_ssize_t   length;
        PyObject* buf;

        buf = get_arg_string(self, args, 0, &word, &length);
@@ -548,8 +548,8 @@

 /* method:addReplacement ******************************************************/
 static PyObject* m_addReplacement(PyObject* self, PyObject* args) {
-       char *mis; int ml;
-       char *cor; int cl;
+       char *mis; Py_ssize_t ml;
+       char *cor; Py_ssize_t cl;
        PyObject* Mbuf;
        PyObject* Cbuf;
WojciechMula commented 10 years ago

Thank you very much for your help and for this fix. I'll update sources in coming days.

WojciechMula commented 10 years ago

I've applied your patch. Thanks again.