WojciechMula / pyahocorasick

Python module (C extension and plain python) implementing Aho-Corasick algorithm
BSD 3-Clause "New" or "Revised" License
948 stars 125 forks source link

installation under python 2: C compilation is attempted and fails #12

Closed RobertoMaurizzi closed 8 years ago

RobertoMaurizzi commented 9 years ago

Hello!

I've tried to install pyahocorasick both using pip and by cloning current master and doing a python setup.py install. Using python 2.7.9 I see setup trying to conpile the C module and failing with

pyahocorasick.c:42:1: error: unknown type name ‘PyModuleDef’
 PyModuleDef ahocorasick_module = {
 ^
pyahocorasick.c:43:2: error: ‘PyModuleDef_HEAD_INIT’ undeclared here (not in a function)

Isn't the module supposed to be compiled only for Python 3?

worenga commented 9 years ago

Same here.

Is there any workaround to get this Module running under python 2.7.9?

pombredanne commented 8 years ago

same here. I wished it would compile al right as this sounds like a real fine implementation! On Ubuntu 14.04 X86_64, Python 2.7.6.

WojciechMula commented 8 years ago

Hi guys, sorry for delay and thanks for pinging me. I've started to work on the issue. It compiles now with python2.7 on my computer (debian), but not all tests are passing. I hope to finish porting in this week.

pombredanne commented 8 years ago

@WojciechMula Thank you very much my lord! Note -but you know this- that it builds fine on Python 3

pombredanne commented 8 years ago

@WojciechMula Do you need some help? May be you can push you work in progress in a branch?

WojciechMula commented 8 years ago

@pombredanne As you suggested I've pushed new branch https://github.com/WojciechMula/pyahocorasick/tree/port-python2 All tests pass except one, pickling is broken. If you, or anybody else, can review it & post patches it would be great.

pombredanne commented 8 years ago

@WojciechMula If I run the tests here is what I get for now:

(tmp)pombreda@COMPUTER:~/w421/pyahocorasick$ python test.py 
['AHOCORASICK', 'Automaton', 'EMPTY', 'MATCH_AT_LEAST_PREFIX', 'MATCH_AT_MOST_PREFIX', 'MATCH_EXACT_LENGTH', 'STORE_ANY', 'STORE_INTS', 'STORE_LENGTH', 'TRIE', '__doc__', '__file__', '__name__', '__package__', 'unicode']
(6, (4, 'she'))
(6, (8, 'he'))
(6, (1, 'e'))
==
Traceback (most recent call last):
  File "test.py", line 35, in <module>
    a.search_all(s, callback, 2, 11)
AttributeError: 'ahocorasick.Automaton' object has no attribute 'search_all'
(tmp)pombreda@COMPUTER:~/w421/pyahocorasick$ python unittests.py 
........()
('_sh', ':')
('erhe', ':')
(3, 'she')
(3, 'he')
(4, 'her')
(6, 'he')
('rshe', ':')
(7, 'her')
(8, 'hers')
(10, 'she')
(10, 'he')
('_', ':')
......E..............................
======================================================================
ERROR: test_unpickle (__main__.TestPickle)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "unittests.py", line 564, in test_unpickle
    B = pickle.loads(dump)
  File "/usr/lib/python2.7/pickle.py", line 1382, in loads
    return Unpickler(file).load()
  File "/usr/lib/python2.7/pickle.py", line 858, in load
    dispatch[key](self)
  File "/usr/lib/python2.7/pickle.py", line 1133, in load_reduce
    value = func(*args)
ValueError: invalid data to restore

----------------------------------------------------------------------
Ran 45 tests in 0.006s
pombredanne commented 8 years ago

IMHO the devils is in Unicode somewhere

pombredanne commented 8 years ago

I think I got a fix :package:

pombredanne commented 8 years ago

or at least some progress: See https://github.com/WojciechMula/pyahocorasick/blob/bef724b896aace7c339b431a2f4b66545881bdea/Automaton.c#L88 anything weird? const char* fmt = "is#iiii0" ends with a zero which is not a valid format code afaik. It should be at least const char* fmt = "is#iiiiO" with an O letter not a digit.

After applying this character change, (see https://github.com/WojciechMula/pyahocorasick/pull/16 ) the tests fail this way :

(tmp)pombreda@COMPUTER:~/w421/pyahocorasick$ python unittests.py 
........()
('_sh', ':')
('erhe', ':')
(3, 'she')
(3, 'he')
(4, 'her')
(6, 'he')
('rshe', ':')
(7, 'her')
(8, 'hers')
(10, 'she')
(10, 'he')
('_', ':')
......Automaton_pickle.c:automaton_unpickle:284 - size >= count*(sizeof(TrieNode) - sizeof(TrieNode*)) failed!
pombredanne commented 8 years ago

BTW there were some issues with not wrecking your name in comments: your files are using an ISO-8859-1 encoding... IMHO a UTF-8 would be more portable.

WojciechMula commented 8 years ago

@pombredanne This is the ancient code, from the times when UTF-8 was not widely supported. :) I'll fix it.

The module is almost ported to 2.7, the only issues is pickling. It's totally screwed for the 64-bit machines. I must redesing it.

pombredanne commented 8 years ago

@WojciechMula You rock! :+1:

WojciechMula commented 8 years ago

Sorry for delay, but finally I have found time & energy to move things forward. Please check out the current state of branch port-python2. Could you please help me with testing?

Now, at my computer, I'm able to compile the extension for both Python3 and Python2, and run unit tests. Thus, if nobody complains, I will merge the branch with master.