Closed pombredanne closed 2 years ago
@chaitanya710 ... there are a few tests failing in this PR. Could you check what is the problem and why these are failing? Could the test be wrong? Or why do we have such a regression? If you have a fix, can you push a branch derived from this branch with a possible fix? Thanks!
@chaitanya710 gentle ping
We have these failures on Windows only, unicode builds all pythons:
=================================== FAILURES ===================================
______________ test_basic_items_keys_and_values_with_bytes_build _______________
@skipIf(ahocorasick.unicode, "Run only with bytes build")
def test_basic_items_keys_and_values_with_bytes_build():
automaton = ahocorasick.Automaton()
words = b"he e hers his she hi him man he".split()
# 0 1 2 3 4 5 6 7 8
for i, w in enumerate(words):
automaton.add_word(w, (i, w))
expected_keys = [b'e', b'hers', b'his', b'she', b'hi', b'him', b'man', b'he', ]
expected_values = [
(1, b'e'),
(2, b'hers'),
(3, b'his'),
(4, b'she'),
(5, b'hi'),
(6, b'him'),
(7, b'man'),
(8, b'he'),
]
assert sorted(automaton.keys()) == sorted(expected_keys)
assert sorted(automaton.values()) == sorted(expected_values)
> assert sorted(dict(automaton.items()).values()) == sorted(expected_values)
E AssertionError: assert [(1, b'e'), (2, b'hers'), (3, b'his'), (4, b'she'), (7, b'man'), (8, b'he')] == [(1, b'e'),\n (2, b'hers'),\n (3, b'his'),\n (4, b'she'),\n (5, b'hi'),\n (6, b'him'),\n (7, b'man'),\n (8, b'he')]
E At index 4 diff: (7, b'man') != (5, b'hi')
E Right contains 2 more items, first extra item: (7, b'man')
E Full diff:
E [
E (1,
E b'e'),
E (2,
E b'hers'),
E (3,
E b'his'),
E (4,
E b'she'),
E - (5,
E - b'hi'),
E - (6,
E - b'him'),
E (7,
E b'man'),
E (8,
E b'he'),
E ]
tests/test_basic.py:71: AssertionError
_________________________ TestTrieIterators.test_items _________________________
self = <test_unit.TestTrieIterators testMethod=test_items>
def test_items(self):
A = self.A
I = []
for i, w in enumerate(self.words):
A.add_word(conv(w), i + 1)
I.append((conv(w), i + 1))
L = [x for x in A.items()]
self.assertEqual(len(L), len(I))
> self.assertEqual(set(L), set(I))
E AssertionError: Items in the first set but not the second:
E (b'w\x00o\x00', 1)
E (b'a\x00h', 3)
E (b'p\x00y\x00t\x00', 2)
E (b'c\x00o\x00r\x00a\x00', 4)
E Items in the second set but not the first:
E (b'word', 1)
E (b'python', 2)
E (b'corasick', 4)
E (b'aho', 3)
tests/test_unit.py:431: AssertionError
=========================== short test summary info ============================
FAILED tests/test_basic.py::test_basic_items_keys_and_values_with_bytes_build
FAILED tests/test_unit.py::TestTrieIterators::test_items - AssertionError: It...
=================== 2 failed, 156 passed, 9 skipped in 1.51s ===================
And these failures with a bytes build on all OSes and all Pythons:
=================================== FAILURES ===================================
______________ test_basic_items_keys_and_values_with_bytes_build _______________
@skipIf(ahocorasick.unicode, "Run only with bytes build")
def test_basic_items_keys_and_values_with_bytes_build():
automaton = ahocorasick.Automaton()
words = b"he e hers his she hi him man he".split()
# 0 1 2 3 4 5 6 7 8
for i, w in enumerate(words):
automaton.add_word(w, (i, w))
expected_keys = [b'e', b'hers', b'his', b'she', b'hi', b'him', b'man', b'he', ]
expected_values = [
(1, b'e'),
(2, b'hers'),
(3, b'his'),
(4, b'she'),
(5, b'hi'),
(6, b'him'),
(7, b'man'),
(8, b'he'),
]
assert sorted(automaton.keys()) == sorted(expected_keys)
assert sorted(automaton.values()) == sorted(expected_values)
> assert sorted(dict(automaton.items()).values()) == sorted(expected_values)
E AssertionError: assert [(1, b'e'), (2, b'hers'), (3, b'his'), (4, b'she'), (7, b'man'), (8, b'he')] == [(1, b'e'),\n (2, b'hers'),\n (3, b'his'),\n (4, b'she'),\n (5, b'hi'),\n (6, b'him'),\n (7, b'man'),\n (8, b'he')]
E At index 4 diff: (7, b'man') != (5, b'hi')
E Right contains 2 more items, first extra item: (7, b'man')
E Full diff:
E [
E (1,
E b'e'),
E (2,
E b'hers'),
E (3,
E b'his'),
E (4,
E b'she'),
E - (5,
E - b'hi'),
E - (6,
E - b'him'),
E (7,
E b'man'),
E (8,
E b'he'),
E ]
tests/test_basic.py:71: AssertionError
_________________________ TestTrieIterators.test_items _________________________
self = <test_unit.TestTrieIterators testMethod=test_items>
def test_items(self):
A = self.A
I = []
for i, w in enumerate(self.words):
A.add_word(conv(w), i + 1)
I.append((conv(w), i + 1))
L = [x for x in A.items()]
self.assertEqual(len(L), len(I))
> self.assertEqual(set(L), set(I))
E AssertionError: Items in the first set but not the second:
E (b'w\x00o\x00', 1)
E (b'p\x00y\x00t\x00', 2)
E (b'c\x00o\x00r\x00a\x00', 4)
E (b'a\x00h', 3)
E Items in the second set but not the first:
E (b'python', 2)
E (b'corasick', 4)
E (b'word', 1)
E (b'aho', 3)
tests/test_unit.py:431: AssertionError
=========================== short test summary info ============================
FAILED tests/test_basic.py::test_basic_items_keys_and_values_with_bytes_build
FAILED tests/test_unit.py::TestTrieIterators::test_items - AssertionError: It...
Weirdly enough these are the same failure than on Windows unicode builds...
Here is the tally:
_____________________ TestMemory.test_does_not_leak_memory _____________________
self =
@unittest.skipIf(not on_linux, "Works only on linux")
def test_does_not_leak_memory(self):
before = get_memory_usage()
use_memory()
after = get_memory_usage()
assert before == after
E assert 116672.0 == 116928.0 E +116672.0 E -116928.0
tests/unresolved_bugs/test_bug_81.py:66: AssertionError
__ test_basic_items_keys_and_values_with_bytes_build ___
@skipIf(ahocorasick.unicode, "Run only with bytes build")
def test_basic_items_keys_and_values_with_bytes_build():
automaton = ahocorasick.Automaton()
words = b"he e hers his she hi him man he".split()
# 1 2 3 4 5 6 7 8 9
for i, w in enumerate(words, 1):
automaton.add_word(w, (i, w))
expected_keys = [b'e', b'hers', b'his', b'she', b'hi', b'him', b'man', b'he', ]
expected_values = [
# the second addition munges this by design, like in a dict (1, b'he'),
(2, b'e'),
(3, b'hers'),
(4, b'his'),
(5, b'she'),
(6, b'hi'),
(7, b'him'),
(8, b'man'),
(9, b'he'),
]
expected_items = [
# the second addition munges this by design, like in a dict (b'he', (1, b'he')),
(b'e', (2, b'e')),
(b'hers', (3, b'hers')),
(b'his', (4, b'his')),
(b'she', (5, b'she')),
(b'hi', (6, b'hi')),
(b'him', (7, b'him')),
(b'man', (8, b'man')),
(b'he', (9, b'he')),
]
assert sorted(automaton.keys()) == sorted(expected_keys)
assert sorted(automaton.values()) == sorted(expected_values)
E - (b'corasick', 4), E - (b'python', 2), E - (b'word', 1), E + (b'c\x00o\x00r\x00a\x00', 4), E + (b'p\x00y\x00t\x00', 2), E + (b'w\x00o\x00', 1), E ]
tests/test_unit.py:433: AssertionError =========================== short test summary info ============================ FAILED tests/test_basic.py::test_basic_items_keys_and_values_with_bytes_build FAILED tests/test_unit.py::TestTrieIterators::test_items - AssertionError: as...
@chaitanya710 ... there are a few tests failing in this PR. Could you check what is the problem and why these are failing? Could the test be wrong? Or why do we have such a regression? If you have a fix, can you push a branch derived from this branch with a possible fix? Thanks!
@pombredanne sure I will check it and see what I can do
@chaitanya710 thanks @WojciechMula unless you have an objection I will merge and tag a beta release. The few pending issues existed before anyway and can be fixed afterwards
BTW, I do not think we have a memory leak. Rather the test was incorrect and is now corrected. We have an issue wrt. the Automaton.items() function on Windows and also for all bytes builds. See https://github.com/WojciechMula/pyahocorasick/blob/3e9478411ff361d4c3f274f01e8709bbd871caff/tests/unresolved_bugs/test_bug_81.py
I am merging and pushing a beta release now.
This PR is to prepare a 2.0.0.b1 release. There is no major code changes, but a lot of code reorganization:
Note that the PR could be split in many smaller ones if this is too much to swallow at once. Also: there are failing tests, but these were failing before... but we did not knew because we never ran these, so I would not consider this as being in the way of a release.
Signed-off-by: Philippe Ombredanne pombredanne@nexb.com