clips / pattern

Web mining module for Python, with tools for scraping, natural language processing, machine learning, network analysis and visualization.
https://github.com/clips/pattern/wiki
BSD 3-Clause "New" or "Revised" License
8.72k stars 1.58k forks

StopIteration problem #308

Open Ivanknop opened 4 years ago

Ivanknop commented 4 years ago

Running the module raises a StopIteration error. On other forums I was told this is because I'm using Python 3.8 with pattern 3.6. They suggested this workaround:

def solo_los_verbos(frase):
    try:
        s = parse(frase).split()
        for cada in s:
            for c in cada:
                if c[1] == 'VB':
                    print("{}: es un verbo".format(c[0]))
                else:
                    print("{}: NO es un verbo".format(c[0]))
    except:
        pass

But it goes straight to the except.

juanpenia commented 4 years ago

Hi, this is fixed, though not in an official release; it is a community fix. The cause is a change introduced in Python 3.7+ that has to do with how generators behave.
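For readers unfamiliar with the generator change mentioned above: since Python 3.7 (PEP 479), a StopIteration raised inside a generator body is converted into a RuntimeError. The following is a minimal sketch of that behaviour, not pattern's actual code; `old_style_reader` and `consume` are hypothetical names used only for illustration.

```python
def old_style_reader(lines):
    """Hypothetical stand-in for a generator written in the old style,
    which signals the end of iteration by raising StopIteration."""
    for line in lines:
        yield line
    raise StopIteration  # converted to RuntimeError on Python 3.7+


def consume(gen):
    """Drain a generator and report how it ended."""
    items = []
    try:
        for item in gen:
            items.append(item)
    except RuntimeError as exc:
        # On Python 3.7+ the explicit StopIteration surfaces here.
        return items, str(exc)
    return items, None


items, error = consume(old_style_reader(["a", "b"]))
print(items, error)
```

This also suggests why a bare `except: pass` around the call goes "straight to the except": the RuntimeError is swallowed silently, so nothing is printed.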

https://github.com/NicolasBizzozzero/pattern/pull/1/commits/286792826e4c0aa9b69f3894fc79b8c47b8d43c3

The solution is there, but I also made a fork and merged a branch with community fixes.

aghasemi commented 3 years ago

Hi. Any idea how we can apply that fix to the library when it has been installed via pip?
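One way to find the file that pip installed, so it can be edited by hand, is to ask Python where the module lives. This is only a sketch using the standard library's `importlib.util.find_spec`; the helper name `module_file` is made up for this example, and it returns None when the package is not installed.

```python
import importlib.util


def module_file(name):
    """Return the path of the file that defines module `name`, or None."""
    try:
        # For a dotted name, find_spec imports the parent package first.
        spec = importlib.util.find_spec(name)
    except ImportError:
        # Raised when a parent package (e.g. "pattern") is missing.
        return None
    return spec.origin if spec is not None else None


# Prints the path to pattern/text/__init__.py if pattern is installed.
print(module_file("pattern.text"))
```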

ChenYang-Huang commented 3 years ago

The fix is proposed in many pull requests but it looks like the maintainer hasn't been active.

To fix it, go to $installation_path_of_pattern/text/__init__.py and change lines 608-609 from

yield line
raise StopIteration

to the version shown in the attached screenshot [image]. This made conjugate() work for me in Python 3.7.4.
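The screenshot itself is not preserved in this thread, so the exact replacement is unknown. As an assumption, the usual way to end a generator on Python 3.7+ is a plain `return` (or simply letting the function end) instead of `raise StopIteration`. The sketch below shows that shape on a simplified, hypothetical reader (`read_lines`); it is not pattern's own `_read`.

```python
def read_lines(lines, comment=";;;"):
    """Hypothetical simplified reader: yields stripped, non-comment lines."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith(comment):
            continue
        yield line
    # A bare return ends the generator cleanly; this is where an
    # old-style `raise StopIteration` would have been.
    return


print(list(read_lines(["foo 1", ";;; a comment", "", "bar 2"])))
```

The patched `_read` posted later in this thread takes the same approach by omitting the `raise` altogether.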

nershman commented 3 years ago

Note that, in my experience, another, lazier fix is to call parse() 2-3 times in your code; after that, the StopIteration error stops occurring.

>>> from pattern.en import parse
>>> parse('This is a test.')
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/pattern/text/__init__.py", line 609, in _read
    raise StopIteration
StopIteration

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.8/site-packages/pattern/text/en/__init__.py", line 169, in parse
    return parser.parse(s, *args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/pattern/text/__init__.py", line 1172, in parse
    s[i] = self.find_tags(s[i], **kwargs)
  File "/usr/local/lib/python3.8/site-packages/pattern/text/en/__init__.py", line 114, in find_tags
    return _Parser.find_tags(self, tokens, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/pattern/text/__init__.py", line 1113, in find_tags
    lexicon = kwargs.get("lexicon", self.lexicon or {}),
  File "/usr/local/lib/python3.8/site-packages/pattern/text/__init__.py", line 376, in __len__
    return self._lazy("__len__")
  File "/usr/local/lib/python3.8/site-packages/pattern/text/__init__.py", line 368, in _lazy
    self.load()
  File "/usr/local/lib/python3.8/site-packages/pattern/text/__init__.py", line 625, in load
    dict.update(self, (x.split(" ")[:2] for x in _read(self._path) if len(x.split(" ")) > 1))
  File "/usr/local/lib/python3.8/site-packages/pattern/text/__init__.py", line 625, in <genexpr>
    dict.update(self, (x.split(" ")[:2] for x in _read(self._path) if len(x.split(" ")) > 1))
RuntimeError: generator raised StopIteration
>>> parse('This is a test.')
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/pattern/text/__init__.py", line 609, in _read
    raise StopIteration
StopIteration

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.8/site-packages/pattern/text/en/__init__.py", line 169, in parse
    return parser.parse(s, *args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/pattern/text/__init__.py", line 1172, in parse
    s[i] = self.find_tags(s[i], **kwargs)
  File "/usr/local/lib/python3.8/site-packages/pattern/text/en/__init__.py", line 114, in find_tags
    return _Parser.find_tags(self, tokens, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/pattern/text/__init__.py", line 1112, in find_tags
    return find_tags(tokens,
  File "/usr/local/lib/python3.8/site-packages/pattern/text/__init__.py", line 1540, in find_tags
    tagged = entities.apply(tagged)
  File "/usr/local/lib/python3.8/site-packages/pattern/text/__init__.py", line 976, in apply
    if w in self:
  File "/usr/local/lib/python3.8/site-packages/pattern/text/__init__.py", line 382, in __contains__
    return self._lazy("__contains__", *args)
  File "/usr/local/lib/python3.8/site-packages/pattern/text/__init__.py", line 368, in _lazy
    self.load()
  File "/usr/local/lib/python3.8/site-packages/pattern/text/__init__.py", line 959, in load
    for x in _read(self.path):
RuntimeError: generator raised StopIteration
>>> parse('This is a test.')
'This/DT/O/O is/VBZ/B-VP/O a/DT/B-NP/O test/NN/I-NP/O ././O/O'
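The session above could be wrapped in a small retry helper. This is a rough sketch of the "call it a few times" workaround only: `call_with_warmup` and `flaky` are hypothetical names, `flaky` merely imitates a function that fails twice before succeeding, and the approach hides the underlying bug rather than fixing it.

```python
def call_with_warmup(func, *args, attempts=3, **kwargs):
    """Call func, retrying when it raises RuntimeError, up to `attempts` times."""
    for attempt in range(attempts):
        try:
            return func(*args, **kwargs)
        except RuntimeError:
            if attempt == attempts - 1:
                raise  # give up after the last attempt


calls = {"n": 0}


def flaky(text):
    """Stand-in for parse(): fails on the first two calls, then works."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("generator raised StopIteration")
    return text.upper()


print(call_with_warmup(flaky, "this is a test."))
```

With pattern installed, the same helper would presumably be used as `call_with_warmup(parse, 'This is a test.')`.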
eterna2 commented 1 year ago

Instead of changing the source code, you can monkey-patch it, e.g.:


import os.path

import pattern.text

from pattern.helpers import decode_string
from codecs import BOM_UTF8

BOM_UTF8 = BOM_UTF8.decode("utf-8")
decode_utf8 = decode_string

def _read(path, encoding="utf-8", comment=";;;"):
    """Returns an iterator over the lines in the file at the given path,
    stripping comments and decoding each line to Unicode.
    """
    if path:
        if isinstance(path, str) and os.path.exists(path):
            # From file path.
            f = open(path, "r", encoding="utf-8")
        elif isinstance(path, str):
            # From string.
            f = path.splitlines()
        else:
            # From file or buffer.
            f = path
        for i, line in enumerate(f):
            line = line.strip(BOM_UTF8) if i == 0 and isinstance(line, str) else line
            line = line.strip()
            line = decode_utf8(line, encoding)
            if not line or (comment and line.startswith(comment)):
                continue
            yield line

pattern.text._read = _read
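Why assigning to `pattern.text._read` is enough: functions defined in a module look up global names such as `_read` at call time, so replacing the module attribute changes what they call. The sketch below demonstrates this with a throwaway stand-in module (`lib`, `load` and `patched_read` are all hypothetical), not with pattern itself.

```python
import types

# Hypothetical stand-in for a library module such as pattern.text.
lib = types.ModuleType("lib")
exec(
    "def _read(path):\n"
    "    yield 'original'\n"
    "\n"
    "def load(path):\n"
    "    return list(_read(path))\n",
    lib.__dict__,
)


def patched_read(path):
    """Replacement generator installed by the monkey patch."""
    yield "patched"


before = lib.load("x")   # uses the module's original _read
lib._read = patched_read
after = lib.load("x")    # same function, now finds the patched _read
print(before, after)
```

Two caveats follow from this: the patch has to run before the first call that triggers loading, and any code that did `from lib import _read` before the patch keeps its reference to the old function.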