clips / pattern

Web mining module for Python, with tools for scraping, natural language processing, machine learning, network analysis and visualization.
https://github.com/clips/pattern/wiki
BSD 3-Clause "New" or "Revised" License

Index Error after second runtime #284

vizzerdrix55 closed 5 years ago

vizzerdrix55 commented 5 years ago

Hello there, I use pattern.de and installed it in a Jupyter Notebook on a Mac.

I get the following error when I run asptagger.tag_xml_sentence(sentence) in a for loop for the second time.

Is there anything I can do?

Additional information: Python 3.7.1 (default, Dec 14 2018, 13:28:58), IPython 7.2.0

I try to run:

# asptagger and sentence_splitter were created earlier in the notebook (setup not shown).
eos_tags = set(["post"])
tokens = ['Jetzt', 'ist', 'Deine', 'Meinung', 'gefragt', ':', 'Hier', 'kannst', 'Du', 'deinen', 'Kommentar', 'zum', 'Artikel', 'veröffentlichen', 'und', 'mit', 'anderen', 'Lesern', 'darüber', 'diskutieren', '.', 'http://www.pcgames.de/external/gfx/i...rrow_right.gif', 'Zum', 'Artikel', ':', 'http://www.pcgames.de/aid,680133']
sentences = sentence_splitter.split_xml(tokens, eos_tags)
for sentence in sentences:
    # Tag each sentence; this is the call that raises the IndexError.
    print(asptagger.tag_xml_sentence(sentence))

I get:

IndexError                                Traceback (most recent call last)
<ipython-input-29-7724e3271880> in <module>
      2 sentences = sentence_splitter.split_xml(tokens, eos_tags)
      3 for sentence in sentences:
----> 4     print(asptagger.tag_xml_sentence(sentence))

/anaconda3/lib/python3.7/site-packages/someweta/tagger.py in tag_xml_sentence(self, sentence)
    118         words = [sentence[i] for i in word_indexes]
    119         words = [html.unescape(w) for w in words]
--> 120         tagged = self.tag_sentence(words)
    121         tags = {i: t[1:] for i, t in zip(word_indexes, tagged)}
    122         tagged_xml = []

/anaconda3/lib/python3.7/site-packages/someweta/tagger.py in tag_sentence(self, sentence)
    103         self.latent_features = functools.partial(self._get_latent_features, [w.lower() for w in sentence])
    104         X = self._get_static_features(sentence, sentence_length)
--> 105         tags = list(self.predict(X, sentence_length))[0]
    106         if self.mapping is not None:
    107             return list(zip(sentence, tags, (self.mapping[lt] for lt in tags)))

/anaconda3/lib/python3.7/site-packages/someweta/averaged_structured_perceptron.py in predict(self, X, lengths)
     94         for start, length in ranges:
     95             local_X = X[start:start + length]
---> 96             predicted, features = self._beam_search(local_X, start)
     97             predicted = [self.reverse_mapping[p] for p in predicted]
     98             yield predicted

/anaconda3/lib/python3.7/site-packages/someweta/averaged_structured_perceptron.py in _beam_search(self, X, start, y)
    152                 if gold_not_in_beam:
    153                     break
--> 154         return beams[0].tags, self._extract_feature_sequence(beams[0])
    155 
    156     def _predict_static(self, features):

IndexError: list index out of range

I expected a list of sentences, with the tokens of each sentence grouped together.

As described above, the error occurs only when the script is run for the second time.
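
One way to narrow this down might be to catch the IndexError per sentence and keep the input that triggered it, so it can be compared between the first and the second run. The following is only a minimal sketch: `tag_each` and `fake_tag` are hypothetical names that are not part of SoMeWeTa, and `fake_tag` stands in for `asptagger.tag_xml_sentence` so the example runs without a model. The stand-in fails on an empty sentence purely for demonstration; that is not a claim about the actual cause of the error above.

```python
def tag_each(sentences, tag):
    """Apply `tag` to every sentence; collect results and failures separately."""
    tagged, failed = [], []
    for index, sentence in enumerate(sentences):
        try:
            tagged.append(tag(sentence))
        except IndexError as error:
            # Keep the offending input so it can be inspected afterwards.
            failed.append((index, sentence, error))
    return tagged, failed


def fake_tag(sentence):
    # Hypothetical stand-in for asptagger.tag_xml_sentence (assumption).
    sentence[0]  # raises IndexError on an empty sentence, for demonstration only
    return [(token, "X") for token in sentence]


tagged, failed = tag_each([["Hier", "kannst", "Du", "."], []], fake_tag)
for index, sentence, error in failed:
    print(f"sentence {index} failed: {sentence!r} ({error})")
```

In the notebook, `tag` would be `asptagger.tag_xml_sentence` and `sentences` the result of `sentence_splitter.split_xml(tokens, eos_tags)`; printing the failed sentences on both runs would show whether the input to the tagger differs between them.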