Hironsan / anago

Bidirectional LSTM-CRF and ELMo for Named-Entity Recognition, Part-of-Speech Tagging and so on.
https://anago.herokuapp.com/
MIT License
1.48k stars, 371 forks

Training never starts/finishes #130

Closed CharlesAverill closed 4 years ago

CharlesAverill commented 4 years ago

System information

Describe the problem

First of all, the basic anago installation is broken: using the code provided in the README, training never starts. After replacing layers.py with the version mentioned in issue #115, training seems to begin, but it never displays output and never finishes. Outside Google Colab, it is impossible to stop the Python script without closing the terminal tab that is running it.

Source code / logs

The code causing issues:

```python
import anago

import pandas as pd
import numpy as np

from sklearn.model_selection import train_test_split

class SentenceGetter(object):

    def __init__(self, data):
        self.n_sent = 1
        self.data = data
        self.empty = False
        agg_func = lambda s: [(w, t) for w, t in zip(s["Word"].values.tolist(),
                                                     s["Tag"].values.tolist())]
        self.grouped = self.data.groupby("Sentence #").apply(agg_func)
        self.sentences = [s for s in self.grouped]

    def get_next(self):
        s = self.grouped["Sentence: {}".format(self.n_sent)]
        self.n_sent += 1
        return s

df = pd.read_csv(input("Please enter preprocessed sheet filename: "))

getter = SentenceGetter(df)
sentences = getter.sentences

print("Creating data...")
X = [[w[0] for w in s] for s in sentences]
y = [[w[1] for w in s] for s in sentences]

X = np.array(X)
y = np.array(y)

(X_train, y_train, X_test, y_test) = train_test_split(X, y, test_size=.1)

print("Building model...")
model = anago.Sequence()
print("Beginning training...")
model.fit(X_train, y_train, epochs=1, verbose=1)
```

CharlesAverill commented 4 years ago

Solved. For some reason, train_test_split breaks everything. I have no idea why, but that needs to be fixed.
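
A possible explanation, judging only from the script posted above: scikit-learn's train_test_split returns the splits for each input in turn, that is (X_train, X_test, y_train, y_test), while the script unpacks them as (X_train, y_train, X_test, y_test). With that order, the variable named y_train would actually hold the held-out sentences, so model.fit would receive words and "tags" that do not line up. Whether this alone accounts for the hang has not been confirmed in this thread. The sketch below uses made-up toy data in place of the CSV and leaves anago out entirely; it only illustrates the unpacking order.

```python
from sklearn.model_selection import train_test_split

# Toy stand-ins for the tokenized sentences and their tag sequences
# (made-up data, not the dataset from the issue).
X = [["w{}_{}".format(i, j) for j in range(i % 3 + 1)] for i in range(10)]
y = [["O"] * len(sentence) for sentence in X]

# Unpacking order used in the script above: the second name receives
# the X test split, not the training labels.
bad_X_train, bad_y_train, bad_X_test, bad_y_test = train_test_split(
    X, y, test_size=0.1, random_state=0
)
print(len(bad_X_train), len(bad_y_train))  # lengths differ

# Documented order: train and test split of X, then train and test split of y.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=0
)
print(len(X_train), len(y_train))  # lengths match

# Every training sentence now has exactly one tag per word.
aligned = all(len(s) == len(t) for s, t in zip(X_train, y_train))
print(aligned)
```

With the documented order, X_train and y_train can be passed to model.fit as in the original script.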