If you're asking whether or not the NER model uses an LSTM, I believe the answer is no. The model is similar to the one described here as far as I know: https://explosion.ai/blog/deep-learning-formula-nlp
@cmward In that blog post itself, the Encode section says: "Given a sequence of word vectors, the encode step computes a representation that I'll call a sentence matrix, where each row represents the meaning of each token in the context of the rest of the sentence. The technology used for this purpose is a bidirectional RNN. Both LSTM and GRU architectures have been shown to work well for this."
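For anyone wanting to see what that description corresponds to in code, here is a minimal sketch of such a bidirectional-RNN encode step, written in PyTorch purely for illustration; the layer width and sentence length are placeholder values, not anything spaCy actually uses:

```python
import torch
import torch.nn as nn

# Sketch of the encode step the blog post describes: a bidirectional LSTM
# turning a sequence of word vectors into a "sentence matrix" with one row
# per token. The width (128) and sentence length (10) are placeholders.
width = 128
bilstm = nn.LSTM(input_size=width, hidden_size=width // 2,
                 bidirectional=True, batch_first=True)

word_vectors = torch.randn(1, 10, width)       # (batch, n_tokens, width)
sentence_matrix, _ = bilstm(word_vectors)      # (1, 10, width): one row per token
```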
I recorded a talk about the NER architecture here: https://www.youtube.com/watch?v=sqDHBH9IjRU
The short answer to your question is that the NER currently uses 4 CNN layers with residual connections and the maxout activation function to perform the "encode" step. You could use a BiLSTM instead. However, there are a few reasons why I think a CNN makes more sense here.
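To make the CNN variant concrete, here is a rough sketch of an encode step built from stacked 1-D convolutions with maxout activations and residual connections, again in PyTorch for illustration only. spaCy's actual implementation lives in Thinc, so the width, window size and number of maxout pieces below are placeholder assumptions:

```python
import torch
import torch.nn as nn

class ConvEncode(nn.Module):
    """Sketch of an "encode" step: stacked 1-D convolutions with maxout
    activations and residual connections. Placeholder sizes, not spaCy's
    actual Thinc implementation."""

    def __init__(self, width=128, depth=4, window=1, pieces=3):
        super().__init__()
        self.pieces = pieces
        # Each layer maps width -> width * pieces channels; the maxout
        # step below takes the max over the `pieces` groups.
        self.convs = nn.ModuleList(
            nn.Conv1d(width, width * pieces,
                      kernel_size=2 * window + 1, padding=window)
            for _ in range(depth)
        )

    def forward(self, x):
        # x: (batch, n_tokens, width) word vectors
        x = x.transpose(1, 2)                    # -> (batch, width, n_tokens)
        for conv in self.convs:
            out = conv(x)                        # (batch, width * pieces, n_tokens)
            b, _, n = out.shape
            out = out.view(b, -1, self.pieces, n).max(dim=2).values  # maxout
            x = x + out                          # residual connection
        return x.transpose(1, 2)                 # sentence matrix: one row per token

encoded = ConvEncode()(torch.randn(1, 10, 128))  # -> (1, 10, 128)
```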
A receptive field of 4 tokens should be sufficient for the "encode" step in the NER model: each of the 4 convolutional layers looks one token to either side, so a token's representation can draw on at most 4 words of context in each direction. I think this is at the limit of what will generalise well from the training data to a wide variety of problems. If your training data consists of exactly the same type of documents you will see at test time, conditioning on the whole document might be useful. But for spaCy, we want a single model that works well in general at detecting entities such as persons and organisations. The only way to get that generalisation is to keep the features quite local.
BiLSTM models also perform very well at detecting local features, especially if backpropagation through time (BPTT) is truncated so that only the gradient from the most recent few words is used. However, if we know we only want local features, we may as well use a CNN. It's faster, and it bakes the assumptions we want into the architecture, instead of forcing us to find a solution we like through hyper-parameter tuning.
@honnibal Is there any way to track, or generate a log file of, the layer-by-layer output during the nlp.update() call when training the NER?
@ines @honnibal
import random

from spacy.util import minibatch


def train_ner(nlp, train_data, output_dir):
    # Train the NER component on the gold parses built from train_data.
    random.seed(0)
    optimizer = nlp.begin_training(lambda: [])
    nlp.meta['name'] = 'CRIME_LOCATION'
    for itn in range(50):
        losses = {}
        # get_gold_parses is expected to yield (Doc, GoldParse) pairs.
        for batch in minibatch(get_gold_parses(nlp.make_doc, train_data), size=3):
            docs, golds = zip(*batch)
            nlp.update(docs, golds, losses=losses, sgd=optimizer, drop=0.35)
            print("under learning")
    if not output_dir:
        return
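For context, a hypothetical way to set up the pipeline and call this function, assuming spaCy v2 and a TRAIN_DATA list in the usual (text, annotations) format that get_gold_parses consumes, might look like this:

```python
import spacy

# Hypothetical setup; TRAIN_DATA and get_gold_parses are assumed to follow
# spaCy v2's training-data conventions and are not defined here.
nlp = spacy.blank('en')
ner = nlp.create_pipe('ner')
nlp.add_pipe(ner)
ner.add_label('CRIME_LOCATION')

train_ner(nlp, TRAIN_DATA, output_dir=None)
```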