explosion / spaCy

💫 Industrial-strength Natural Language Processing (NLP) in Python
https://spacy.io
MIT License

Is an LSTM neural network used in the implementation of nlp.update() in the NER training code below? #1864

Closed prashant334 closed 6 years ago

prashant334 commented 6 years ago

@ines @honnibal

Python version: 2.7.6
Platform: Linux-3.16.0-77-generic-x86_64-with-Ubuntu-14.04-trusty
spaCy version: 2.0.0a17
Models: en, en_core_web_sm, xx_ent_wiki_sm

```python
import random

from spacy.util import minibatch


def train_ner(nlp, train_data, output_dir):
    random.seed(0)
    optimizer = nlp.begin_training(lambda: [])
    nlp.meta['name'] = 'CRIME_LOCATION'
    for itn in range(50):
        losses = {}
        for batch in minibatch(get_gold_parses(nlp.make_doc, train_data), size=3):
            docs, golds = zip(*batch)
            nlp.update(docs, golds, losses=losses, sgd=optimizer, drop=0.35)
        print("under learning")
    if not output_dir:
        return
```
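For reference, the `zip(*batch)` step in the inner loop simply transposes a batch of `(doc, gold)` pairs into the parallel `docs` and `golds` sequences that `nlp.update()` expects. A minimal stand-alone sketch (plain Python, no spaCy; the strings are hypothetical stand-ins for `Doc` and `GoldParse` objects, and `minibatch_pairs` is a simplified stand-in for `spacy.util.minibatch`):

```python
def minibatch_pairs(pairs, size):
    """Yield successive fixed-size batches from a list of (doc, gold) pairs.
    Simplified stand-in for spacy.util.minibatch."""
    for i in range(0, len(pairs), size):
        yield pairs[i:i + size]

# Hypothetical stand-ins for (Doc, GoldParse) pairs.
train_pairs = [("doc1", "gold1"), ("doc2", "gold2"),
               ("doc3", "gold3"), ("doc4", "gold4")]

batches = list(minibatch_pairs(train_pairs, size=3))
docs, golds = zip(*batches[0])  # transpose the first batch
print(docs)   # ('doc1', 'doc2', 'doc3')
print(golds)  # ('gold1', 'gold2', 'gold3')
```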

cmward commented 6 years ago

If you're asking whether or not the NER model uses an LSTM, I believe the answer is no. The model is similar to the one described here as far as I know: https://explosion.ai/blog/deep-learning-formula-nlp

prashant334 commented 6 years ago

@cmward In that blog post itself, the Encode section says: "Given a sequence of word vectors, the encode step computes a representation that I'll call a sentence matrix, where each row represents the meaning of each token in the context of the rest of the sentence.

The technology used for this purpose is a bidirectional RNN. Both LSTM and GRU architectures have been shown to work well for this.

honnibal commented 6 years ago

I recorded a talk about the NER architecture here: https://www.youtube.com/watch?v=sqDHBH9IjRU

The short answer to your question is that the NER currently uses 4 CNN layers with residual connections and the maxout activation function to perform the "encode" step. You could alternatively use a BiLSTM instead. However, there are a few reasons why I think a CNN makes more sense here.
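For readers unfamiliar with maxout: it is an activation that takes an elementwise maximum over several linear "pieces" instead of applying a fixed nonlinearity like ReLU. A toy sketch of the idea (plain Python, not spaCy's actual implementation, which lives in Thinc):

```python
def maxout(pieces):
    """Maxout activation: elementwise max over k linear 'pieces'.
    Each piece is a list of floats of equal length."""
    return [max(values) for values in zip(*pieces)]

# Three hypothetical linear outputs ("pieces") for one token.
piece_a = [0.5, -1.0, 2.0]
piece_b = [1.5, 0.0, -3.0]
piece_c = [-0.5, 0.5, 1.0]

print(maxout([piece_a, piece_b, piece_c]))  # [1.5, 0.5, 2.0]
```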

A receptive field of 4 tokens should be sufficient for the "encode" step in the NER model. I think this is at the limit of what will generalise well from the training data to a wide variety of problems. If you have training data of exactly the same type of documents as during testing, conditioning on a whole document might be useful. But for spaCy, we want a single model that works generally well on detecting entities such as persons and organisations. The only way to get that generalisation is to make the features quite local.
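To see why stacking convolutional layers gives a bounded, local receptive field: if each layer looks one token to the left and right, an output position after four layers can only be influenced by input tokens up to four positions away. A small sketch that computes this directly (an illustration of the general principle, assuming a window of one token per side per layer):

```python
def influenced_by(position, n_layers, n_tokens, window=1):
    """Which input token positions can affect `position` after
    stacking n convolutions with the given per-side window."""
    positions = {position}
    for _ in range(n_layers):
        positions = {p + d for p in positions
                     for d in range(-window, window + 1)
                     if 0 <= p + d < n_tokens}
    return sorted(positions)

# After 4 layers, token 5 sees only tokens 1..9: 4 tokens of
# context on each side, no matter how long the document is.
print(influenced_by(5, n_layers=4, n_tokens=12))
# [1, 2, 3, 4, 5, 6, 7, 8, 9]
```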

BiLSTM models also perform very well at detecting local features, especially if the BPTT is truncated to only use the gradient from the most recent few words. However, if we know we only want local features, we may as well use a CNN. It's faster, and it bakes the assumptions we want into the architecture, instead of forcing us to find a solution we like through hyper-parameter tuning.

prashant334 commented 6 years ago

@honnibal Is there any way to track, or generate a log file of, layer-by-layer output during the nlp.update() call when training the NER?

lock[bot] commented 6 years ago

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.