Closed kevinrosenberg21 closed 6 years ago
Hi everyone.
Has anyone been able to check this out? Is it a bug or something I'm doing wrong?
Thanks
I've spent all afternoon looking at this, and the only explanation I can come up with is that when it merges the entities it somehow doesn't recalculate the doc's tensor, causing inconsistencies. @ines @honnibal is there a method to recalculate it manually?
Sorry for the constant commenting, but I checked it out.
doc1 = nlp(txt)
print("Before merging entities the len of the doc is: " + str(len(doc1)))
print("Before merging entities the shape of the tensor is: " + str(doc1.tensor.shape))
doc1 = merge_entities(doc1)
print("After merging entities the len of the doc is: " + str(len(doc1)))
print("After merging entities the shape of the tensor is: " + str(doc1.tensor.shape))
And the result was
Before merging entities the len of the doc is: 697
Before merging entities the shape of the tensor is: (697, 128)
After merging entities the len of the doc is: 648
After merging entities the shape of the tensor is: (697, 128)
According to the comments in the code:
The doc.tensor attribute holds dense feature vectors computed by the models in the pipeline. Let's say a document with 30 words has a tensor with 128 dimensions per word. doc.tensor.shape will be (30, 128). After calling doc.extend_tensor with an array of shape (30, 64), doc.tensor.shape will be (30, 192).
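The shape arithmetic described there can be sketched in plain Python, with lists of lists standing in for the numpy arrays (the 30/128/64 sizes are just the example's numbers, not anything from a real model):

```python
# Stand-in for doc.tensor: 30 "words", 128 features each
tensor = [[0.0] * 128 for _ in range(30)]   # shape (30, 128)
extra = [[0.5] * 64 for _ in range(30)]     # shape (30, 64)

# extend_tensor concatenates along the feature axis,
# so each row grows from 128 to 128 + 64 = 192 columns
extended = [row + more for row, more in zip(tensor, extra)]

print(len(extended), len(extended[0]))  # 30 192
```

The key point is that extending only ever grows the feature axis; nothing in that path shrinks the token axis, which is why merging tokens leaves the row count stale.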
Hi everyone.
I was able to fix the problem by resetting the tensor to the initial empty value it has in the Doc class, before running the parser.
It now works with this code:
def load_model(model_dir):
    import spacy, numpy
    from utils import clean_text
    nlp = spacy.load(model_dir + '/ner')
    nlp.tokenizer = custom_tokenizer(nlp)
    nlp_parser = spacy.load(model_dir + '/parser')

    def make_doc(txt):
        txt = clean_text(txt)
        doc = nlp(txt)
        doc = merge_entities(doc)
        # Reset the tensor to its initial empty value, so its row count
        # no longer disagrees with the merged token count
        doc.tensor = numpy.zeros((0,), dtype='float32')
        doc = custom_sbd(doc)
        return doc

    nlp_parser.make_doc = make_doc
    return nlp_parser
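The effect of that reset can be illustrated with a small numpy sketch. The 697/648 token counts are taken from the output above; everything else is a stand-in for the real Doc, not spaCy code:

```python
import numpy

n_tokens = 697
tensor = numpy.zeros((n_tokens, 128), dtype='float32')  # per-token vectors

# Merging entities shrinks the token count, but the tensor keeps
# its old row count, so the two disagree
n_tokens_after_merge = 648
assert tensor.shape[0] != n_tokens_after_merge  # the inconsistency

# The workaround: reset the tensor to the empty value a fresh Doc has,
# so nothing downstream trips over the stale shape
tensor = numpy.zeros((0,), dtype='float32')
print(tensor.shape)  # (0,)
```

Resetting throws away the NER model's feature vectors rather than recomputing them, which is acceptable here because the second pipeline only needs the merged tokens, not the old tensor.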
Hi,
As I mentioned in other posts I'm working on a dependency parser that expects to receive an entity-recognized Document.
I'm able to create the training set, and successfully train the model, but I get an error when I try to parse a new text.
I have defined functions for merging the entities and for treating the whole document as a single sentence, available below.
I insert them into the pipeline with the function I use to load the model.
This is because I couldn't create a single model, so I ended up creating two, one for NER and one for parsing, and joining them with that function. It's not the most elegant solution, but it seems to be working.
When I parse the doc
I get the following error
With the following stack
If I uncomment the #nlp.disable_pipes('parser') line, it works. Similarly, if I comment out the entity merging line (doc = merge_entities(doc)) it also works. Is there something I'm doing wrong or is this a bug?
Thank you