I'm also thinking of a postprocessing trick now: if a token is detected as an entity but is part of a noun chunk, we could also attempt to highlight the entire noun chunk.
This would be for a separate tutorial, but I'm curious what you think of the idea.
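Roughly, the idea in plain spaCy could look like the sketch below; the model and the sentence are just placeholders, and the actual output depends on whatever NER produced the entities:

```python
import spacy
from spacy.tokens import Span
from spacy.util import filter_spans

nlp = spacy.load("en_core_web_sm")
doc = nlp("The talented Cristiano Ronaldo scored twice.")

expanded = []
for ent in doc.ents:
    span = ent
    for chunk in doc.noun_chunks:
        # If the detected entity sits inside a noun chunk, widen it to the whole chunk.
        if chunk.start <= ent.start and ent.end <= chunk.end:
            span = Span(doc, chunk.start, chunk.end, label=ent.label_)
            break
    expanded.append(span)

# filter_spans drops any overlaps so the result is a valid set of entities.
doc.ents = filter_spans(expanded)
print([(ent.text, ent.label_) for ent in doc.ents])
```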
They are, but the n-grams do actually need to be present in the embedding model. If not, the algorithm doesn't have any input to expand over.
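To make that concrete: the expansion can only consider phrases that actually exist as keys in the vector table. A quick check, assuming a gensim KeyedVectors model where multi-word keys are joined with underscores (word2vec/sense2vec style):

```python
from gensim.models import KeyedVectors

# Hypothetical path; any pretrained word2vec-format model would do.
kv = KeyedVectors.load_word2vec_format("vectors.bin", binary=True)

for phrase in ["knife", "big_knife"]:
    # If the joined phrase isn't a key, there is nothing to expand over.
    print(phrase, phrase in kv.key_to_index)
```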
I can see 2 solutions:
Additionally, there is an include_compound_words flag, which should allow the model to detect "big knife" based on only having an initial similarity result for "knife".
This is also one of the features that isn't properly documented.
Besides that, the exclude_pos and exclude_dep parameters are too.
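For reference, a minimal sketch of how these flags could be passed through the pipe config; the parameter names come from this thread, but the example values (and whether they match the defaults) are assumptions, so double-check the component's signature:

```python
import spacy
import concise_concepts  # registers the "concise_concepts" pipe

data = {
    "weapon": ["knife", "sword", "gun"],
    "fruit": ["apple", "pear", "orange"],
}

nlp = spacy.load("en_core_web_lg")
nlp.add_pipe(
    "concise_concepts",
    config={
        "data": data,
        # Flags mentioned above; the values here are illustrative,
        # not the library's defaults.
        "include_compound_words": True,
        "exclude_pos": ["VERB", "AUX"],
        "exclude_dep": ["punct"],
    },
)

doc = nlp("He threatened them with a big knife.")
print([(ent.text, ent.label_) for ent in doc.ents])
```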
I generally like to compose the pattern behaviour with your rule-based Matcher explorer: https://demos.explosion.ai/matcher.
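For example, a small pattern along those lines (purely illustrative):

```python
import spacy
from spacy.matcher import Matcher

nlp = spacy.load("en_core_web_sm")
matcher = Matcher(nlp.vocab)

# Illustrative pattern: an optional adjective followed by "knife",
# roughly the kind of compound the expansion should pick up.
matcher.add("KNIFE_PHRASE", [[{"POS": "ADJ", "OP": "?"}, {"LOWER": "knife"}]])

doc = nlp("He picked up a big knife.")
for match_id, start, end in matcher(doc):
    print(doc[start:end].text)
```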
Yeah, averaging the embeddings of inputs seems like it'll result in a bad time.
But it was indeed probably the include_compound_words feature that was missing from my initial trial.
There is also a third option, one that (hopefully) will get announced next week on our YouTube channel.
Now you've got me curious about the third option.
But cool that you are working on a tutorial. Let me know if there are any hiccups or features you might think of.
@koaning I closed this for now. Will review the solution after your blogpost.
It will be a two-part thing, the first part will be on YouTube. The thing about the solution though is that it is already implemented in another library 😉
That library being? 😅 Or are you talking about the doc.noun_chunks part?
Cool. I'll do some testing and look into a way to integrate this.
There are likely some other integrations inbound, but yeah, s2v is a great trick.
I might be working on a tutorial on this project, so I figured I'd double-check explicitly: are multi-token phrases supported? My impression is that they're not, and that's totally fine, but I just wanted to make sure.
This example:
Yields this error:
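The original snippet and traceback aren't reproduced here, but as a purely hypothetical illustration, multi-token entries in the concept dict would look something like this (assuming the concise_concepts pipe and a spaCy model with vectors):

```python
import spacy
import concise_concepts  # registers the "concise_concepts" pipe

# Hypothetical data: "meat cleaver" and "kitchen knife" are multi-token phrases.
data = {
    "weapon": ["meat cleaver", "kitchen knife", "sword"],
    "fruit": ["apple", "pear", "orange"],
}

nlp = spacy.load("en_core_web_lg")
nlp.add_pipe("concise_concepts", config={"data": data})
```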