PatentsView / PatentsView-Disambiguation


assignee disambiguation: incorporate location in the measure of similarity #4

Closed Markhzz closed 3 years ago

Markhzz commented 3 years ago

Hi Monath,

Sorry to bother you again!!

I was trying to learn from your program. I checked your presentation slides from the USPTO Symposium, where you mentioned that "The assignee model is based on a tf-idf character n-gram string similarity model that uses data from PermID."
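For readers unfamiliar with the technique named in the quote, here is a minimal sketch of tf-idf character n-gram string similarity in plain Python. It is not the repository's implementation: the trigram size, the padding, the smoothed idf formula, and the toy list of names are all assumptions made for illustration, and the thread does not say how the PermID data is actually used.

```python
import math
from collections import Counter

def char_ngrams(name, n=3):
    """Return the character n-grams of a lower-cased, space-padded name."""
    padded = f" {name.lower().strip()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

def fit_idf(names, n=3):
    """Smoothed inverse document frequency of every n-gram in a corpus of names."""
    doc_freq = Counter()
    for name in names:
        doc_freq.update(set(char_ngrams(name, n)))
    total = len(names)
    return {g: math.log((1 + total) / (1 + df)) + 1.0 for g, df in doc_freq.items()}

def tfidf_vector(name, idf, n=3):
    """L2-normalized sparse tf-idf vector; n-grams unseen at fit time are dropped."""
    counts = Counter(g for g in char_ngrams(name, n) if g in idf)
    vec = {g: c * idf[g] for g, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {g: v / norm for g, v in vec.items()} if norm else {}

def cosine(u, v):
    """Dot product of two normalized sparse vectors, i.e. their cosine similarity."""
    if len(u) > len(v):
        u, v = v, u
    return sum(w * v.get(g, 0.0) for g, w in u.items())

# Toy corpus (hypothetical names, standing in for a real reference list).
names = ["International Business Machines",
         "Intl Business Machines Corp",
         "Acme Widgets"]
idf = fit_idf(names)
vecs = [tfidf_vector(x, idf) for x in names]

sim_close = cosine(vecs[0], vecs[1])  # two spellings of the same company
sim_far = cosine(vecs[0], vecs[2])    # unrelated names
```

In this sketch, spelling variants that share many trigrams score higher than unrelated names, which is the property a name-matching model of this kind relies on.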

Just to confirm: the program uses both location and name-spelling similarity to compute the overall similarity, right? I infer this because the program encodes three features, where locations and name_tfidf are used for computing the similarity, and entity_kb_feat is used as a constraint.

    triples = [(locations, FeatCalc.DOT, CentroidType.NORMED, False, False),
               (entity_kb_feat, FeatCalc.NO_MATCH, CentroidType.BINARY, False, True),
               (name_tfidf, FeatCalc.DOT, CentroidType.NORMED, False, False)]
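The reading described above (dot-product features summed into a score, with one feature acting as a hard constraint) can be sketched roughly as follows. This is a hypothetical illustration only: `pair_score`, `violates_no_match`, the dict-based mention layout, and the interpretation of `NO_MATCH` as "conflict when both mentions carry knowledge-base ids and share none" are assumptions, not the library's API, and the meaning of the trailing booleans in each triple is not stated in this thread.

```python
import math

def normalize(vec):
    """L2-normalize a sparse vector (mirrors the assumed 'NORMED' handling)."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {k: v / norm for k, v in vec.items()} if norm else {}

def dot(u, v):
    """Sparse dot product."""
    return sum(w * v.get(k, 0.0) for k, w in u.items())

def violates_no_match(kb_a, kb_b):
    """Assumed constraint: conflict when both sides have KB ids and none overlap."""
    return bool(kb_a) and bool(kb_b) and not (kb_a & kb_b)

def pair_score(a, b, dot_features=("locations", "name_tfidf")):
    """Sum the dot-product features unless the constraint forbids the pair."""
    if violates_no_match(a.get("entity_kb_feat", set()),
                         b.get("entity_kb_feat", set())):
        return float("-inf")
    # A feature missing from a mention contributes zero to the sum.
    return sum(dot(normalize(a.get(f, {})), normalize(b.get(f, {})))
               for f in dot_features)

# Hypothetical mentions; the n-gram keys and ids are made up for illustration.
m1 = {"name_tfidf": {"ibm": 1.0, "cor": 1.0}, "entity_kb_feat": {"permid:1"}}
m2 = {"name_tfidf": {"ibm": 1.0}, "entity_kb_feat": set()}
m3 = {"name_tfidf": {"ibm": 1.0, "cor": 1.0}, "entity_kb_feat": {"permid:2"}}

score_compatible = pair_score(m1, m2)  # constraint inactive, name similarity only
score_conflict = pair_score(m1, m3)    # identical names, but conflicting KB ids
```

Under these assumptions, a mention with no location data is scored on name similarity alone, and two mentions linked to different knowledge-base entities are never merged, however similar their names.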

Thank you so much!! I'm sorry for bothering you again!

Best, Mark

nmonath commented 3 years ago

Hi Mark, sincere apologies for the delay again, and sorry for the confusion. While location is specified here, I believe it is no longer added to the AssigneeNameMention object (see https://github.com/PatentsView/PatentsView-Disambiguation/blob/dev/docs/pv/disambiguation/core.py#L312). It should have been removed from the code you mention above (and will be). Thanks for your question!

Best, Nick

Markhzz commented 3 years ago

Hi Monath,

Thank you so much!!