Closed. GregSilverman closed this issue 4 years ago.
Hi Greg!
I looked at this briefly and I cannot find any reason why overlap should be slower. It might have something to do with how the similarity is implemented in simstring, the underlying matching library, but I couldn't find any indication of that when I looked at its code.
Since you have the data ready, could you profile the code and check where the difference in performance comes from?
-Luca
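A minimal sketch of the profiling suggested above, using only Python's standard cProfile and pstats modules. The helper names (profile_call, compare_profiles, make_matcher) are hypothetical, and the QuickUMLS constructor call shown in the comment is an assumption about how the matcher would be built; the index path and parameters would need to match the actual setup.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=15, **kwargs):
    """Run func(*args, **kwargs) under cProfile.

    Returns (result, report), where report is the text of the `top`
    entries sorted by cumulative time.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


def compare_profiles(make_matcher, texts, similarity_names=("jaccard", "overlap")):
    """Profile matching over the same texts once per similarity name.

    make_matcher is a hypothetical factory supplied by the caller, e.g.
    (assumed usage, adjust to the real installation):
        lambda name: QuickUMLS(index_path, similarity_name=name)
    """
    reports = {}
    for name in similarity_names:
        matcher = make_matcher(name)
        _, reports[name] = profile_call(lambda: [matcher.match(t) for t in texts])
    return reports
```

Printing reports["jaccard"] and reports["overlap"] side by side should show whether the extra time is spent inside the simstring calls or in the Python code around them.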
Howdy Luca! Will do... it might be a bit before I get to it, but it will happen.
Grazie!
(In reply to Luca Soldaini's comment of Wed, Jun 19, 2019, 10:00 AM.)
--
Greg M. Silverman
Senior Systems Developer
NLP/IE https://healthinformatics.umn.edu/research/nlpie-group
University of Minnesota
gms@umn.edu
evaluate-it.org
I was curious as to why overlap similarity was so slow. Since there is really no operation other than the intersection going on with it, and all the other metrics are just variants on this, I am not sure why this would be the case. This slowness has been exhibited in two different corpora (i2b2 and mipacq). I am about to test it on a third internal corpus.
Any insight would be most welcome.
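For reference, the four set-based measures discussed here can be sketched as below, using the standard textbook definitions over character n-gram sets. This is an illustration, not simstring's implementation: char_ngrams is a simplified, hypothetical feature extractor, and candidate_size_bounds only states what follows mathematically from each definition. One thing the arithmetic does show is that Jaccard, Dice, and cosine thresholds limit the sizes of candidate strings that can still match, while overlap does not. Whether that accounts for the slowdown reported in this thread has not been confirmed by profiling.

```python
import math


def char_ngrams(text, n=3):
    """Set of character n-grams (simplified; no padding, no repeat counts)."""
    text = text.lower()
    if len(text) < n:
        return {text}
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def similarities(x, y):
    """Compute all four measures from the same intersection size."""
    shared = len(x & y)
    return {
        "overlap": shared / min(len(x), len(y)),
        "jaccard": shared / len(x | y),
        "dice": 2 * shared / (len(x) + len(y)),
        "cosine": shared / math.sqrt(len(x) * len(y)),
    }


def candidate_size_bounds(measure, query_size, threshold):
    """Smallest and largest candidate set size that can reach the threshold.

    Derived by assuming the best case, where the intersection equals the
    smaller of the two sets.
    """
    a = threshold
    if measure == "jaccard":
        return a * query_size, query_size / a
    if measure == "dice":
        return a / (2 - a) * query_size, (2 - a) / a * query_size
    if measure == "cosine":
        return a * a * query_size, query_size / (a * a)
    if measure == "overlap":
        # A candidate of any size can score 1.0 if one set contains the other.
        return 1, math.inf
    raise ValueError(f"unknown measure: {measure}")
```

For example, similarities(char_ngrams("heart"), char_ngrams("heart attack")) gives an overlap of 1.0 but a Jaccard of 0.3, because every trigram of the shorter string appears in the longer one.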