Closed: jararap closed this 1 year ago
Thank you for your interest in Palmetto and for pointing out this bug. It seems you are right: this does look like a bug :bug:. I will look into it.
Notes for myself:
The new version is released and deployed. Thanks again for opening the ticket and reporting the issue. :+1:
Dear Michael, thank you for this wonderful library :thumbsup:. I was investigating the issue where C_v is negatively correlated with other metrics, and realised that the hyper-parameter ɣ was set to 2 as opposed to the value of 1 suggested in your paper. I believe this is the cause of #76 and #12.
Examples:
T1 = fan player game playoff play score team hit win season
C_v(ɣ=1): 0.8394 C_v(ɣ=2): 0.5629
T2 = accumulation fossil hydrocarbon methane mud oil petroleum sand sediment shell
C_v(ɣ=1): 0.7681 C_v(ɣ=2): 0.482 c_npmi: 0.1977
T3 = ambitious amnesty definite desirable distraction entail entrant funding unauthorized undermine
C_v(ɣ=1): 0.3822 C_v(ɣ=2): 0.6244 c_npmi: -0.3231
T2 and T3 are interesting because C_v(ɣ=1) and C_v(ɣ=2) contradict each other: with ɣ=1, T2 scores higher than T3, which agrees with c_npmi, while with ɣ=2 the ranking is reversed. As this is ongoing research, I am unable to show my data here, but I can confirm that, correlation-wise for Palmetto, C_v(ɣ=1) is positively correlated with the other metrics (except C_A, which I did not test) while C_v(ɣ=2) is negatively correlated.
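To illustrate how ɣ could produce this kind of reversal, here is a minimal sketch. It assumes, as I understand the paper's definition, that each word's context vector is a sum of NPMI values raised to the power ɣ and that vectors are compared by cosine similarity. The NPMI matrix is made up for illustration and the function names are hypothetical; this is not Palmetto's code and it skips the sliding-window probability estimation entirely. The point is only that an even ɣ discards the sign of negative NPMI values, so a set of weakly related words can end up with a higher score.

```python
import math

def context_vector(npmi, subset, gamma):
    """For every word j, sum NPMI(i, j) ** gamma over the words i in subset."""
    n = len(npmi)
    return [sum(npmi[i][j] ** gamma for i in subset) for j in range(n)]

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def c_v_like(npmi, gamma):
    """Mean cosine between each single-word context vector and the whole-set vector."""
    n = len(npmi)
    full = context_vector(npmi, range(n), gamma)
    sims = [cosine(context_vector(npmi, [i], gamma), full) for i in range(n)]
    return sum(sims) / n

# Hypothetical NPMI values for three weakly related words (assumption, not real data).
# Off-diagonal entries are negative, i.e. the words co-occur less often than chance.
toy_npmi = [
    [1.0, -0.4, -0.3],
    [-0.4, 1.0, -0.5],
    [-0.3, -0.5, 1.0],
]

if __name__ == "__main__":
    print("gamma=1:", round(c_v_like(toy_npmi, 1), 4))
    print("gamma=2:", round(c_v_like(toy_npmi, 2), 4))
```

On this toy matrix the ɣ=2 score comes out higher than the ɣ=1 score even though every pairwise NPMI is negative, which is the same direction as the T3 result above.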
Unfortunately :weary:, it seems that ɣ has been set to 2 as far back as 2016 on GitHub. I am unsure whether the same was true of the endpoint when it was up. The Docker container provided also has ɣ set to 2.
Silver lining :smiley:: your paper is correct! C_v(ɣ=1) is usable and can be recommended again! I will still recommend Palmetto over gensim 👍
Cheers JP