alejandrojcastaneira closed this issue 5 years ago
I believe I found the cause of this. While experimenting with custom dev and test data, I created synthetic examples with the cosine similarity score all set to the maximum, so the standard deviation on both the dev set and the test set was 0.
The formula for Pearson's product-moment correlation coefficient divides the covariance between the model's predictions and the gold-standard scores on the dev and test sets by the product of their standard deviations.
Since my dev set and test set had zero variance, their standard deviations were also zero. That's why I got the true_divide error; introducing a small variance, or using other data, solved the problem. Sorry for my misunderstanding, this is a great library! Best regards
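The failure mode above can be reproduced in a few lines with plain NumPy: when one input is constant, the denominator of the correlation (the product of the standard deviations) is zero, which is exactly where the true_divide warning comes from. The numbers here are made up for illustration.

```python
import warnings
import numpy as np

predictions = np.array([0.91, 0.85, 0.97, 0.88])
constant_gold = np.ones_like(predictions)  # std = 0, like the all-maximum labels

with warnings.catch_warnings():
    # NumPy warns "invalid value encountered in divide" (true_divide)
    warnings.simplefilter("ignore", RuntimeWarning)
    r = np.corrcoef(predictions, constant_gold)[0, 1]

print(r)  # nan: cov / (std_pred * 0) is undefined
```

scipy.stats.pearsonr behaves the same way, returning nan together with a constant-input warning.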
@alejandrojcastaneira Thank you for your analysis. I'm having the same problem, and your explanation saves me a lot of time investigating.
Same problem, and an accurate explanation. Thanks!
@alejandrojcastaneira Thank you for your explanation.
I have exactly the same problem. I get a /scipy/stats/stats.py:3508: PearsonRConstantInputWarning: An input array is constant; the correlation coefficient is not defined. warning, which results in a nan value for the Cosine-Similarity Pearson and Spearman. I am using the Opusparcus German train and dev sets to continue training an NLI-trained German model. I tried to understand your answer, but my maths skills are not good enough. Can you give me advice on what to do to get rid of the nan value?
Yep! Thanks a lot @alejandrojcastaneira. In my case I drew the labels from a uniform distribution to solve this problem:

import numpy as np
from sentence_transformers import InputExample

label = np.random.uniform(low=0.8, high=1.0)  # non-constant labels, so the variance is no longer zero
input_example = InputExample(texts=texts, label=label)
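A cheap way to catch this before training is to check that the evaluation labels actually vary. This is a sketch with hypothetical labels; in practice you would collect them from your InputExamples.

```python
import numpy as np

# hypothetical gold labels; in the failing case every label was the maximum score
dev_labels = [1.0, 1.0, 1.0, 1.0]

std = np.array(dev_labels).std()
if std == 0:
    print("constant labels: Pearson and Spearman will be nan")
```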
Hi, I'm training an STS model using this code, but over my own domain data:
I'm getting these warnings:
Then these are the similarities computed at each epoch:
The example with the STSbenchmark works very well! I'm just changing the train/dev/test files, but I couldn't train on this data. Could it be related to the vocabulary of the word embeddings? Maybe they don't contain some words from my corpus.
Best regards
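If out-of-vocabulary words are the concern, a quick diagnostic is to measure what fraction of corpus tokens are missing from the embedding vocabulary. Everything here is a hypothetical stand-in: `vocab` would normally be the key set of your loaded word-embedding model, and `corpus_sentences` your own data. (Note that in this thread the nan scores came from constant labels, not from OOV words, but the check is still useful.)

```python
# Hypothetical embedding vocabulary and domain corpus for illustration
vocab = {"the", "cat", "sat"}
corpus_sentences = ["the cat sat", "domain jargon here"]

tokens = [t for s in corpus_sentences for t in s.lower().split()]
oov = [t for t in tokens if t not in vocab]
print(f"OOV rate: {len(oov) / len(tokens):.0%}")  # prints "OOV rate: 50%"
```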