-
Hello, peng!
Sorry to disturb you.
In the paper you use the F1 score to evaluate the model. I know the formula for calculating the F1 score, but what I get is the coordinates and score of the bounding …
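When the model outputs boxes and confidence scores rather than labels, the usual route is to match predictions to ground-truth boxes by IoU and then apply the F1 formula to the resulting TP/FP/FN counts. Below is a minimal sketch of that idea for a single class; the greedy matching rule, the 0.5 score threshold, and the 0.5 IoU threshold are assumptions, not necessarily the protocol used in the paper.

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)


def detection_f1(pred_boxes, pred_scores, gt_boxes, score_thr=0.5, iou_thr=0.5):
    """F1 for one image: greedily match confident predictions to ground truth."""
    scored = sorted(zip(pred_scores, pred_boxes), key=lambda t: t[0], reverse=True)
    preds = [box for score, box in scored if score >= score_thr]
    matched, tp = set(), 0
    for pred in preds:
        # Find the best still-unmatched ground-truth box for this prediction.
        best_j, best_iou = -1, 0.0
        for j, gt in enumerate(gt_boxes):
            if j in matched:
                continue
            overlap = iou(pred, gt)
            if overlap > best_iou:
                best_j, best_iou = j, overlap
        if best_iou >= iou_thr:
            matched.add(best_j)
            tp += 1
    fp, fn = len(preds) - tp, len(gt_boxes) - tp
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0
```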
-
https://github.com/vered1986/OKR/blob/master/src/baseline_system/eval_predicate_mention.py#L35
We shouldn't be using the average of precision and recall to compute F1, but the harmonic mean.
Can we …
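For reference, a minimal sketch of the harmonic-mean computation, with `precision` and `recall` as placeholder names for whatever the evaluation script already computes. The distinction matters for imbalanced pairs: with P = 1.0 and R = 0.1 the arithmetic mean is 0.55, while the harmonic mean is about 0.18.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; defined as 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```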
-
Hi vg team,
We used ONT data from 100 individuals (20X coverage per individual) for SV calling using the Sniffles2 software. Subsequently, we used vg construct to create a graph. Out of these 100 i…
-
I wanted to reproduce the F1 score, precision, and recall for Entity Linking, but I can't find the code that evaluates them.
I also looked at evaluate_task.ipynb but couldn't find it there.
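In case it helps while the official script is missing, here is a minimal sketch of one common convention: treat gold and predicted links as sets of (mention, entity) pairs and compute micro-averaged precision, recall, and F1 over them. The pair-based formulation and the variable names are assumptions on my part, not necessarily what the paper's evaluation does.

```python
def micro_prf(gold_links, pred_links):
    """gold_links / pred_links: sets of (mention_id, entity_id) pairs."""
    tp = len(gold_links & pred_links)
    precision = tp / len(pred_links) if pred_links else 0.0
    recall = tp / len(gold_links) if gold_links else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```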
-
Hi all,
Thanks for the wonderful work.
I am currently running code_bert_score to evaluate the similarity between generated code and 'correct' code. However, it just takes way too long locally. …
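For what it's worth, one way to keep local runs manageable is to score in smaller chunks rather than passing the full lists at once. A minimal sketch is below, assuming the `code_bert_score.score(cands=..., refs=..., lang=...)` entry point returns a (precision, recall, F1, F3) tuple of tensors; that return shape, and whether extra arguments such as a device or batch size are accepted, should be double-checked against the version you have installed.

```python
import torch
import code_bert_score


def chunked_f1(candidates, references, lang="python", chunk_size=64):
    """Score generated code against references in chunks and collect F1 values."""
    f1_parts = []
    for start in range(0, len(candidates), chunk_size):
        cands = candidates[start:start + chunk_size]
        refs = references[start:start + chunk_size]
        # Assumed return order: (precision, recall, F1, F3).
        _, _, f1, _ = code_bert_score.score(cands=cands, refs=refs, lang=lang)
        f1_parts.append(f1)
    return torch.cat(f1_parts)
```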
-
Hello, based on the code you provided, we conducted experiments on the DBLP dataset and achieved macro F1 scores of 0.9114±0.0062, 0.9162±0.0038, and 0.9203±0.0045, as well as micro F1 scores of 0.917…
-
Thank you to the authors for sharing. I trained the model you implemented on my own dataset and evaluated it with the evaluation command you provided. The resulting table is below. The F1 score and IoU values are exactly the same; is this normal?
![Screenshot from 2022-11-22 19-20-33](https://user-images.githubusercontent.com/116441063/231481778-e1a5c3b…
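In general the two metrics should not coincide: for a binary mask, F1 (Dice) and IoU (Jaccard) are related by Dice = 2·IoU / (1 + IoU), so they are equal only when both are 0 or both are 1. A minimal sketch of the check, assuming the metrics are computed from pixel-wise TP/FP/FN counts (which may differ from how this repo aggregates them):

```python
def dice_and_iou(tp, fp, fn):
    """Pixel-wise Dice (= F1) and IoU from true/false positive and false negative counts."""
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    dice = 2 * tp / (2 * tp + fp + fn) if 2 * tp + fp + fn else 0.0
    return dice, iou


# Example: tp=80, fp=10, fn=10 gives IoU = 0.8 but Dice ≈ 0.889; identical values
# across a whole table usually mean one metric is being reported twice.
print(dice_and_iou(80, 10, 10))
```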
-
If we are running classification tasks with an LLM, how can we calculate overall precision, recall, and F1 score from the evals?
It is not clear whether derived metrics allow us to do that. Any suggestions?…
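If the per-sample results can be exported with both the predicted and expected labels, the overall metrics can be computed offline. A minimal sketch, assuming a JSONL log where each record carries hypothetical `expected` and `sampled` fields; the actual field names depend on how your eval writes its results.

```python
import json

from sklearn.metrics import precision_recall_fscore_support


def metrics_from_log(path):
    """Aggregate per-sample eval records into overall precision/recall/F1."""
    y_true, y_pred = [], []
    with open(path) as f:
        for line in f:
            record = json.loads(line)
            y_true.append(record["expected"])  # hypothetical field name
            y_pred.append(record["sampled"])   # hypothetical field name
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="macro", zero_division=0
    )
    return precision, recall, f1
```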
-
## Original Task
Citing from the original course task:
> Training a strong Hebrew Sentence Encoder from a pretrained Decoder
>
> While recent years have brought many additions to the open-source set …
-
I have two questions about the pre-processed NQ training data.
1. How is it possible for 'has_gold_answer' to be False when 'em' is 1 and 'f1' is 1.0?
2. What criteria were used to select 'positiv…
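For context, 'em' and 'f1' in this kind of preprocessing are typically SQuAD-style string metrics computed against answer strings (e.g. matched inside a retrieved passage), which could make them 1 even when the example is not annotated as having a gold answer. A minimal sketch of that convention, under the assumption that this repo follows it:

```python
import re
import string
from collections import Counter


def normalize(text):
    """Lowercase, strip punctuation and articles, collapse whitespace (SQuAD-style)."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def token_f1(prediction, gold):
    """Token-level F1 between a predicted answer string and a gold answer string."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def exact_match(prediction, gold):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))
```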