-
Hi @lingorX @wenguanwang, @tfzhou
Thank you for the great paper.
I have a question regarding the evaluation metrics. The metric used in this paper is `mIoU`.
1. So this `mIoU` is calculat…
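For reference, here is one common way mIoU is computed for segmentation (a sketch only; whether the paper averages IoU per image or over a dataset-wide confusion matrix is exactly what this issue is asking, and `mean_iou` below is a hypothetical helper, not the authors' code):

```python
import numpy as np

def mean_iou(pred, gt, num_classes):
    """Mean IoU over classes from integer label maps.

    pred, gt: integer arrays of identical shape, values in [0, num_classes).
    Classes absent from both maps are skipped rather than counted as 0.
    """
    ious = []
    for c in range(num_classes):
        pred_c = (pred == c)
        gt_c = (gt == c)
        union = np.logical_or(pred_c, gt_c).sum()
        if union == 0:  # class appears in neither map; skip it
            continue
        inter = np.logical_and(pred_c, gt_c).sum()
        ious.append(inter / union)
    return float(np.mean(ious))

# Toy 2x2 label maps with classes {0, 1}
pred = np.array([[0, 1], [1, 1]])
gt   = np.array([[0, 1], [0, 1]])
print(mean_iou(pred, gt, num_classes=2))
```

Note the design choice hidden here: computing IoU per image and then averaging generally gives a different number than accumulating one confusion matrix over the whole dataset, which is why the question above matters.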
-
**Context**
When running the evaluators over larger datasets, depending on the model, it is very common to run into LLM errors where the output is not valid JSON. For example, while running the ben…
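A common mitigation is a best-effort parser that tries progressively looser extraction before declaring a failure (so a single malformed reply does not abort a long benchmark run). This is a sketch, not the evaluator's actual code; `parse_llm_json` is a hypothetical helper:

```python
import json
import re

def parse_llm_json(text):
    """Best-effort parse of an LLM reply that should contain a JSON object.

    Returns the parsed object, or None if nothing parseable is found,
    so the caller can retry the LLM call or log the sample as failed.
    """
    # 1. Try the raw text first.
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass
    # 2. Strip a markdown code fence, a frequent failure mode.
    fenced = re.search(r"```(?:json)?\s*(.*?)```", text, re.DOTALL)
    if fenced:
        try:
            return json.loads(fenced.group(1))
        except json.JSONDecodeError:
            pass
    # 3. Fall back to the first {...} span in the text.
    brace = re.search(r"\{.*\}", text, re.DOTALL)
    if brace:
        try:
            return json.loads(brace.group(0))
        except json.JSONDecodeError:
            pass
    return None
```

Wrapping the evaluator's parse step in something like this, plus a bounded retry of the LLM call when it returns `None`, usually keeps large runs alive.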
-
"To evaluate the effectiveness of our method more realistically and inspired by the evaluation method in MegaFace [21], we modified the evaluation method of LFW. We randomly selected 12 persons from t…
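The protocol quoted above is a MegaFace-style identification test: a probe is matched against a gallery that mixes genuine identities with injected distractors. A minimal sketch of the rank-1 criterion (hypothetical helper; the paper's exact gallery construction and selection are not reproduced here):

```python
import numpy as np

def rank1_identification(probe, gallery, labels, probe_label):
    """Rank-1 hit: the most similar gallery embedding shares the probe's label.

    probe:   (d,) embedding of the query face.
    gallery: (N, d) embeddings, mixing genuine entries and distractor identities.
    labels:  identity label for each gallery row.
    """
    sims = (gallery @ probe) / (
        np.linalg.norm(gallery, axis=1) * np.linalg.norm(probe)
    )
    return labels[int(np.argmax(sims))] == probe_label
```

Averaging this boolean over all probes gives the identification accuracy; adding more distractors to the gallery makes the test harder, which is the point of the modification described above.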
-
There are no clear instructions in the repo on how to calculate and verify the metrics published in the paper, nor are they computed during the training and validation steps; only the input and denoised im…
-
Hello,
I would be interested in seeing the evaluation code for the three metrics mentioned in the paper. I'm a bit confused about the definitions and the formulae used, as I'm studying this pr…
-
Hi,
Thanks for your work first!
I ran inference with your checkpoint and evaluated all the metrics you mentioned in the paper, but I got very poor results. Could you please provide the code for eva…
-
Hey, can you please elaborate on how you computed recall@3? The cosine similarity between the true distractors and the predicted distractors lies between 0 and 1. Please elaborate on how you converted this…
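One plausible reading (and only that; the conversion being asked about is exactly what the paper leaves unstated) is to count a ground-truth distractor as recovered when any of the top-k predictions exceeds a cosine-similarity threshold. A sketch under that assumption, with `recall_at_k` and the `threshold` value both hypothetical:

```python
import numpy as np

def recall_at_k(true_vecs, pred_vecs, k=3, threshold=0.9):
    """Recall@k under a cosine-similarity match criterion.

    A ground-truth item counts as recovered if any of the top-k
    predictions has cosine similarity >= threshold with it.
    true_vecs: (T, d) ground-truth embeddings.
    pred_vecs: (P, d) predicted embeddings, already ranked by confidence.
    """
    def normalize(x):
        return x / np.linalg.norm(x, axis=1, keepdims=True)

    sims = normalize(true_vecs) @ normalize(pred_vecs[:k]).T  # (T, k)
    recovered = (sims.max(axis=1) >= threshold).sum()
    return recovered / len(true_vecs)
```

The threshold choice changes the number substantially, so pinning down the authors' actual criterion (threshold match, exact string match, or nearest-neighbor assignment) is the right question to ask.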
-
Hey,
I ran inference on the 29 Huang-annotated sequences from DAVIS 2017.
```shell
srun python video_completion.py \
--mode object_removal \
--seamless \
--path ../data/…
-
![image](https://github.com/THU-KEG/KoLA/assets/8592144/017a3423-14c7-4f91-9232-8e909268edbd)
Which metric are you ultimately evaluating for these datasets: F1 or EM (ROUGE or BLEU)? I can't tell from reading this pap…
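For concreteness, the SQuAD-style definitions of EM and token-level F1 are reproduced below (a sketch of the standard formulation only; whether KoLA uses these or ROUGE/BLEU is precisely what this issue asks, and `normalize` is a simplified hypothetical version of the usual answer normalization):

```python
import re
from collections import Counter

def normalize(s):
    """Lowercase, strip punctuation and collapse whitespace (simplified)."""
    s = re.sub(r"[^a-z0-9\s]", " ", s.lower())
    return " ".join(s.split())

def exact_match(pred, gold):
    """1.0 iff the normalized strings are identical."""
    return float(normalize(pred) == normalize(gold))

def token_f1(pred, gold):
    """Harmonic mean of token-level precision and recall."""
    pred_toks = normalize(pred).split()
    gold_toks = normalize(gold).split()
    common = Counter(pred_toks) & Counter(gold_toks)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_toks)
    recall = overlap / len(gold_toks)
    return 2 * precision * recall / (precision + recall)
```

EM is unforgiving of any extra token, while token F1 gives partial credit, so the two can rank models quite differently; knowing which one a leaderboard reports matters.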
-
**Evaluation of Per Taxonomy Leaf Performance**
- Performance should be evaluated on each taxonomy leaf individually, since each leaf node in the taxonomy represents one particular skill, or set of k…
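The per-leaf breakdown above can be sketched as a simple group-by over evaluation records (a hypothetical helper, assuming each example carries a `leaf_id` and a correctness flag; the real pipeline's record schema may differ):

```python
from collections import defaultdict

def per_leaf_accuracy(records):
    """records: iterable of (leaf_id, is_correct) pairs.

    Returns {leaf_id: accuracy}, so a weak skill stays visible
    instead of being averaged away into one aggregate score.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for leaf, correct in records:
        totals[leaf] += 1
        hits[leaf] += int(correct)
    return {leaf: hits[leaf] / totals[leaf] for leaf in totals}

print(per_leaf_accuracy([("algebra", True), ("algebra", False), ("geometry", True)]))
```

Reporting the full per-leaf dictionary (or its minimum) alongside the overall average is what makes skill-level regressions detectable.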