-
### Proposal summary
Show in the UI the traces that are produced when using LLM-as-a-Judge metrics, including both General and LLM spans.
### Motivation
When using LLM-as-a-Judge metrics in your evaluations, it is usef…
-
Dear AI Engineer, how can I calculate evaluation metrics such as Dice, IOU, and HD? Could you provide me with some code? I found some code online regarding this topic, but I encountered some issues wh…
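For reference, the three metrics named in this question can be sketched in a few lines. This is a minimal, dependency-free sketch and not the code from any particular repository. It assumes that "HD" means the Hausdorff distance, that masks are 2D binary nested lists, that two empty masks score 1.0 for Dice and IoU, and that the Hausdorff distance is taken over all foreground pixels rather than only boundary pixels. The function names (`foreground`, `dice`, `iou`, `hausdorff`) are illustrative.

```python
import math


def foreground(mask):
    """Return the set of (row, col) coordinates where the binary mask is non-zero."""
    return {(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v}


def dice(pred, target):
    """Dice coefficient: 2|P ∩ T| / (|P| + |T|)."""
    p, t = foreground(pred), foreground(target)
    if not p and not t:
        return 1.0  # assumption: two empty masks count as a perfect match
    return 2 * len(p & t) / (len(p) + len(t))


def iou(pred, target):
    """Intersection over union: |P ∩ T| / |P ∪ T|."""
    p, t = foreground(pred), foreground(target)
    union = p | t
    if not union:
        return 1.0  # assumption: two empty masks count as a perfect match
    return len(p & t) / len(union)


def hausdorff(pred, target):
    """Symmetric Hausdorff distance between the two foreground pixel sets."""
    p, t = foreground(pred), foreground(target)
    if not p or not t:
        return math.inf  # assumption: treated as infinite when a mask is empty

    def directed(a, b):
        # Largest distance from a point in a to its nearest point in b.
        return max(min(math.dist(x, y) for y in b) for x in a)

    return max(directed(p, t), directed(t, p))


pred = [[0, 1, 1],
        [0, 1, 0],
        [0, 0, 0]]
target = [[0, 1, 0],
          [0, 1, 0],
          [0, 1, 0]]

print(dice(pred, target), iou(pred, target), hausdorff(pred, target))
```

The brute-force Hausdorff loop is quadratic in the number of foreground pixels, so for large images or 3D volumes a vectorised routine such as `scipy.spatial.distance.directed_hausdorff` is likely a better fit.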
-
Hi Kevin,
Thanks for your contribution to the ABSA task. I just wanted to bring your attention to the following code block in your utils.py file within the InstructABSA folder. It seems that because e…
-
I am still getting this error.
command: kurtosis run --enclave test-net github.com/ethpandaops/ethereum-package --args-file network_param.yml
There was an error interpreting Starlark code
Evaluation…
-
As the picture below shows, the lm-evaluation-harness computes the metrics of the sub-tasks (such as leaderboard_gpqa_diamond/extended/main). How can I get the metrics of leaderboard_gpqa as a whole?
![issue](https…
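One possible way to get a single group-level number from the sub-task scores is a sample-weighted mean. This is only a sketch under stated assumptions: it is not confirmed to be how lm-evaluation-harness aggregates groups, the `weighted_mean` helper is hypothetical, and the scores and sample counts below are made-up placeholders rather than real results or dataset sizes.

```python
def weighted_mean(scores, counts):
    """Average per-sub-task scores, weighting each by its number of samples."""
    total = sum(counts[name] for name in scores)
    return sum(scores[name] * counts[name] for name in scores) / total


# Placeholder values for illustration only (not real results or dataset sizes).
scores = {
    "leaderboard_gpqa_diamond": 0.30,
    "leaderboard_gpqa_extended": 0.32,
    "leaderboard_gpqa_main": 0.34,
}
counts = {
    "leaderboard_gpqa_diamond": 100,
    "leaderboard_gpqa_extended": 200,
    "leaderboard_gpqa_main": 100,
}

group_score = weighted_mean(scores, counts)
print(f"leaderboard_gpqa (sample-weighted): {group_score:.4f}")
```

An unweighted mean of the three sub-task scores would give a different number whenever the sub-tasks differ in size, so it is worth stating which of the two is being reported.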
-
Dear Author,
I have encountered an issue while replicating your highlight task using the SHIQ dataset. I noticed that the FID metric matches, but there is a significant discrepancy in the KID metri…
-
I want to compare the estimated pose with the ground truth pose using ov_eval. The dataset that I am using is the EuRoC dataset. I am recording a rosbag of the estimated pose data (/ov_msckf/poseimu) …
-
Does the code not include evaluation metrics? Could you provide the relevant code for metrics?
-
We got the following issue while running your project for the MSU AI Club Campus Vision Challenge:
Custom code for evaluation:
```python
import os
import sys
import matplotlib.pyplot as plt
impor…
-
Hi, thanks for the interesting work. We took DINO's last embedding, compared similarity scores for the original AnimateDiff-generated videos, and found that the similarities are in fact very high. Is it po…