-
Thank you for your great evaluation work. Could you evaluate our latest model (co-trained on VQA and chat data)?
https://github.com/THUDM/CogVLM
-
Feature Description
I would like to add a model that can predict mobile phone prices accurately.
Use Case
Mobile price prediction is useful because it helps consumers decide the best time to buy a…
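A minimal sketch of what such a predictor could start from: ordinary least squares on a single spec. The feature choice (RAM) and all numbers below are illustrative, not from the request.

```python
# Hypothetical sketch: predict a phone's price from one spec (RAM in GB)
# with ordinary least squares. Data and feature choice are illustrative only.

def fit_line(xs, ys):
    """Return slope and intercept minimizing squared error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
            sum((x - mean_x) ** 2 for x in xs)
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Toy training data: (RAM in GB, price in USD)
ram = [2, 4, 6, 8, 12]
price = [120, 200, 280, 360, 520]

slope, intercept = fit_line(ram, price)
predicted = slope * 10 + intercept  # price estimate for a 10 GB phone
```

A real feature would use many specs and a proper train/test split; this only shows the shape of the task.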
-
Thanks for your great work!
Would you mind sharing the code to train the evaluation model interclip? I'm investigating its performance and your help would be appreciated.
lzhyu updated 2 months ago
-
I am running the llama2 model on the wikitext dataset. I want to try some other metrics, so I modified the default YAML file (`lm-evaluation-harness/lm_eval/tasks/wikitext/wikitext.yaml`) to the following, just…
-
### Description:
Implement metrics to evaluate the performance and accuracy of the CNN model used for detecting and predicting 16-segment displays.
### Tasks:
- Implement confusion matrix, prec…
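A minimal sketch of the first two metrics in the task list, assuming integer class labels (e.g. 0–15 for the 16 segment classes); the function names and toy labels are illustrative, not from the issue.

```python
# Confusion matrix and per-class precision for integer-labeled predictions.

def confusion_matrix(y_true, y_pred, num_classes):
    """matrix[i][j] counts samples with true class i predicted as class j."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix

def precision(matrix, cls):
    """TP / (TP + FP) for one class; 0.0 if the class was never predicted."""
    predicted = sum(row[cls] for row in matrix)
    return matrix[cls][cls] / predicted if predicted else 0.0

# Toy labels for a 3-class check.
y_true = [0, 0, 1, 1, 2]
y_pred = [0, 1, 1, 1, 2]
cm = confusion_matrix(y_true, y_pred, num_classes=3)
p1 = precision(cm, 1)  # class 1: TP=2, FP=1 -> 2/3
```

In practice one would likely use `sklearn.metrics` for this, but the pure-Python version makes the definitions explicit.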
Dv04 updated 10 months ago
-
This is the result of my evaluation on the app aplit model.
![image](https://github.com/user-attachments/assets/cf884c5f-a914-4147-9705-086d21fd343b)
It is lower than the number reported in the paper. What…
-
I ran this command:
`vbench evaluate --videos_path "/home/notebook/code/group/hkx/video_tasks/dover/DOVER/demo" --dimension "motion_smoothness"`
and the output was as follows:
`args: Namespace…
-
### Is there an existing issue for this?
- [X] I have searched the existing issues
### Topic
A lot of users are curious about how fast those embeddings are and how that can be improved.
We should ad…
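One way such a speed section could be backed by numbers is a small wall-clock benchmark; `embed` below is a placeholder stand-in for the real embedding call, not an actual API.

```python
# Hedged sketch of an embedding-throughput benchmark.
import time

def embed(texts):
    # Placeholder workload; replace with the actual embedding function.
    return [[float(len(t))] for t in texts]

def benchmark(fn, batch, repeats=100):
    """Return average seconds per call and texts embedded per second."""
    start = time.perf_counter()
    for _ in range(repeats):
        fn(batch)
    elapsed = time.perf_counter() - start
    per_call = elapsed / repeats
    return per_call, len(batch) / per_call

per_call, throughput = benchmark(embed, ["hello world"] * 32)
```

Reporting both latency per call and throughput per second covers the two numbers users usually ask for.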
-
Excuse me,
I have a question: why was the best threshold used in evaluation rather than a fixed one?
Shouldn't we be using a fixed threshold in practical applications?
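The distinction the question raises can be sketched as follows: a "best" threshold is tuned once on a held-out validation split and then frozen before scoring the test split. All scores and labels below are synthetic.

```python
# Tune a decision threshold on validation data, then apply it as fixed.

def accuracy(scores, labels, threshold):
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == l for p, l in zip(preds, labels)) / len(labels)

def best_threshold(scores, labels, candidates):
    """Pick the candidate threshold maximizing validation accuracy."""
    return max(candidates, key=lambda t: accuracy(scores, labels, t))

val_scores, val_labels = [0.1, 0.4, 0.6, 0.9], [0, 0, 1, 1]
test_scores, test_labels = [0.2, 0.55, 0.7, 0.3], [0, 1, 1, 0]

# Tune once on validation, then treat the threshold as fixed in deployment.
t = best_threshold(val_scores, val_labels, [0.3, 0.5, 0.7])
test_acc = accuracy(test_scores, test_labels, t)
```

Searching the threshold directly on the test set, by contrast, leaks test labels into the decision and inflates the reported numbers.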
-
Currently, there is a need for an automated evaluation tool that can simplify the process. This tool should be capable of assessing the accuracy and quality of translations produced by various models.…
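One possible building block for such a tool is a clipped unigram precision, the simplest component of BLEU. This is an illustrative sketch, not a full metric; real tools add higher-order n-grams, a brevity penalty, and proper tokenization.

```python
# Clipped unigram precision between a candidate translation and a reference.
from collections import Counter

def unigram_precision(candidate, reference):
    cand = Counter(candidate.split())
    ref = Counter(reference.split())
    # Clip each word's count by its count in the reference.
    overlap = sum(min(cnt, ref[w]) for w, cnt in cand.items())
    total = sum(cand.values())
    return overlap / total if total else 0.0

score = unigram_precision("the cat sat on the mat", "the cat is on the mat")
# 5 of 6 candidate tokens match after clipping -> 5/6
```

An established implementation such as sacreBLEU would be the natural choice for the actual tool; the sketch only shows the core counting idea.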