evaluations Search Results

isotope/core #2536

Evaluations & statistics

Isotope Version 2.9.0 For sales statistics ![image](https://github.com/user-attachments/assets/a1b980a3-94cb-415a-a657-d8be67018436) but at Sales summary ![image](https://github.com/user-attac…

katgirl updated 5 days ago

FSoft-AI4Code/XMainframe #3

Evaluations and Prompts

Could you please share the evaluation scripts and prompts that were used to generate the reported results in the paper? Various parameters are involved in generating outputs, and it is crucial to …

prince14322 updated 2 weeks ago

google-deepmind/gemma #42

Reproducing evaluations

I am trying to reproduce the evaluation numbers but have not been able to. For example, for gemma-2-9b, the technical report mentions 68.2 on BBH 3-shot CoT, while the open llm [leaderboard](https://huggingface.co/spaces/ope…

adiprasad updated 1 month ago

AllenNeuralDynamics/aind-data-schema #1062

Quality control needs versioning for evaluations

dbirman updated 11 hours ago

rese1f/MovieChat #78

Evaluations without human blind evaluation

Hi, have you evaluated the model using only GPT3.5/Claude without HBR? This is important for the research community to compare against your work.

ADiko1997 updated 1 week ago

ParadoxZW/LLaVA-UHD-Better #7

Reproduce Llava-uhd Evaluations?

Hello! Thanks so much for fixing the bugs in the Llava-uhd repo. I was wondering if anyone was able to reproduce the evaluations that Llava-uhd got in their paper. I also noticed that my pretraining c…

Andrew-Zhang updated 3 weeks ago

RTIInternational/teehr #247

Using `teehr` to support event-based evaluations

Here's an example of a single site evaluation of rainfall-driven runoff events: https://github.com/jarq6c/little_hope/blob/main/teehr-events/single_site.ipynb The goal of this evaluation was to iso…

jarq6c updated 13 hours ago

twosixlabs/armory-library #169

Add informational logging to evaluations

Armory provides little information to the console while executing evaluations. Adding `INFO` level logging at major steps such as model and dataset loading, as well as chain evaluation, would make it ea…

deprit updated 3 weeks ago

EngreitzLab/gene_network_evaluation #26

Trait enrichment results should be processed in evaluations

To generate a PheWAS plot, we need to run https://github.com/EngreitzLab/gene_network_evaluation/blob/main/src/plotting/plot_gwas_enrichment.py#L10. To avoid extra computation in the dashboard, we …

adamklie updated 2 days ago

defenseunicorns/leapfrogai #823

ADR: RAG Evaluations Framework

This ADR will encompass all of the lessons learned and decisions made on how we will handle RAG-focused LLM evaluations. These issues are encompassed in the [RAG Evaluations MVP Epic](https://githu…

jalling97 updated 1 month ago

1000+ results for evaluations
