-
Hi,
I saw the CIRR evaluation results in your paper and am interested in reproducing them on the CIRR dataset.
Could you please share the evaluation code for CIRR?
Thank you.
-
Replace the current linear regressions with some of these options:
1. Fine-tune the existing model (fine-tune the LLM weights)
2. Random forest or similar on the embeddings
3. Neural network on top of …
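A minimal sketch of how option 2 could be compared against the current linear head, assuming scikit-learn and NumPy are available and the embeddings are already a 2-D array with one row per example. `compare_heads` is a hypothetical helper and the data is synthetic. Both heads are scored on identical 5-fold splits, so the swap is judged on held-out R² rather than training fit.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score


def compare_heads(embeddings, targets, seed=0):
    """Mean held-out R^2 for a linear head vs. a random-forest head (hypothetical helper)."""
    # A fixed integer random_state makes KFold yield the same splits for both models.
    folds = KFold(n_splits=5, shuffle=True, random_state=seed)
    models = {
        "linear": LinearRegression(),
        "random_forest": RandomForestRegressor(n_estimators=100, random_state=seed),
    }
    return {
        name: float(cross_val_score(model, embeddings, targets, cv=folds, scoring="r2").mean())
        for name, model in models.items()
    }


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(300, 16))  # synthetic stand-in for LLM embeddings
    # Nonlinear target that depends on only two embedding dimensions
    y = 3 * np.sin(X[:, 0]) + X[:, 1] ** 2 + rng.normal(scale=0.1, size=300)
    print(compare_heads(X, y))
```

On a synthetic target like this one, which is nonlinear in two dimensions, the forest would be expected to beat the linear head. On the real embeddings that comparison is exactly what needs checking, and the same harness extends to option 3 by adding another regressor to `models`.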
-
Kris’s advice on how to tell whether a chain is big/long enough: generate several chains (10?), score them, and compare the statistics for some key metrics across the individual ensembles; if they’re r…
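One way to make that comparison concrete is the Gelman-Rubin potential scale reduction factor (R-hat), a standard convergence diagnostic swapped in here as an illustration; it is not necessarily what Kris had in mind. It compares the spread between chain means with the spread within each chain for one scalar metric: values near 1 suggest the chains agree, and commonly cited cutoffs are 1.1 or, more strictly, 1.01. The sketch assumes equal-length chains of per-sample metric values and uses the classic (non-split) form; `chain_summaries` and `gelman_rubin` are hypothetical helper names.

```python
import random
import statistics


def chain_summaries(chains):
    """Mean and standard deviation of one metric for each chain, for side-by-side comparison."""
    return [(statistics.mean(c), statistics.stdev(c)) for c in chains]


def gelman_rubin(chains):
    """Classic (non-split) Gelman-Rubin R-hat for equal-length chains of one scalar metric."""
    m, n = len(chains), len(chains[0])
    if m < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least two chains of equal length >= 2")
    chain_means = [statistics.mean(c) for c in chains]
    within = statistics.mean([statistics.variance(c) for c in chains])  # W: avg within-chain variance
    between = n * statistics.variance(chain_means)  # B: between-chain variance of the means
    pooled = (n - 1) / n * within + between / n  # pooled variance estimate
    return (pooled / within) ** 0.5


if __name__ == "__main__":
    random.seed(0)
    # 10 hypothetical chains of 1000 scored samples each, all drawn from the same distribution
    chains = [[random.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(10)]
    for i, (mu, sd) in enumerate(chain_summaries(chains)):
        print(f"chain {i}: mean={mu:.3f} sd={sd:.3f}")
    print(f"R-hat = {gelman_rubin(chains):.4f}")
```

If one chain has settled somewhere else (for example, its metric is shifted relative to the others), the between-chain term grows and R-hat rises well above 1, which is the signal that the chains are not yet long enough.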
-
I'm really impressed with the work presented in your paper. I've been trying to replicate the evaluation process for my own research, but I'm having difficulty finding the exact metrics used. Would yo…
-
I want to be able to perform post-evaluation query filtering after evaluating a model on a retrieval benchmark. In other words, after evaluation is run I want to be able to select a subset of the test…
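A minimal sketch, assuming the evaluation keeps per-query metric values instead of only the aggregate: filtering then reduces to re-averaging over a subset of query ids, with no need to rerun retrieval. The names `run`, `qrels`, `recall_at_k`, `per_query_scores` and `aggregate` are hypothetical (the first two follow common TREC naming), and recall@k here means the fraction of a query's relevant items found in its top k.

```python
from statistics import mean


def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of this query's relevant items that appear in its top-k results."""
    hits = len(set(ranked_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids)  # assumes every query has at least one relevant item


def per_query_scores(run, qrels, k=10):
    """Score every query once. run: query id -> ranked item ids; qrels: query id -> relevant ids."""
    return {qid: recall_at_k(run.get(qid, []), rel, k) for qid, rel in qrels.items()}


def aggregate(scores, keep=None):
    """Average per-query scores, optionally restricted to the query ids in `keep`."""
    selected = [s for qid, s in scores.items() if keep is None or qid in keep]
    if not selected:
        raise ValueError("no queries left after filtering")
    return mean(selected)


if __name__ == "__main__":
    run = {"q1": ["d1", "d2", "d3"], "q2": ["d4", "d5", "d6"], "q3": ["d7", "d8", "d9"]}
    qrels = {"q1": {"d2"}, "q2": {"d9"}, "q3": {"d7", "d8"}}
    scores = per_query_scores(run, qrels, k=2)  # computed once, after evaluation
    print(aggregate(scores))                     # all queries
    print(aggregate(scores, keep={"q1", "q2"}))  # post-hoc subset
```

Storing the per-query dictionary alongside the aggregate is the design choice that matters; any subset (by query length, category, difficulty, and so on) can then be reported from it later.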
-
### User Story
As a challenge manager, in order to inform evaluators what they should evaluate the submissions on, I would like to set up evaluation criteria.
**Acceptance criteria:**
- [x] A challenge m…
-
Hello,
I am experiencing unusually long evaluation times while running my model on an RTX 3090 GPU. Is this expected behavior, or is there something I might need to optimize? Any insights would be …
-
I was looking through the evaluation script and was wondering why we perform this clipping of depth values [here](https://github.com/Tencent/DepthCrafter/blob/67eb5ee0942367274c840ed331862b113f2b5b…
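It isn't visible from here exactly what the linked line does, but a common pattern in depth evaluation is to mask ground-truth pixels outside a valid range and clip predictions into that same range before computing errors, which keeps near-zero, negative, or very large predictions from dominating relative and log errors. A minimal sketch of that pattern with AbsRel, assuming NumPy; `abs_rel`, `min_depth` and `max_depth` are hypothetical names and values, not taken from the DepthCrafter script.

```python
import numpy as np


def abs_rel(pred, gt, min_depth=1e-3, max_depth=80.0):
    """Absolute relative error over valid ground-truth pixels, with predictions clipped to the range."""
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    # Ignore missing or out-of-range ground truth (e.g. zeros from sparse sensors)
    valid = (gt > min_depth) & (gt < max_depth)
    # Bound predictions to the same range so outliers cannot produce unbounded errors
    clipped = np.clip(pred, min_depth, max_depth)
    return float(np.mean(np.abs(clipped[valid] - gt[valid]) / gt[valid]))


if __name__ == "__main__":
    pred = [0.0, 2.0, 100.0]
    gt = [1.0, 2.0, 50.0]
    print(abs_rel(pred, gt))
```

Without the clip, a prediction of 0 would also make a log-based metric such as log10 error undefined, which is one plausible motivation for clipping in an evaluation script.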
-
Following my comment on issue #67, the [formal semantics](https://w3c.github.io/odrl/formal-semantics/#:~:text=4.1%20Expected%20behaviour%20of%20the%20ODRL%20Evaluator), or a new document, should prov…