-
**What problem or use case are you trying to solve?**
On evaluations such as SWE-bench, we currently log the instruction for each individual repo, but do not actually log the queries sent …
-
> Rather than asking an LLM for a direct evaluation (via giving a score), try giving it a reference and asking for a comparison. This helps with reducing noise.
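
A minimal sketch of the comparison-style judging described in the quote, assuming the judge model is asked to pick between a reference answer and a candidate answer. The function names (`build_comparison_prompt`, `parse_verdict`) and the `A` / `B` / `TIE` reply format are illustrative assumptions, not part of any specific evaluation framework.

```python
from typing import Optional


def build_comparison_prompt(question: str, reference: str, candidate: str) -> str:
    """Build a prompt that asks for a comparison against a reference, not a score."""
    return (
        "You are comparing two answers to the same question.\n"
        f"Question:\n{question}\n\n"
        f"Answer A (reference):\n{reference}\n\n"
        f"Answer B (candidate):\n{candidate}\n\n"
        "Which answer is better? Reply with exactly one of: A, B, TIE."
    )


def parse_verdict(reply: str) -> Optional[str]:
    """Return 'A', 'B', or 'TIE' if the reply is exactly one of them, else None."""
    verdict = reply.strip().upper()
    return verdict if verdict in {"A", "B", "TIE"} else None
```

Restricting the reply to a small closed set of verdicts keeps parsing trivial and lets malformed replies be detected (`None`) rather than silently mis-scored.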
-
We need to start dividing up the work, authoring the various Threats / Controls. We'll use this issue to manage that work and their assignments.
Each threat / control is 'ticked' when assigned to …
-
Sometimes DS tasks need training data that comes from a web page or is generated by an LLM.
data-interpreter should decide whether to fetch useful training data from a web page
or generate useful data by itself.
I mean i…
-
It seems like an important aspect of what we're interested in has to do with time.
- Things happen before other things that have a causal effect. Does the LLM understand this?
- When you give it a…
-
After getting a score of 0 every time, I looked at the `samples.jsonl_results.jsonl` file, and the result for each sample is: "failed: module 'signal' has no attribute 'setitimer'"
This seems like a Wi…
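
For context, Python documents `signal.setitimer` as available only on Unix, so the attribute is simply absent on other platforms. The sketch below shows one portable way to bound execution time with a worker thread. The helper name `run_with_timeout` is hypothetical, and this is not the harness's actual implementation. Note that a thread cannot be forcibly killed, so a timed-out function keeps running in the background.

```python
import signal
import threading
import time

# True on Unix, False where the attribute is missing.
HAS_SETITIMER = hasattr(signal, "setitimer")


def run_with_timeout(fn, timeout_s, *args):
    """Run fn(*args) in a daemon thread.

    Returns (finished, result). If fn raises, the exception is re-raised here.
    On timeout, returns (False, None); the worker thread is NOT stopped.
    """
    box = {}

    def target():
        try:
            box["result"] = fn(*args)
        except BaseException as exc:  # captured so the caller can see it
            box["error"] = exc

    worker = threading.Thread(target=target, daemon=True)
    worker.start()
    worker.join(timeout_s)
    if worker.is_alive():
        return False, None
    if "error" in box:
        raise box["error"]
    return True, box["result"]


if __name__ == "__main__":
    # Sleeping 0.5 s with a 0.05 s budget times out.
    print(run_with_timeout(time.sleep, 0.05, 0.5))  # (False, None)
```

For untrusted code, running the work in a separate process that can be terminated is the safer design. The thread version only stops the caller from waiting.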
-
### Prerequisite
- [X] I have searched [Issues](https://github.com/open-compass/opencompass/issues/) and [Discussions](https://github.com/open-compass/opencompass/discussions) but cannot get the ex…
-
### What is the issue?
The quality of the results returned by the embedding model now is much worse than the previous version.
### OS
Linux
### GPU
Nvidia
### CPU
Intel
### Ollama version
0.1…
-
In Appendix B.2 of the paper, you briefly describe how you generate the malicious instructions dataset. Could you share the prompt and seed instructions you used to generate this dataset? And how did …
-
### Issue: Comparing GROBID and Docling for Parsing Scholarly Publications
#### **My Use Case**
We need to parse and extract all relevant information from thousands of scholarly publications, such…