-
- [ ] [Challenges in Evaluating Agent Performance: A Critical Analysis](https://arxiv.org/html/2404.11584v1)
# Challenges in Evaluating Agent Performance: A Critical Analysis
## Snippet
"6.2 Challen…
-
We're trying to reference a private CodeQL pack when running `init`. When this runs in GitHub Actions, we get `HttpError: Not Found`.
```
name: "SAST Scans"
on:
  push:
    branches: ["maste…
-
Training gets stuck after the first epoch, while the eval metrics are being calculated.
-
```
What steps will reproduce the problem?
import os, gc
import PyV8
def get_mem():
    a = os.popen('ps -p %d -o %s | tail -1' % (os.getpid(),"vsize,rss,pcpu")).read()
    a = a.split()
    return (…
-
Evaluating now causes the selected code to be evaluated twice.
This happens whether the cursor is within the code being evaluated or whether the code is highlighted. It runs from top to bottom of the…
-
(From my comment on Slack regarding 0.16.0, split into a few issues)
* Evaluating something prints an unwanted empty line in the REPL each time.
-
Evaluating on MS-MARCO seems to take significantly more time than on NQ or HotpotQA; it just hangs here:
> Loading checkpoint shards: 0%| | 0/2 [00:00