-
I am starting this thread for feature spec discussion for umBRELA @lintool @ronakice.
Suggestions from my side:
- parameter for specifying the number of samples for inference and later performin…
-
**What problem or use case are you trying to solve?**
Currently OpenDevin somewhat works with the strongest closed LLMs such as GPT-4 or Claude Opus, but we have not confirmed good results with ope…
-
Firstly, I wanted to say the paper was a great read. Thank you for the excellent work!
I noticed that Table 4 reports the ChartQA performance of InternVL2-8B as 73.80, whereas the [InternVL](https:…
-
Hello, thank you for your great work. @mczhuge
# First question:
On the fifth page of your paper, there is a sentence: "we record key decisions and actions made by the agentic systems through some custom …
-
### Willingness to contribute
No. I cannot contribute this feature at this time.
### Proposal Summary
When working with MLflow Evaluation or AI agents, there are "hidden" system prompts that are no…
-
> Regarding the ChatGPT-like features, that's pretty far out on my roadmap currently; I want to add support for smart devices, wearables, and home medical devices first.
The other thing that makes t…
-
- [ ] [LLM API Host Leaderboard | Artificial Analysis](https://artificialanalysis.ai/leaderboards/hosts?parallel_queries=single&prompt_length=long)
# LLM API Host Leaderboard | Artificial Analysis
…
-
I am getting the error "Failed to parse output. Returning None" on the faithfulness metric for some inputs. The behavior is inconsistent: it sometimes works and sometimes doesn't for the same…
-
Hello, thank you