-
### 🚀 Feature
Some reinforcement learning problems, such as [safe reinforcement learning](https://github.com/PKU-Alignment/safety-gymnasium/tree/main), require the environment to return multiple reward-like valu…
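
For illustration, here is a minimal sketch of what "multiple reward-like values" could look like: a toy environment whose `step` returns a separate cost next to the usual reward. Everything in it (`ToyCostEnv`, `StepResult`, the hazard position, the field order) is hypothetical and is not the API proposed in this issue.

```python
from typing import NamedTuple


class StepResult(NamedTuple):
    """One step's outcome, with a cost signal alongside the reward."""
    observation: int
    reward: float
    cost: float
    terminated: bool
    truncated: bool
    info: dict


class ToyCostEnv:
    """Hypothetical 1-D walk: reaching GOAL pays reward, standing on HAZARD incurs cost."""

    GOAL = 3
    HAZARD = 1

    def reset(self):
        self.pos = 0
        return self.pos, {}

    def step(self, action):
        # action is +1 (right) or -1 (left); the position is clipped to [0, GOAL].
        self.pos = max(0, min(self.GOAL, self.pos + action))
        reward = 1.0 if self.pos == self.GOAL else 0.0
        cost = 1.0 if self.pos == self.HAZARD else 0.0
        terminated = self.pos == self.GOAL
        return StepResult(self.pos, reward, cost, terminated, False, {})
```

Keeping the cost as its own field, rather than folding it into the reward, lets a constrained algorithm track the two signals separately.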
-
- This issue focuses on the technical courses we take about LLMs; the paper part goes in
https://github.com/xp1632/DFKI_working_log/issues/70
---
1. **ChainForge** https://chainforge.ai/ …
-
### Proposal
Among the toy text grid world environments, only Frozen Lake implements transition probabilities.
Taxi is supposed to have it based on the previous documentation but has never been imp…
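
As a rough sketch of what a transition probability table involves, the snippet below builds one for a made-up three-state corridor. The layout imitates the `P[state][action]` table that Frozen Lake keeps, where each entry is a list of `(probability, next_state, reward, terminated)` tuples. The corridor, the slip probability, and the function names are assumptions for illustration only and are not taken from Taxi or any other existing environment.

```python
import random

N_STATES = 3  # hypothetical corridor: states 0, 1, 2; state 2 is the goal
SLIP = 0.2    # assumed chance that a move goes in the opposite direction


def build_transitions(n_states=N_STATES, slip=SLIP):
    """Return P[state][action] -> list of (prob, next_state, reward, terminated)."""
    goal = n_states - 1
    P = {}
    for s in range(n_states):
        P[s] = {}
        for a in (0, 1):  # 0 = left, 1 = right
            if s == goal:
                # The goal is absorbing: every action stays put.
                P[s][a] = [(1.0, s, 0.0, True)]
                continue
            outcomes = []
            # Intended direction first, then the slipped (opposite) direction.
            for direction, prob in ((a, 1.0 - slip), (1 - a, slip)):
                delta = 1 if direction == 1 else -1
                nxt = min(max(s + delta, 0), goal)
                reward = 1.0 if nxt == goal else 0.0
                outcomes.append((prob, nxt, reward, nxt == goal))
            P[s][a] = outcomes
    return P


def sample_step(P, state, action, rng=random):
    """Draw one transition according to the listed probabilities."""
    outcomes = P[state][action]
    weights = [p for p, _, _, _ in outcomes]
    _, nxt, reward, terminated = rng.choices(outcomes, weights=weights, k=1)[0]
    return nxt, reward, terminated
```

With a table like this, the probabilities for every state-action pair sum to 1, and planning methods such as value iteration can read the dynamics directly instead of sampling them.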
-
## What's the problem?
Wordplay, as a programming language and sharing platform, does little to help teachers structure, plan, and teach computing. Here is a small --- and likely incomplete --- li…
-
- [ ] [Finetuning LLMs for ReAct. Unleashing the power of finetuning to… | by Pranav Jadhav | Feb, 2024 | Towards AI](https://pub.towardsai.net/finetuning-llms-for-react-9ab291d84ddc)
# Finetuning L…
-
Let's create a higher-bandwidth, compressed representation of our communication using a new, emergent language that we invent on the fly from emojis, text, and mathematical symbols in a f…
-
🔍 **Deep Dive: "ReAct: Reasoning and Acting with Large Language Models"**
📝 **Summary:**
The authors introduce "ReAct", a method that synergizes reasoning and acting in large language models (LL…
-
### Discussed in https://github.com/brianpetro/obsidian-smart-connections/discussions/416
Originally posted by **Levani307** January 16, 2024
Hello, I am a Smart Connection Supporter and I lov…
-
Theory in Phase 1
-
### Willingness to contribute
No. I cannot contribute this feature at this time.
### Proposal Summary
Bring back 'Only show diff' from MLflow 1.
### Motivation
> #### What is the us…