-
I have no idea what is wrong; check the reward and Q-function graphs. Sometimes you stumble upon a functional agent that moves well or seems to chase the ball, but it is highly unstable.
https:/…
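Noisy reward and Q-value graphs like the ones mentioned above are often easier to read after smoothing. The following is only a minimal sketch, assuming the per-episode returns are already available as a plain list of numbers; the helper names `moving_average` and `rolling_spread` are hypothetical and not part of the original project. A smoothed curve that rises while the spread stays large would be consistent with the "works sometimes, but unstable" behaviour described.

```python
from collections import deque


def moving_average(values, window=100):
    """Trailing moving average of `values`; output has the same length as the input."""
    if window < 1:
        raise ValueError("window must be >= 1")
    buf = deque(maxlen=window)
    total = 0.0
    out = []
    for v in values:
        if len(buf) == window:
            total -= buf[0]  # this value is evicted by the append below
        buf.append(v)
        total += v
        out.append(total / len(buf))
    return out


def rolling_spread(values, window=100):
    """Trailing (max - min) over the last `window` values, a crude instability signal."""
    if window < 1:
        raise ValueError("window must be >= 1")
    buf = deque(maxlen=window)
    out = []
    for v in values:
        buf.append(v)
        out.append(max(buf) - min(buf))
    return out


if __name__ == "__main__":
    # Hypothetical episode returns, for illustration only.
    episode_returns = [1, 5, 2, 2, 8, 0]
    print(moving_average(episode_returns, window=2))
    print(rolling_spread(episode_returns, window=2))
```

Plotting both series next to the raw returns makes it easier to tell a slow upward trend apart from occasional lucky episodes.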
-
Hi @danielhanchen
I am trying to fine-tune gemma2-2b for my task, following the continued fine-tuning guidelines in unsloth. However, I am running into OOM errors while doing so. My intent is to train gemm…
-
Thank you very much for your outstanding work.
I have a few small questions that I want to confirm with you.
Firstly, in the `my_highway_env.py` file,
`vehicle = self.action_type.vehicle_class`…
-
The project aims to develop a reinforcement learning (RL) agent to optimize waste collection in a simulated environment, minimizing overflow events and improving efficiency.
Environment and State R…
-
Traceback (most recent call last):
File "main.py", line 136, in <module>
main()
File "main.py", line 132, in main
atari_learn(env, task.env_id, num_timesteps=task.max_timesteps, double_dqn=dou…
-
When fine-tuning Llama-3___2-3B-Instruct with QLoRA on two 32 GB V100s, an error is raised at the very end, when the model is saved.
The slurm script is:
```
#!/bin/bash
#SBATCH --job-name=openrlhf
#SBATCH --partition=gpu_v100
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH…
```
-
# Abstract
This project implements the NeurIPS 2019 paper:
q-means: A quantum algorithm for unsupervised machine learning
https://papers.nips.cc/paper/8667-q-means-a-quantum-algorithm-for-unsupervi…
-
CUDA_VISIBLE_DEVICES=0 python /home/ubuntu/TextToSQL/DB-GPT-Hub/src/dbgpt-hub-sql/dbgpt_hub_sql/train/sft_train.py \
--model_name_or_path /home/ubuntu/.cache/modelscope/hub/qwen/Qwen2___5-Coder-7B…
-
By changing the default agent class in the various `default_files`, we can "make" more people use RL agents that are better than QLearning.
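As a rough illustration of the idea, the sketch below shows one way a single default could decide which agent users get when they do not choose one. The layout of the real `default_files` is not shown here, so every name in this block (`BaseAgent`, `QLearningAgent`, `DQNAgent`, `AGENT_REGISTRY`, `DEFAULT_AGENT`, `make_agent`) is hypothetical, and DQN is only a stand-in for "a better agent".

```python
class BaseAgent:
    """Hypothetical common base class for agents."""
    name = "base"


class QLearningAgent(BaseAgent):
    """Hypothetical tabular baseline (the old default)."""
    name = "q_learning"


class DQNAgent(BaseAgent):
    """Hypothetical stronger agent, used here as the new default."""
    name = "dqn"


# Hypothetical registry mapping config names to agent classes.
AGENT_REGISTRY = {
    QLearningAgent.name: QLearningAgent,
    DQNAgent.name: DQNAgent,
}

# Changing this one value changes what every user gets by default.
DEFAULT_AGENT = DQNAgent.name


def make_agent(agent_name=None):
    """Build the requested agent, falling back to DEFAULT_AGENT when none is given."""
    key = DEFAULT_AGENT if agent_name is None else agent_name
    try:
        agent_cls = AGENT_REGISTRY[key]
    except KeyError:
        known = ", ".join(sorted(AGENT_REGISTRY))
        raise ValueError(f"unknown agent {key!r}; known agents: {known}") from None
    return agent_cls()


if __name__ == "__main__":
    print(type(make_agent()).__name__)              # default agent
    print(type(make_agent("q_learning")).__name__)  # explicit opt-in to the baseline
```

Users who explicitly ask for the baseline still get it; only those who rely on the default are moved to the stronger agent.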