-
After the install finished successfully, I got this error when I ran this command: python train.py --actor-model facebook/opt-1.3b --reward-model facebook/opt-350m --num-gpus 1
---=== Running Step 1 ===…
-
I am getting the following error traceback when I run `python -m torch.distributed.launch --nproc_per_node=1 reward_summarization.py --bf16` on a machine with two nodes of A10 (24GB). I have `torch==2…
-
https://hackmd.io/YfYRpWJXQGSIzqx8v_1WFA?both#Chain-adoptionallocation
Parameters: `max_chains`, `mu`, `sigmasq`
Derived metric: `ServicerRewardPerChain`: Servicer's previous reward divided by num…
-
**Describe the bug**
After installing the Python libraries and running `bash ./scripts/run_raft_align.sh`, the following output is reported:
* 'validate_all' has been renamed to 'validate_default'
…
-
### Describe the bug
When training an RM (reward model) based on the bloomz-560m model, I observed that only 1 GPU is still being used during training.
![image](https://github.com/shibing624/MedicalGPT/assets/26675984/2cd7eb8d-01bd-4d03-9438-4e78bf49e7a2)
### To Reproduce
The training script is as follows:
…
-
We need a definition to code our training set for CFIR Inner Context.
After reviewing the manual that was attached to a prior LUCID meeting note I do not see a definition for "inner context" per se…
-
I have written a custom environment in Gymnasium and used it successfully in CDT. Here's the environment I created:
![image](https://github.com/liuzuxin/OSRL/assets/111236370/37180e9d…
-
Hi, I looked at https://github.com/lvwerra/trl/tree/main/examples/stack_llama/scripts and found it to be a good RLHF tutorial. However, there are some steps I can't figure out.
The first step is "Supervise…
-
**Actor model**: Bloom-1.1b
**Reward model**: Bloom-560m
**Finetuning cmd**:
bash training_scripts/single_node/run_bloom_1.1b.sh /DeepSpeedExamples/applications/DeepSpeed-Chat/training/step1_superv…
-
### Proposal
While the current lobby has no issues, it is lacking three things: engagement, life, and space. So why not redo it? We could redo the lobby to include a lot more open space, environment, l…