reward-modeling Search Results

639 results for reward-modeling

Best match

microsoft/DeepSpeedExamples #279

RuntimeError: Step 1 exited with non-zero status 1

After installing successfully, I got this error when I ran this command: python train.py --actor-model facebook/opt-1.3b --reward-model facebook/opt-350m --num-gpus 1 ---=== Running Step 1 ===…

yudonglee updated 3 months ago
30
huggingface/trl #274

RuntimeError: one of the variables needed for gradient compu…

I am getting the following error traceback when I run `python -m torch.distributed.launch --nproc_per_node=1 reward_summarization.py --bf16` on a machine with two nodes of A10 (24GB). I have `torch==2…

oroojlooy updated 11 months ago
19
BlockScience/PocketSimulationModel #4

Default Servicer Chain Adoption

https://hackmd.io/YfYRpWJXQGSIzqx8v_1WFA?both#Chain-adoptionallocation Parameters: `max_chains`, `mu`, `sigmasq` Derived metric: `ServicerRewardPerChain`: Servicer's previous reward divided by num…

jshorish updated 11 months ago
5
OptimalScale/LMFlow #544

[BUG]pydantic.errors.PydanticUserError when running RAFT

**Describe the bug** After installing the Python libraries and running `bash ./scripts/run_raft_align.sh`, the following is reported: * 'validate_all' has been renamed to 'validate_default' …

Zhang-Each updated 1 year ago
8
shibing624/MedicalGPT #49

RM model training process

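### Describe the bug When training an RM model based on the bloomz-560m model, I observed that only 1 GPU is still being used during training; ![image](https://github.com/shibing624/MedicalGPT/assets/26675984/2cd7eb8d-01bd-4d03-9438-4e78bf49e7a2) ### To Reproduce The training script is as follows: …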

Vincent131499 updated 1 year ago
4
lzim/teampsd #323

backlog discussion - Abstract & Title - CFIR Inner Context C…

We need a definition to code our training set for CFIR Inner Context. After reviewing the manual that was attached to a prior LUCID meeting note I do not see a definition for "inner context" per se…

teampsdkathryn updated 7 months ago
16
liuzuxin/OSRL #20

The parameters in cdt_configs.py

I have written a custom environment in Gymnasium and used it successfully in CDT. Here's the environment I created: ![image](https://github.com/liuzuxin/OSRL/assets/111236370/37180e9d…

ZhengyeHan updated 1 year ago
6
huggingface/trl #492

Questions about the Stack-LLaMa

Hi, I saw https://github.com/lvwerra/trl/tree/main/examples/stack_llama/scripts and found it a good RLHF tutorial. However, there are some steps I can't figure out. The first step is "Supervise…

beyondguo updated 1 year ago
11
microsoft/DeepSpeedExamples #451

Finetuning Bloom model in step 3 failed

**Actor model**: Bloom-1.1b **Reward model**: Bloom-560m **Finetuning cmd**: bash training_scripts/single_node/run_bloom_1.1b.sh /DeepSpeedExamples/applications/DeepSpeed-Chat/training/step1_superv…

cokuehuang updated 1 year ago
12
monke-mob/monke-activities #29

How about we redo lobby, adding a village with side quests?

### Proposal While the current lobby has no issues, it is lacking three things: engagement, life, and space. So why not redo it? We could redo the lobby to include a lot more open space, environment, l…

homiemace updated 1 year ago
5
