reward-modeling Search Results

609 results for reward-modeling

nebuly-ai/optimate #284

[ChatLlama] Error in the start of OPT1.3B actor pre-training

Hello, I am trying to pre-train the actor model but around the 815-816th example, the training stops and shows this very long error message. I had already trained the reward model so I have been using…

swang99 updated 1 year ago
1
h2oai/h2ogpt #308

bitsandbytes issue

Hi, I followed the install instructions for Windows, and it runs fine without "--load_8bit=True". I am trying to get it to run with "--load_8bit=True" following the extra instructions: pip uninstall bits…

adryan-ai updated 1 year ago
4
Unity-Technologies/ml-agents #4211

Human-in-the-loop and/or Reward modeling

**Is your feature request related to a problem? Please describe.** Defining a reward function may be complex or just impossible in some cases (ie: an agent making a back-flip or a natural walk) or, i…

DrTtnk updated 1 year ago
5
alexpiet/licking_behavior #195

Literature Notes

- Behavioral Strategy Determines Frontal or Posterior Location of Short-Term Memory in Neocortex - https://pubmed.ncbi.nlm.nih.gov/30100254/ - Mixture of Learning Strategies Underlies Rodent Beha…

alexpiet updated 1 year ago
13
dotnet/msbuild #613

Please Consider Improving Project Format and Structure (Seri…

[Going to try this again](https://github.com/Microsoft/msbuild/issues/16), but hopefully with a better defined ask. The request is not to support a particular format, but to improve the MSBuild syste…

Mike-E-angelo updated 7 months ago
195
huggingface/trl #325

Stack-llama rl_training script: CUDA Index error

Trying to run stack-llama [rl_training script](https://github.com/lvwerra/trl/blob/main/examples/stack_llama/scripts/rl_training.py) with the following [reward model](https://huggingface.co/kashif/lla…

jeromeku updated 1 year ago
8
huggingface/transformers #20320

Loading model OOMs with more GPUS

### System Info - `transformers` version: 4.21.2 - Platform: Linux-5.10.135-122.509.amzn2.x86_64-x86_64-with-glibc2.2.5 - Python version: 3.8.5 - Huggingface_hub version: 0.10.0 - PyTorch versi…

Dahoas updated 1 year ago
6
h2oai/h2ogpt #48

Recover when GPU OOMs

``` torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 2.00 MiB (GPU 0; 22.20 GiB total capacity; 20.67 GiB already allocated; 4.12 MiB free; 21.14 GiB reserved in total by PyTorch) I…

pseudotensor updated 1 year ago
2
filecoin-project/devgrants #953

Next Step Microgrant: We are FlowModel in Chainlink Spring 2…

### 1. What is your project? (max 100 words) (Our project is called FlowModel during Chainlink Spring 2022 Hackathon, which is renamed as BlockModel now.) BlockModel is a R&D infrastructure …

jasonplato updated 1 year ago
2
hpcaitech/ColossalAI #2751

With 22919MiB*4 of compute resources, torchrun --standalone --nproc_per_node 4 …

### 🐛 Describe the bug Relevant logs: WARNING:torch.distributed.run: ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid you…

ct1976 updated 1 year ago
2
