-
1. transformers version is 4.31.0 (the latest version);
2. The places already modified are:
```
--- a/reward_modeling.py
+++ b/reward_modeling.py
@@ -34,6 +34,7 @@ from transformers import (
Trainer,
TrainingArguments,
set_s…
-
Builds on #8
* Use the actual sigmoidal curve defined in the PoE whitepaper.
* Optimize the implementation of updates (with some benchmarks counting storage accesses).
If this is too hard to implement th…
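As a reference point, the curve in question is presumably a logistic sigmoid; a minimal sketch in Python (the exact constants from the PoE whitepaper are not reproduced here, so `k` and `x0` below are placeholder parameters, not the whitepaper's values):

```python
import math

def sigmoid(x, k=1.0, x0=0.0):
    """Logistic sigmoid with placeholder steepness k and midpoint x0."""
    return 1.0 / (1.0 + math.exp(-k * (x - x0)))

# The curve rises smoothly from 0 toward 1, crossing 0.5 at the midpoint x0.
print(sigmoid(0.0))  # 0.5 at the midpoint
```

Any tuning against the whitepaper would then be a matter of substituting its actual steepness and midpoint for `k` and `x0`.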
-
Using version 0.2.4 of `nnlib2Rcpp`, I am trying to identify the codebook vectors after training the *LVQ.
Taking the `sLVQ` example in the manual, the `show` function returns this portion:
```
ID: …
-
Could not estimate the number of tokens of the input, floating-point operations will not be computed
Traceback (most recent call last):
File "/root/nas-share/chat/MedicalGPT-main/reward_modeling.p…
-
I have reviewed the wandb training curves provided, and I have a question: why do prob_eval(train)/chosen and rewards_eval(train)/chosen gradually decrease? I originally thought that these two metrics w…
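For context, reward models of this kind are usually trained with a pairwise ranking loss, where the "chosen" probability is the sigmoid of the reward margin over the rejected response. A minimal sketch in plain Python (illustrative only, not the repository's actual code) shows why the absolute chosen reward can drift down while training still improves, since only the margin is constrained:

```python
import math

def pairwise_rm_loss(reward_chosen, reward_rejected):
    """-log sigmoid(r_chosen - r_rejected): the standard pairwise RM loss.

    Returns the loss and P(chosen > rejected). Only the margin matters:
    shifting both rewards by the same constant leaves both values unchanged.
    """
    margin = reward_chosen - reward_rejected
    prob_chosen = 1.0 / (1.0 + math.exp(-margin))
    return -math.log(prob_chosen), prob_chosen

loss, prob = pairwise_rm_loss(2.0, 0.5)
```

Under this loss, a falling rewards/chosen curve is compatible with a growing chosen-vs-rejected margin, which is the quantity the optimizer actually pushes on.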
-
### Describe the Question
Following the steps in run_training_pipeline.ipynb, Stage 1 and Stage 2 both executed OK. At the third stage, RM (Reward Model) reward modeling, an error was raised; please help resolve it.
Error: **ValueError: weight is on the meta device, we need a `value` to put …
-
When I played around with the navigation domain (I appended the `domain.rddl` and `instance.rddl`) and displayed the DBN of single states and ground fluents, I could not interpret the results correctl…
-
I was able to load mistralai/Mixtral-8x7B-Instruct-v0.1 with the --load_4bit=True quantization, using about 30 GB of VRAM.
Loading an xlsx file containing the data I want (just 220 cells with some te…
-
- Feature Name: Central Management of SVD to CSP process
- Start Date: 2018-05-xx
- RFC PR:
- Rust Issue:
# Summary
[summary]: #summary
I need to make this a paragraph, but my main points a…
-
This issue will collect all feedback submitted via the feedback form at the end of each tutorial.
----
Results have been [aggregated](https://nbviewer.jupyter.org/github/bebatut/galaxy-training-m…