allenai RL4LMs issues - Githubissues

allenai / RL4LMs

A modular RL library to fine-tune language models to human preferences

https://rl4lms.apps.allenai.org/

Apache License 2.0

2.13k stars 191 forks

issues

Question about the classifier used for IntentAccuracyDailyDialog.

#71 zhangjf-nlp opened 2 months ago
0
Migrate to current version of gymnasium, SB3, and other libraries.

#70 Kripner opened 4 months ago
0
Upgrade to torch 2.0

#69 agastyaseth opened 4 months ago
1
how to stop env parallel multi-process to debug env.step()?

#68 invoker-LL opened 6 months ago
0
Trying to use rl4lm with more recent libraries

#67 JosefSlavicek closed 6 months ago
0
Is PPO really better than SFT (in general)? under the condition of same amount of data

#66 allanj opened 7 months ago
1
Do you have any plans to apply the recently published Reinforced Self-Training (ReST)?

#65 missflash opened 10 months ago
0
Pip install error with gym and torch

#64 BaleChen opened 11 months ago
3
NLPO Code Error and Query About gymnasium vs gym Usage

#63 jinyilun718 opened 12 months ago
0
Reproducing existing results on NarrativeQA

#62 yxk23 opened 12 months ago
0
Memory issue in metric evals?

#61 AnujMahajanOxf opened 1 year ago
0
is multi-dimensional reward supported?

#60 zabir-nabil opened 1 year ago
0
CPU Support Minor Bug

#59 tedmoskovitz opened 1 year ago
0
Fix IndexError when loading checkpoints

#58 Runingtime opened 1 year ago
0
model.generate.scores returning two scores

#57 debjitpaul opened 1 year ago
0
'GPT2Model' object has no attribute 'first_device'

#56 Stephanehk opened 1 year ago
0
Using GPT-2

#55 oroojlooy opened 1 year ago
0
How can I inference data with the model after PPO training?

#54 RyanYip-Kat opened 1 year ago
0
Bug while loading t5 base model

#53 Sahajtomar opened 1 year ago
1
Error with Accelerate integration + NLPO

#52 avacaondata opened 1 year ago
1
[Question] End-to-end example

#51 farrokhsiar opened 1 year ago
0
Fix nlpo configs

#50 rajcscw closed 1 year ago
0
In the paper, what is the detail setting of supervised learning? Is SL has additional supervised data?

#49 guotong1988 opened 1 year ago
0
Resuming from checkpoint is potentially problematic for IMDB since the splits are resampled

#48 zhixuan-lin closed 1 year ago
1
`train` and `val` splits are not disjoint for IMDB

#47 zhixuan-lin closed 1 year ago
3
A question bother me a long time: What is the difference between RL-for-text-generation and delete-0-reward-model-predictions?

#46 guotong1988 opened 1 year ago
0
A question bother me a long time: What is the difference between RL-for-text-generation and delete-0-reward-model-predictions?

#45 guotong1988 closed 1 year ago
1
Bloom Supporting

#44 c-box opened 1 year ago
3
Error when trying to load a checkpoint from Transformers after RL training

#43 avacaondata closed 1 year ago
5
Metric version incompatible

#42 c-box opened 1 year ago
0
_pickle.UnpicklingError: pickle data was truncated

#41 Oxtay opened 1 year ago
0
Pip install fix

#40 kolbytn opened 1 year ago
0
Make sure transformer return past_key_values

#39 DvHuang opened 1 year ago
0
Value is not broadcastable with batch_shape+event_shape

#38 vcvcvnvcvcvn opened 1 year ago
0
Persistent Variance in IMDB

#37 mnoukhov opened 1 year ago
1
fix: OnPolicyAlgorithm doesnot have the parameter: create_eval_env

#36 hscspring opened 1 year ago
1
Gradient Accumulation feature proposal

#35 eublefar closed 1 year ago
0
Problem with BLEURT reward function

#34 eublefar opened 1 year ago
0
Is it possible to release the code based on Jax

#33 sglucas opened 1 year ago
0
Evaluating a specific checkpoint

#32 lovodkin93 opened 1 year ago
5
UnderStand Mask model to _get_action_masks in LogitsProcessor

#31 xesdiny closed 1 year ago
0
'BartForConditionalGeneration' has no attribute 'encoder'

#30 keeganstoner closed 1 year ago
4
Mix-Precision training

#29 lovodkin93 opened 1 year ago
2
Reproducing IMDB results

#28 mnoukhov opened 1 year ago
4
Is the construction of _value_model necessary?

#27 xesdiny closed 1 year ago
2
passing extra variable to the forward function

#26 lovodkin93 opened 1 year ago
1
Problems with models that don't have the parallelize() function

#25 lovodkin93 opened 1 year ago
1
Changed from logging with the root logger

#24 JulesGM opened 1 year ago
0
Off-policy RL algorithms support

#23 Div99 opened 1 year ago
5
Just a warning that the package doesn't work with Transformers 4.25.1

#22 JulesGM opened 1 year ago
1