allenai/RL4LMs
A modular RL library to fine-tune language models to human preferences
https://rl4lms.apps.allenai.org/
Apache License 2.0
2.13k stars
191 forks
issues
Question about the classifier used for IntentAccuracyDailyDialog.
#71
zhangjf-nlp
opened
2 months ago
0
Migrate to current version of gymnasium, SB3, and other libraries.
#70
Kripner
opened
4 months ago
0
Upgrade to torch 2.0
#69
agastyaseth
opened
4 months ago
1
how to stop env parallel multi-process to debug env.step()?
#68
invoker-LL
opened
6 months ago
0
Trying to use rl4lm with more recent libraries
#67
JosefSlavicek
closed
6 months ago
0
Is PPO really better than SFT (in general)? under the condition of same amount of data
#66
allanj
opened
7 months ago
1
Do you have any plans to apply the recently published Reinforced Self-Training (ReST)?
#65
missflash
opened
10 months ago
0
Pip install error with gym and torch
#64
BaleChen
opened
11 months ago
3
NLPO Code Error and Query About gymnasium vs gym Usage
#63
jinyilun718
opened
12 months ago
0
Reproducing existing results on NarrativeQA
#62
yxk23
opened
12 months ago
0
Memory issue in metric evals?
#61
AnujMahajanOxf
opened
1 year ago
0
is multi-dimensional reward supported?
#60
zabir-nabil
opened
1 year ago
0
CPU Support Minor Bug
#59
tedmoskovitz
opened
1 year ago
0
Fix IndexError when loading checkpoints
#58
Runingtime
opened
1 year ago
0
model.generate.scores returning two scores
#57
debjitpaul
opened
1 year ago
0
'GPT2Model' object has no attribute 'first_device'
#56
Stephanehk
opened
1 year ago
0
Using GPT-2
#55
oroojlooy
opened
1 year ago
0
How can I inference data with the model after PPO training?
#54
RyanYip-Kat
opened
1 year ago
0
Bug while loading t5 base model
#53
Sahajtomar
opened
1 year ago
1
Error with Accelerate integration + NLPO
#52
avacaondata
opened
1 year ago
1
[Question] End-to-end example
#51
farrokhsiar
opened
1 year ago
0
Fix nlpo configs
#50
rajcscw
closed
1 year ago
0
In the paper, what is the detailed setting of supervised learning? Does SL have additional supervised data?
#49
guotong1988
opened
1 year ago
0
Resuming from checkpoint is potentially problematic for IMDB since the splits are resampled
#48
zhixuan-lin
closed
1 year ago
1
`train` and `val` splits are not disjoint for IMDB
#47
zhixuan-lin
closed
1 year ago
3
A question bother me a long time: What is the difference between RL-for-text-generation and delete-0-reward-model-predictions?
#46
guotong1988
opened
1 year ago
0
A question bother me a long time: What is the difference between RL-for-text-generation and delete-0-reward-model-predictions?
#45
guotong1988
closed
1 year ago
1
Bloom Supporting
#44
c-box
opened
1 year ago
3
Error when trying to load a checkpoint from Transformers after RL training
#43
avacaondata
closed
1 year ago
5
Metric version incompatible
#42
c-box
opened
1 year ago
0
_pickle.UnpicklingError: pickle data was truncated
#41
Oxtay
opened
1 year ago
0
Pip install fix
#40
kolbytn
opened
1 year ago
0
Make sure transformer return past_key_values
#39
DvHuang
opened
1 year ago
0
Value is not broadcastable with batch_shape+event_shape
#38
vcvcvnvcvcvn
opened
1 year ago
0
Persistent Variance in IMDB
#37
mnoukhov
opened
1 year ago
1
fix: OnPolicyAlgorithm does not have the parameter: create_eval_env
#36
hscspring
opened
1 year ago
1
Gradient Accumulation feature proposal
#35
eublefar
closed
1 year ago
0
Problem with BLEURT reward function
#34
eublefar
opened
1 year ago
0
Is it possible to release the code based on Jax
#33
sglucas
opened
1 year ago
0
Evaluating a specific checkpoint
#32
lovodkin93
opened
1 year ago
5
UnderStand Mask model to _get_action_masks in LogitsProcessor
#31
xesdiny
closed
1 year ago
0
'BartForConditionalGeneration' has no attribute 'encoder'
#30
keeganstoner
closed
1 year ago
4
Mix-Precision training
#29
lovodkin93
opened
1 year ago
2
Reproducing IMDB results
#28
mnoukhov
opened
1 year ago
4
Is the construction of _value_model necessary?
#27
xesdiny
closed
1 year ago
2
passing extra variable to the forward function
#26
lovodkin93
opened
1 year ago
1
Problems with models that don't have the parallelize() function
#25
lovodkin93
opened
1 year ago
1
Changed from logging with the root logger
#24
JulesGM
opened
1 year ago
0
Off-policy RL algorithms support
#23
Div99
opened
1 year ago
5
Just a warning that the package doesn't work with Transformers 4.25.1
#22
JulesGM
opened
1 year ago
1