CarperAI / trlx
A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF)
MIT License · 4.51k stars · 471 forks
Issues (newest first)
#601 · OOM error with PEFT LoRA on Llama2-7B · arpaiva · opened 2 months ago · 1 comment
#600 · Loading the checkpoint fails · AfraAmini · opened 2 months ago · 0 comments
#599 · cannot import name 'flatten_dataclass' from 'trlx.data.ilql_types' · AfraAmini · opened 3 months ago · 0 comments
#598 · Possible bug in the order of prepare and load · daiwk · opened 3 months ago · 1 comment
#597 · Error when running Ray Tune to launch a hyperparameter sweep · Jing-L97 · opened 4 months ago · 1 comment
#596 · Crash when using save_state with deepspeed: `model.state_dict` functions incompatible with new deepspeed · JohannesAck · opened 4 months ago · 0 comments
#595 · Data loader bug when running t5_summarization_daily_cnn.py · yunanyan · opened 5 months ago · 0 comments
#594 · Why is the train dataloader not prepared by Accelerator? · Jiaxin-Wen · opened 6 months ago · 0 comments
#593 · TRLX environment customization · heraldiclily · opened 7 months ago · 0 comments
#591 · Issue with tensors sharing memory · heraldiclily · opened 7 months ago · 2 comments
#590 · [New Feature Request] Add KTO · 1485840691-eng · opened 10 months ago · 0 comments
#589 · RLHF text summarization diverges · AlisonWen · opened 10 months ago · 0 comments
#588 · Integration of Self-Play Fine-Tuning (SPIN) Method for Enhancing Large Language Models · SeungyounShin · opened 10 months ago · 0 comments
#587 · Runtime error when running examples (ilql_sentiments_t5.py) · youxiho1 · opened 10 months ago · 2 comments
#586 · Add citation info from the EMNLP paper · StellaAthena · closed 10 months ago · 0 comments
#585 · MPT is not working · ouhenio · opened 11 months ago · 0 comments
#584 · Model saved in Hugging Face format after PPO training of LLaMA 13B has unexpected state-dict keys and silently produces no inference output · ldh127 · opened 11 months ago · 1 comment
#583 · Faster & memory-efficient logprobs calculation · li-plus · opened 11 months ago · 1 comment
#582 · Attention mask when calculating log ratio for PPO · kmy17518 · opened 1 year ago · 0 comments
#581 · Multi-GPU training errors with peft · AliengirlLiv · opened 1 year ago · 1 comment
#580 · Issue since most recent transformers update · siddharthverma314 · opened 1 year ago · 1 comment
#579 · update(requirements.txt): to the latest `transformers` & `deepspeed` · maxreciprocate · opened 1 year ago · 1 comment
#578 · fix(modeling_base): partial loading of a sharded checkpoint · maxreciprocate · closed 1 year ago · 0 comments
#577 · resume_from_checkpoint doesn't work · andrewsiah · closed 1 year ago · 1 comment
#576 · fix model state_dict retrieval in zero3 · Jingru · closed 1 year ago · 0 comments
#575 · support parallel reward function · Jingru · opened 1 year ago · 16 comments
#574 · Support parallel reward_fn in PPO training · Jingru · closed 1 year ago · 0 comments
#573 · support customized run_name in tracker · Jingru · closed 1 year ago · 1 comment
#572 · Support customized run name · Jingru · closed 1 year ago · 0 comments
#571 · Multi-GPU support for the summarization PPO example · sayan1101 · opened 1 year ago · 3 comments
#570 · fix(examples/t5_summarize_cnn): move labels into `reward_fn` kwargs · maxreciprocate · closed 1 year ago · 0 comments
#569 · TypeError: reward_fn() got an unexpected keyword argument 'tokenizer' · sayan1101 · closed 1 year ago · 1 comment
#568 · support extra model and tokenizer configs during loading by from_pretrained in accelerate trainer · Jingru · closed 1 year ago · 1 comment
#567 · Problem with LLaMA training with LoRA · freQuensy23-coder · opened 1 year ago · 3 comments
#566 · fix(modeling_base): re-order `model.forward_kwargs` initialization · maxreciprocate · closed 1 year ago · 1 comment
#565 · Question about saving peft checkpoint · nhanph · opened 1 year ago · 2 comments
#564 · `position_ids` error in accelerate PPO trainer · pbarragan · closed 1 year ago · 3 comments
#563 · [Fix] Add default config LLaMa 2 converter Nemo · PhungVanDuy · closed 1 year ago · 0 comments
#562 · Add default config LLaMa 2 converter Nemo · PhungVanDuy · closed 1 year ago · 0 comments
#561 · How to generate reward-labeled dataset · mikkelmedm · opened 1 year ago · 0 comments
#560 · feats: Add text environment examples · PhungVanDuy · opened 1 year ago · 0 comments
#559 · How to train LLaMA2 on the summarize_rlhf example? · missflash · opened 1 year ago · 0 comments
#557 · docs: update documentation · maxreciprocate · closed 1 year ago · 1 comment
#556 · feat: Add support for DPO · sandeepchittilla · opened 1 year ago · 12 comments
#555 · Inference pipeline · Dahoas · opened 1 year ago · 1 comment
#554 · feat: add rejection finetuning trainer · maxreciprocate · closed 1 year ago · 1 comment
#553 · Increasing max new tokens in the generation arguments leads to errors · wise-east · opened 1 year ago · 3 comments
#552 · fix(examples/hh): old gpt-j checkpoint loading · maxreciprocate · closed 1 year ago · 0 comments
#551 · revert(ppo_trainer): keep `save_pretrained` only over the base model · maxreciprocate · closed 1 year ago · 0 comments
#550 · Add trlX cite · Dahoas · closed 1 year ago · 0 comments