CarperAI / trlx
A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF)
MIT License · 4.51k stars · 471 forks
Issues (newest first)
#601 · OOM error with PEFT LoRA on Llama2-7B · arpaiva · opened 2 months ago · 1 comment
#600 · Loading the checkpoint fails · AfraAmini · opened 2 months ago · 0 comments
#599 · cannot import name 'flatten_dataclass' from 'trlx.data.ilql_types' · AfraAmini · opened 3 months ago · 0 comments
#598 · Possible bug in the order of prepare and load · daiwk · opened 3 months ago · 1 comment
#597 · Error when running Ray Tune to launch a hyperparameter sweep · Jing-L97 · opened 4 months ago · 1 comment
#596 · Crash when using save_state with deepspeed: `model.state_dict` functions incompatible with new deepspeed · JohannesAck · opened 4 months ago · 0 comments
#595 · Data loader bug when running t5_summarization_daily_cnn.py · yunanyan · opened 5 months ago · 0 comments
#594 · Why is the train dataloader not prepared by Accelerator? · Jiaxin-Wen · opened 6 months ago · 0 comments
#593 · TRLX environment customization · heraldiclily · opened 7 months ago · 0 comments
#591 · Issue with tensors sharing memory · heraldiclily · opened 7 months ago · 2 comments
#590 · [New Feature Request] Add KTO · 1485840691-eng · opened 10 months ago · 0 comments
#589 · RLHF text summarization diverges · AlisonWen · opened 10 months ago · 0 comments
#588 · Integration of Self-Play Fine-Tuning (SPIN) Method for Enhancing Large Language Models · SeungyounShin · opened 10 months ago · 0 comments
#587 · Runtime error when running examples (ilql_sentiments_t5.py) · youxiho1 · opened 10 months ago · 2 comments
#586 · Add citation info from the EMNLP paper · StellaAthena · closed 10 months ago · 0 comments
#585 · MPT is not working · ouhenio · opened 11 months ago · 0 comments
#584 · Model saved in Hugging Face format after PPO training of LLaMA 13B has unexpected state-dict keys and silently produces no inference output · ldh127 · opened 11 months ago · 1 comment
#583 · Faster & memory-efficient logprobs calculation · li-plus · opened 11 months ago · 1 comment
#582 · Attention mask when calculating log ratio for PPO · kmy17518 · opened 1 year ago · 0 comments
#581 · Multi-GPU training errors with peft · AliengirlLiv · opened 1 year ago · 1 comment
#580 · Issue since most recent transformers update · siddharthverma314 · opened 1 year ago · 1 comment
#579 · update(requirements.txt): to the latest `transformers` & `deepspeed` · maxreciprocate · opened 1 year ago · 1 comment
#578 · fix(modeling_base): partial loading of a sharded checkpoint · maxreciprocate · closed 1 year ago · 0 comments
#577 · resume_from_checkpoint doesn't work · andrewsiah · closed 1 year ago · 1 comment
#576 · fix model state_dict retrieval in zero3 · Jingru · closed 1 year ago · 0 comments
#575 · support parallel reward function · Jingru · opened 1 year ago · 16 comments
#574 · Support parallel reward_fn in PPO training · Jingru · closed 1 year ago · 0 comments
#573 · support customized run_name in tracker · Jingru · closed 1 year ago · 1 comment
#572 · Support customized run name · Jingru · closed 1 year ago · 0 comments
#571 · Multi-GPU support for the summarization PPO example · sayan1101 · opened 1 year ago · 3 comments
#570 · fix(examples/t5_summarize_cnn): move labels into `reward_fn` kwargs · maxreciprocate · closed 1 year ago · 0 comments
#569 · TypeError: reward_fn() got an unexpected keyword argument 'tokenizer' · sayan1101 · closed 1 year ago · 1 comment
#568 · support extra model and tokenizer configs during loading by from_pretrained in accelerate trainer · Jingru · closed 1 year ago · 1 comment
#567 · Problem with LLaMA training with LoRA · freQuensy23-coder · opened 1 year ago · 3 comments
#566 · fix(modeling_base): re-order `model.forward_kwargs` initialization · maxreciprocate · closed 1 year ago · 1 comment
#565 · Question about saving peft checkpoint · nhanph · opened 1 year ago · 2 comments
#564 · `position_ids` error in accelerate PPO trainer · pbarragan · closed 1 year ago · 3 comments
#563 · [Fix] Add default config LLaMa 2 converter Nemo · PhungVanDuy · closed 1 year ago · 0 comments
#562 · Add default config LLaMa 2 converter Nemo · PhungVanDuy · closed 1 year ago · 0 comments
#561 · How to generate reward-labeled dataset · mikkelmedm · opened 1 year ago · 0 comments
#560 · feats: Add text environment examples · PhungVanDuy · opened 1 year ago · 0 comments
#559 · How to train LLaMA2 on the summarize_rlhf example? · missflash · opened 1 year ago · 0 comments
#557 · docs: update documentation · maxreciprocate · closed 1 year ago · 1 comment
#556 · feat: Add support for DPO · sandeepchittilla · opened 1 year ago · 12 comments
#555 · Inference pipeline · Dahoas · opened 1 year ago · 1 comment
#554 · feat: add rejection finetuning trainer · maxreciprocate · closed 1 year ago · 1 comment
#553 · Increasing max new tokens in the generation arguments leads to errors · wise-east · opened 1 year ago · 3 comments
#552 · fix(examples/hh): old gpt-j checkpoint loading · maxreciprocate · closed 1 year ago · 0 comments
#551 · revert(ppo_trainer): keep `save_pretrained` only over the base model · maxreciprocate · closed 1 year ago · 0 comments
#550 · Add trlX cite · Dahoas · closed 1 year ago · 0 comments