IBM / SALMON
Self-Alignment with Principle-Following Reward Models
https://arxiv.org/abs/2310.05910
GNU General Public License v3.0
148 stars · 14 forks
Issues
#5  In which Training step do you use HH-RLHF and SHP datasets?  (richhh520, opened 4 months ago, 1 comment)
#4  A question about the paper  (richhh520, opened 4 months ago, 1 comment)
#3  fix RewardModel forward bug  (UbeCc, opened 8 months ago, 0 comments)
#2  Dataset: upload preference dataset  (Dada-Cloudzxy, opened 10 months ago, 0 comments)
#1  Fix typo in README.md  (eltociear, closed 1 year ago, 0 comments)