Closed 0x000011b closed 1 year ago
Paper, code, and a summary in the form of a Twitter thread. The authors claim the method beats both supervised fine-tuning (what we're currently doing) and RLHF (which we're not doing at the moment due to data and compute constraints).
If we're to faithfully follow the paper, we'll need multiple generations for a given prompt. RankGen + the existing models can help us generate synthetic data for this.
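A rough sketch of what that synthetic data step could look like, assuming we sample several generations per prompt and order them with some scorer. The names `build_ranked_samples`, `generate`, and `score` are hypothetical placeholders: `generate` stands in for sampling from one of our existing models and `score` for a RankGen-style ranker. Neither is a real API, and the output format the paper actually expects may differ.

```python
from typing import Callable, Dict, List


def build_ranked_samples(
    prompts: List[str],
    generate: Callable[[str], str],      # hypothetical: samples one reply from an existing model
    score: Callable[[str, str], float],  # hypothetical: RankGen-style (prompt, reply) scorer
    n_candidates: int = 4,
) -> List[Dict[str, object]]:
    """Sample several generations per prompt and sort them best-first."""
    dataset = []
    for prompt in prompts:
        candidates = [generate(prompt) for _ in range(n_candidates)]
        # Drop exact duplicates while keeping first-seen order.
        unique = list(dict.fromkeys(candidates))
        # Highest-scoring generation first.
        ranked = sorted(unique, key=lambda reply: score(prompt, reply), reverse=True)
        dataset.append({"prompt": prompt, "ranked_generations": ranked})
    return dataset
```

Passing the model and the ranker in as plain callables keeps the sketch independent of whichever model or ranker we end up using.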
I don't think we'll need to do this as there are better alternatives now. SelFee is promising, for one.
An old issue - closing for now.