Investigate Chain of Hindsight for fine-tuning

PygmalionAI / data-toolbox

Our data munging code.

GNU Affero General Public License v3.0

Investigate Chain of Hindsight for fine-tuning #12

Closed 0x000011b closed 1 year ago

0x000011b commented 1 year ago

Paper, code, and a summary in the form of a Twitter thread. The paper claims to beat both supervised fine-tuning (what we're currently doing) and RLHF (what we're not doing at the moment due to data and compute constraints).

If we're to faithfully follow the paper, we'll need multiple generations for a given prompt. RankGen + the existing models can help us generate synthetic data for this.
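As a rough sketch of what that synthetic data could look like: given several scored generations for one prompt, pick the lowest- and highest-scored ones and join them into a single training sequence with natural-language feedback prefixes. Everything named here is an assumption for illustration: `ScoredGeneration`, `build_coh_example`, and the two prefix strings are hypothetical, and `score` stands in for whatever a ranker such as RankGen would produce. It is not RankGen's API or the paper's exact templates.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ScoredGeneration:
    """One model output for a prompt, plus a quality score.

    Hypothetical container: `score` is assumed to come from some ranker,
    with higher meaning better.
    """
    text: str
    score: float


def build_coh_example(
    prompt: str,
    generations: List[ScoredGeneration],
    worse_prefix: str = "A worse response:",    # illustrative wording, not from the paper
    better_prefix: str = "A better response:",  # illustrative wording, not from the paper
) -> str:
    """Join the worst and best generation into one feedback-annotated sequence."""
    if len(generations) < 2:
        raise ValueError("need at least two generations to form a comparison")

    # Sort ascending by score so the first item is the worst, the last the best.
    ranked = sorted(generations, key=lambda g: g.score)
    worst, best = ranked[0], ranked[-1]

    return "\n".join([
        prompt,
        f"{worse_prefix} {worst.text}",
        f"{better_prefix} {best.text}",
    ])


if __name__ == "__main__":
    gens = [
        ScoredGeneration("Hi there, how are you?", 0.9),
        ScoredGeneration("Go away.", 0.1),
    ]
    print(build_coh_example("User: Hello!", gens))
```

This only uses the two extremes; using every ranked pair, or alternating the order of the worse and better responses, would be straightforward variations on the same idea.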

AlpinDale commented 1 year ago

I don't think we'll need to do this as there are better alternatives now. SelFee is promising, for one.

TearGosling commented 1 year ago

An old issue - closing for now.