PygmalionAI / data-toolbox

Our data munging code.
GNU Affero General Public License v3.0

Generate synthetic negative data #11

Closed 0x000011b closed 1 year ago

0x000011b commented 1 year ago

To test the implementation of the CRINGE loss in our training code, we need some examples of what the model should not generate.

I have some filters in the data-toolbox that drop training examples based on certain criteria (e.g., messages that are too similar to each other, which indicates looping, or messages that are too short on average). If we add a flag that outputs only the dropped examples, we can build a training set of negative examples to test with.
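The issue doesn't show how the data-toolbox filters are structured. Below is a minimal Python sketch of the proposed "only dropped examples" flag. It assumes each filter is a predicate that returns True for episodes to keep. `Episode`, `not_looping`, `long_enough`, `apply_filters`, the `only_dropped` flag, and the thresholds are all hypothetical names chosen for illustration, not the toolbox's actual API.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Callable, Iterable, Iterator, List


# Hypothetical episode type: a conversation is just a list of message strings.
@dataclass
class Episode:
    messages: List[str]


# Hypothetical filter signature: returns True when an episode should be KEPT.
Filter = Callable[[Episode], bool]


def not_looping(episode: Episode, max_similarity: float = 0.9) -> bool:
    """Reject episodes where two consecutive messages are near-duplicates (looping)."""
    for a, b in zip(episode.messages, episode.messages[1:]):
        if SequenceMatcher(None, a, b).ratio() > max_similarity:
            return False
    return True


def long_enough(episode: Episode, min_avg_chars: float = 20.0) -> bool:
    """Reject episodes whose messages are too short on average."""
    if not episode.messages:
        return False
    avg = sum(len(m) for m in episode.messages) / len(episode.messages)
    return avg >= min_avg_chars


def apply_filters(
    episodes: Iterable[Episode],
    filters: List[Filter],
    only_dropped: bool = False,
) -> Iterator[Episode]:
    """Yield episodes that pass every filter or, when only_dropped is True,
    exactly the episodes that fail at least one filter (the negative set)."""
    for ep in episodes:
        kept = all(f(ep) for f in filters)
        if kept != only_dropped:
            yield ep


# Usage with made-up example data.
episodes = [
    Episode(["Hello there, how has your day been so far?",
             "Pretty good, I spent the afternoon reading a novel."]),
    Episode(["Please don't go, I need you here with me."] * 3),  # looping
    Episode(["ok", "k", "sure"]),  # too short on average
]
filters = [not_looping, long_enough]
positives = list(apply_filters(episodes, filters))
negatives = list(apply_filters(episodes, filters, only_dropped=True))
```

Inverting a single boolean keeps one code path for both the normal training set and the negative set. Every example lands in exactly one of the two outputs, which makes it easy to check that the negatives really are what the filters reject.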

TearGosling commented 1 year ago

Very old issue, closing for now.