Generate synthetic negative data

To test the implementation of the CRINGE loss in our training code, we need some examples of what the model should not generate.

I have some filters in the data-toolbox that drop training examples based on certain criteria (e.g., messages that are too similar to each other, which indicates looping, or messages that are too short on average). If we add a flag that makes generation output only these dropped examples, we can build a set of negative examples to test with.
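A rough sketch of the idea, not the toolbox's actual code: the filter functions, thresholds, and the `only_dropped` parameter below are all hypothetical names chosen for illustration, and an example is assumed to be a plain list of message strings.

```python
from difflib import SequenceMatcher
from typing import Callable, Iterable, Iterator, List

# Assumption: one training example is a list of message strings.
Example = List[str]


def too_similar(messages: Example, threshold: float = 0.9) -> bool:
    """Flag examples where consecutive messages nearly repeat (looping)."""
    return any(
        SequenceMatcher(None, a, b).ratio() >= threshold
        for a, b in zip(messages, messages[1:])
    )


def too_short(messages: Example, min_avg_chars: int = 20) -> bool:
    """Flag examples whose messages are too short on average."""
    if not messages:
        return True
    return sum(len(m) for m in messages) / len(messages) < min_avg_chars


# Hypothetical filter registry; the real toolbox filters may differ.
FILTERS: List[Callable[[Example], bool]] = [too_similar, too_short]


def select_examples(
    examples: Iterable[Example], only_dropped: bool = False
) -> Iterator[Example]:
    """Yield kept examples by default, or only the dropped ones when
    only_dropped is set (the proposed flag)."""
    for example in examples:
        dropped = any(f(example) for f in FILTERS)
        if dropped == only_dropped:
            yield example
```

With something like this, the same pipeline could produce the positive set by default and the negative set when the flag is passed, so both sets come from the same source data.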

PygmalionAI / data-toolbox