It seems that most abilities of a large pre-trained language model, such as dialogue generation or external knowledge grounding, come from the pre-training process itself.
That means a good pre-training model is all you need. Maybe you can consider building a decent pre-training model first.
> Maybe you can consider building a decent pre-training model first.
Unfortunately, that requires way more resources than I have access to, so for the foreseeable future my plan is to just fine-tune existing pre-trained LMs. You're right about this, though:

> It seems that most abilities of a large pre-trained language model [...] come from the pre-training process itself.
The idea here is to get some data that will help make use of those abilities at inference time by injecting external knowledge between special tokens or something of the sort, so that features like World Info on Kobold can work more reliably, for example.
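As a rough illustration of what I mean (the token names and formatting below are placeholders, not a finalized spec), a training example could wrap the injected knowledge in dedicated special tokens so the model learns to ground its reply on whatever sits between them:

```python
# Hypothetical formatting sketch: wrap externally retrieved knowledge in
# special tokens so the fine-tuned model learns to condition on it.
# The token strings and field names here are assumptions, not a spec.

KNOWLEDGE_START = "<|knowledge|>"
KNOWLEDGE_END = "<|/knowledge|>"

def build_training_example(context: str, knowledge: str, response: str) -> str:
    """Flatten one knowledge-grounded dialogue turn into a training string."""
    return (
        f"{KNOWLEDGE_START}{knowledge}{KNOWLEDGE_END}\n"
        f"User: {context}\n"
        f"Bot: {response}"
    )

print(build_training_example(
    context="Who wrote The Hobbit?",
    knowledge="The Hobbit is a 1937 fantasy novel by J. R. R. Tolkien.",
    response="It was written by J. R. R. Tolkien and published in 1937.",
))
```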
Done: data from ParlAI's SearchDialogueGenerationTeacher can now be used to generate training examples.
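For reference, this is roughly how examples can be dumped out of a ParlAI teacher programmatically, following ParlAI's standard display-data pattern. The task spec below is a placeholder; the actual string depends on where SearchDialogueGenerationTeacher is registered in the ParlAI/BB3 project code.

```python
# Rough sketch of pulling examples out of a ParlAI teacher, following
# ParlAI's standard display-data pattern. The task spec is a placeholder.
from parlai.core.params import ParlaiParser
from parlai.core.worlds import create_task
from parlai.agents.repeat_label.repeat_label import RepeatLabelAgent

TASK = 'replace_with_search_dialogue_generation_task_spec'  # placeholder

parser = ParlaiParser(add_parlai_args=True, add_model_args=True)
opt = parser.parse_args(['--task', TASK, '--datatype', 'train:ordered'])
agent = RepeatLabelAgent(opt)
world = create_task(opt, agent)

examples = []
while not world.epoch_done():
    world.parley()
    # The teacher's act carries the context in 'text' (including any
    # retrieved documents) and the target response in 'labels'.
    act = world.get_acts()[0]
    examples.append({'text': act.get('text'), 'labels': act.get('labels')})
```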
Theoretically, all the data used to train Meta's BlenderBot 3 is available in the ParlAI library (repository). In practice, a significant number of the "teachers" and "tasks" there are broken, so the configurations they've released, which are supposed to replicate the BB3 training data, don't actually work.
Still, many of those teachers can be made to work with small changes, and others work out of the box. We already have plenty of open-ended conversational data, so I'd like to see if we can find some good data for external knowledge grounding (so we can use it for long-term memories/internet search/world info/etc.).
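Just to make the goal concrete, here's a rough sketch of the inference-time side this data is meant to support: scan the recent context for keyword-triggered entries (World Info style) and inject the matches between the same kind of special tokens used during fine-tuning. This is not actual Kobold code; the token names and matching logic are placeholders for illustration only.

```python
# Sketch of World Info-style injection at inference time: entries whose
# keywords appear in the recent context get placed between the knowledge
# tokens the model was fine-tuned on. Tokens and matching are assumptions.
KNOWLEDGE_START = "<|knowledge|>"
KNOWLEDGE_END = "<|/knowledge|>"

WORLD_INFO = {
    ("rivermoor", "the city"): "Rivermoor is a walled trade city on the northern coast.",
    ("aria",): "Aria is a soft-spoken cartographer who maps the coastline.",
}

def inject_world_info(history: list[str], user_message: str) -> str:
    recent = " ".join(history[-4:] + [user_message]).lower()
    hits = [entry for keys, entry in WORLD_INFO.items()
            if any(k in recent for k in keys)]
    knowledge = f"{KNOWLEDGE_START}{' '.join(hits)}{KNOWLEDGE_END}\n" if hits else ""
    return knowledge + "\n".join(history) + f"\nUser: {user_message}\nBot:"

prompt = inject_world_info(
    history=["User: Hi!", "Bot: Hello, traveler."],
    user_message="Tell me about Rivermoor.",
)
print(prompt)
```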