Given the rapid advancement of LLMs and the importance of ensuring their outputs are reliable and free from toxicity and bias, I believe these files should be updated periodically, especially for less common languages. LLMs typically have far less training data in these languages, and as prior research has shown, the smaller the training corpus, the more prone a model is to producing toxic output.
This way, we can work toward safer outputs from these models.