Greybeard-Entertainment / pages

Greybeard Entertainment articles (Markdown and VitePress config)
https://greybeard-entertainment.github.io/
Creative Commons Attribution Share Alike 4.0 International

Extending the LLM article: "general"-ness vs. so-called "hallucinations" #1

Open · 6r1d opened 10 months ago

6r1d commented 10 months ago

New section: "Modern LLMs lack the ability to self-improve"

Text

While we're often told that current models are powerful, and they certainly have their advantages, I find their inability to self-improve very curious.

An article by the Epoch research group claims that our civilisation may run out of training data very soon.

Natural language, not program code in some particular programming language, is the most abundant thing in the training data fed to an LLM. If an LLM behaves in a manner comparable to a human in language processing, it should be most eloquent in its understanding of natural language. If that is the case, it wouldn't be a stretch to imagine it filling its next training dataset using its language capabilities, be it by paraphrasing or summarising (a sketch of such a loop follows below). Some multimodal networks can recognise visual cues and infer conclusions from them. Some, like Stable Diffusion or DALL-E, can create beautiful images. There is a lot of data that has already been learned. Why, then, is such data not being used directly? What stops LLMs from learning further and further, as infants do? Some networks can recognise voice, so why not use radio stations working 24/7, TV shows and anything else that seems valuable?
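
To make the idea concrete, here is a minimal sketch of that imagined loop. Everything in it is hypothetical: `generate` and `train` are stubs standing in for a real model's paraphrasing and training steps, not any actual library API.

```python
import random

def generate(model: dict, prompt: str) -> str:
    """Hypothetical stub for an LLM call; a real model would paraphrase."""
    words = prompt.split()
    random.shuffle(words)
    return " ".join(words)

def train(model: dict, corpus: list[str]) -> dict:
    """Hypothetical stub for a training step; a real run would update weights."""
    return {"seen": model.get("seen", 0) + len(corpus)}

corpus = [
    "natural language is the most abundant thing in the training data",
    "some multimodal networks can recognise visual cues",
]
model = train({}, corpus)

# The imagined self-improvement loop: the model writes its own next dataset,
# then retrains on it. Note that nothing here measures whether the new data
# is any good; that missing quality signal is exactly the problem discussed below.
for generation in range(3):
    corpus = [generate(model, text) for text in corpus]
    model = train(model, corpus)
    print(f"generation {generation}: model has seen {model['seen']} texts")
```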

If LLMs have become so powerful that they are approaching "artificial general intelligence", why would training data ever be a limitation? I hypothesised that there was a significant issue with the quality of the available data. The term "hallucination" is thrown around, and we are probably dealing with language models that need people to distinguish between a "good" and a "bad" output, because the models themselves cannot do so reliably. And even if some composition of ML models could, in theory, be used to fill each other's gaps, there seems to be a limit to the practicality of such an approach; the toy simulation below illustrates one reason why.
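
This is a toy illustration of my own, not something from the referenced paper: each generation of a "model" is fitted only to samples produced by the previous generation, with no external quality signal. The Gaussian stands in for whatever distribution a model has learned; under that assumption, the estimate drifts and the spread typically collapses over generations.

```python
import random
import statistics

random.seed(0)

# "Real" data: samples from a ground-truth distribution.
data = [random.gauss(0.0, 1.0) for _ in range(25)]
mu, sigma = statistics.mean(data), statistics.stdev(data)

# Each new "model" is fitted only to samples drawn from the previous fit,
# with no outside check on quality. Small estimation errors compound, and
# the spread typically drifts toward zero: the models converge on a
# narrower and narrower picture of the original data.
for generation in range(1, 101):
    data = [random.gauss(mu, sigma) for _ in range(25)]
    mu, sigma = statistics.mean(data), statistics.stdev(data)
    if generation % 20 == 0:
        print(f"generation {generation:3d}: mean={mu:+.3f}, stdev={sigma:.3f}")
```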

References to add

Running out of training data

"Will we run out of data? An analysis of the limits of scaling datasets in Machine Learning", Pablo Villalobos, Jaime Sevilla, Lennart Heim, Tamay Besiroglu, Marius Hobbhahn, Anson Ho. Business Insider refers to them as the [Epoch] research group, so I should check this before I write it.

Media references
6r1d commented 10 months ago

Yet another reference is the admission of unpredictability on the ChatGPT Twitter account:

> we've heard all your feedback about GPT4 getting lazier! we haven't updated the model since Nov 11th, and this certainly isn't intentional. model behavior can be unpredictable, and we're looking into fixing it 🫡

While the context is different, it is very much related to the hallucinations that prevent stable self-improvement.