On the size of the ChatGPT model

From Fredrik Ahlgren:

According to OpenAI's definition, "GPT-3.5" means the following; that is, several models go under the name GPT-3.5. ChatGPT is fine-tuned from one of the models below:

Models referred to as "GPT 3.5"

GPT-3.5 series is a series of models that was trained on a blend of text and code from before Q4 2021. The following models are in the GPT-3.5 series:

- code-davinci-002 is a base model, so good for pure code-completion tasks
- text-davinci-002 is an InstructGPT model based on code-davinci-002
- text-davinci-003 is an improvement on text-davinci-002

https://beta.openai.com/docs/model-index-for-researchers

The following page says a bit more about ChatGPT specifically:

"ChatGPT is fine-tuned from a model in the GPT-3.5 series, which finished training in early 2022."

https://openai.com/blog/chatgpt/

As far as I can see, there are no direct figures for ChatGPT itself, but by OpenAI's own account it is a so-called "sibling" of InstructGPT. Reading a bit more about InstructGPT, the following emerges.

"The resulting InstructGPT models are much better at following instructions than GPT-3. They also make up facts less often, and show small decreases in toxic output generation. Our labelers prefer outputs from our 1.3B InstructGPT model over outputs from a 175B GPT-3 model, despite having more than 100x fewer parameters. At the same time, we show that we don’t have to compromise on GPT-3’s capabilities, as measured by our model’s performance on academic NLP evaluations."

https://openai.com/blog/instruction-following/
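
As a quick sanity check of the "more than 100x fewer parameters" figure in the quote above, here is a minimal sketch that uses only the two parameter counts quoted there (1.3B and 175B). The variable names are just illustrative, and nothing here says anything about ChatGPT's own size.

```python
# Parameter counts as quoted in the InstructGPT blog post above.
instructgpt_params = 1.3e9   # the "1.3B InstructGPT model"
gpt3_params = 175e9          # the "175B GPT-3 model"

# How many times larger GPT-3 is than the small InstructGPT model.
ratio = gpt3_params / instructgpt_params

print(f"GPT-3 has roughly {ratio:.0f}x as many parameters")
print("More than 100x:", ratio > 100)
```

The ratio comes out at roughly 135, which is consistent with the "more than 100x" wording in the quote.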

This could perhaps be read as meaning that ChatGPT is probably much smaller than GPT-3, but as far as I have been able to find, that is not stated openly or clearly anywhere.
