Consider using a different acronym than "GLM"

ledell commented 3 years ago

In your recent paper, you introduce a new method, GLM (General Language Model), and refer to this algorithm by the name "GLM" in your paper.

I wanted to offer the comment that using a name like "GLM" will likely lead to a lot of confusion, since "GLM" has long referred to "Generalized Linear Model" in both the statistics and machine learning communities. The term has a nearly 50-year history of use, and it's probably the most widely used and referred-to machine learning algorithm in existence.

Some alternative ideas:

GenLM: "General Language Model" (same name, different acronym)
GPF: "General Pretraining Framework" (in your title)
GPTLM or GPLM: "General Pretraining/Pretrained Language Model"

In your abstract: "Empirically, GLM substantially outperforms BERT on the SuperGLUE natural language understanding benchmark with the same amount of pre-training data."

This type of sentence will require constant clarification and disambiguation on your part and on the part of the community, so I hope you will consider a new name for your method that's not already in use. Thank you for the consideration.

lorenlugosch commented 3 years ago

(poster of the original joke about the acronym here): Will it really require constant clarification and disambiguation, though?

I think in any context involving processing variable-length sequences of raw text tokens, like the sentence from the abstract, it's clear that we're not talking about linear models (just like in the context of your "it's probably the most widely used and referred to 'ML' algorithm in existence", it was clear that by "ML" you meant "machine learning" and not "maximum likelihood", another overloaded acronym).

adamlauretig commented 3 years ago

Honestly, I think it would help the authors' case as well to not be known as GLM -- it'd be the Michael B. Jordan of models. The issue with GLM being an overloaded term is that the existing GLM and this new model differ in terms of fame (knowledge?) by orders of magnitude; disambiguating would help people remember this model. ML and MLE are on the same "level" of fame; this and the generalized linear model are... not.

ledell commented 3 years ago

@lorenlugosch In my opinion, yes. I would say the same if it were called SVM, or the name of any other existing, very popular, machine learning algorithm. My personal opinion is that the namespace of machine learning algorithms should not contain duplicates, for the sake of clarity and communication.

noahaskell commented 3 years ago

Given how important pretraining appears to be here, it seems like including a P in the acronym would have the dual benefits of preventing this namespace clash and emphasizing the pretrained nature of the model.
