ledell opened this issue 3 years ago
In your recent paper, you introduce a new method, GLM (General Language Model), and refer to it as "GLM" throughout.

I wanted to offer the comment that using the name "GLM" will likely lead to a lot of confusion, since "GLM" has long referred to "Generalized Linear Model" in both the statistics and machine learning communities. The term has nearly a 50-year history, and it's probably the most widely-used and referred-to machine learning algorithm in existence.
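To illustrate how entrenched the name is, "GLM" is already a first-class identifier in mainstream statistical software (R's built-in `glm()`, Python's statsmodels, among others). A minimal sketch using statsmodels, with toy data that is purely illustrative:

```python
import numpy as np
import statsmodels.api as sm

# Toy data, purely illustrative: 100 observations, 2 predictors.
rng = np.random.default_rng(0)
X = sm.add_constant(rng.normal(size=(100, 2)))  # add intercept column
y = (rng.random(100) > 0.5).astype(float)       # binary response

# Here "GLM" unambiguously means "generalized linear model":
# a logistic regression fit via the binomial family.
model = sm.GLM(y, X, family=sm.families.Binomial())
result = model.fit()
print(result.summary())
```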
Some alternative ideas:

In your abstract: "Empirically, GLM substantially outperforms BERT on the SuperGLUE natural language understanding benchmark with the same amount of pre-training data."

This type of sentence will require constant clarification and disambiguation, on your part and on the part of the community, so I hope you will consider a new name for your method that's not already in use. Thank you for your consideration.

(poster of the original joke about the acronym here): Will it really require constant clarification and disambiguation, though? I think that in any context involving variable-length sequences of raw text tokens, like the sentence from the abstract, it's clear we're not talking about linear models. (Just as, in the context of your "it's probably the most widely used and referred to 'ML' algorithm in existence", it was clear that by "ML" you meant "machine learning" and not "maximum likelihood", another overloaded acronym.)

Honestly, I think it would help the author's case as well not to be known as GLM -- it'd be the Michael B. Jordan of models. The issue with GLM being an overloaded term is that the existing GLM and this new model differ in fame (or familiarity) by orders of magnitude; disambiguating would help people remember this model. ML and MLE are on the same "level" of fame; this model and the generalized linear model are... not.

@lorenlugosch In my opinion, yes. I would say the same if it were called SVM, or the name of any other existing, very popular machine learning algorithm. My personal opinion is that the namespace of machine learning algorithms should not contain duplicates, for the sake of clarity and communication.
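As a hypothetical sketch of what the clash looks like in literal code terms (the second import is an invented placeholder for the new model, not a real package):

```python
# Real import: "GLM" has meant generalized linear model here for years.
from statsmodels.api import GLM

# Hypothetical import if the new model shipped under the same name
# ("general_language_model" is an invented placeholder package):
# from general_language_model import GLM

# With both in scope, every bare reference to GLM would need an alias
# or a clarifying comment, which is exactly the ambiguity at issue.
print(GLM)  # <class 'statsmodels.genmod.generalized_linear_model.GLM'>
```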
Given how important pretraining appears to be here, it seems like including a P in the acronym would have the dual benefits of preventing this namespace clash and emphasizing the pretrained nature of the model.