SHI-Labs / Compact-Transformers

Escaping the Big Data Paradigm with Compact Transformers, 2021 (Train your Vision Transformers in 30 mins on CIFAR-10 with a single GPU!)
https://arxiv.org/abs/2104.05704
Apache License 2.0

Unable to Replicate Text Classification Results #77

Open SethPoulsen opened 1 year ago

SethPoulsen commented 1 year ago

Hi, I am trying to replicate your Text Classification results so that I can then use your models on my own data set, but I am unable to get any of the text models working at all.

The problem I am running into is that GloVe outputs a tensor of floats, but the embedding layer that TextCCT starts with seems to expect a tensor of integers. Is there some configuration option I am missing?
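For reference, here is a minimal sketch of the mismatch (the vocab/dim sizes are made up; TextCCT's real configuration may differ). `torch.nn.Embedding` wants integer token indices, not pre-computed float vectors:

```python
import torch
import torch.nn as nn

# Hypothetical sizes for illustration only.
vocab_size, embed_dim = 10_000, 300
embedding = nn.Embedding(vocab_size, embed_dim)

# nn.Embedding expects integer token indices (int64 / "long"):
token_ids = torch.tensor([[1, 42, 7]])    # dtype inferred as torch.int64
out = embedding(token_ids)
print(out.shape)                          # torch.Size([1, 3, 300])

# Pre-computed GloVe vectors are floats; indexing with them fails:
glove_out = torch.randn(1, 3, embed_dim)  # stand-in for GloVe's float output
try:
    embedding(glove_out)
    float_input_ok = True
except RuntimeError:
    float_input_ok = False
print("float input accepted:", float_input_ok)  # False
```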

This is a follow-up to #73, which I don't have permissions to re-open.

Also in that issue, @stevenwalton mentioned

The insights from our vision work may not be as useful for NLP tasks, where many of these problems don't exist (transformers are quite successful on small datasets without pre-training).

Could you point me to any specific models? I liked your models because they were transformers with a low parameter count that showed good performance on small data sets without pre-training. Any other transformers I can find that perform well on small data sets have huge parameter counts and must have been pre-trained on some huge data set beforehand, which I am trying to avoid if possible (though I am going to try both to compare anyway).

Thanks for your help!

stevenwalton commented 1 year ago

I'm not quite sure what's going on without looking too closely, but you can see here that we basically only call torch's embedding which expects longs. This just looks like a casting issue to me. Are you double embedding by accident? Or is your input data float instead of long?

The code is pretty straightforward, and honestly any embedder should work. The call graph is just embedder -> text tokenizer -> MaskedTransformerClassifier. Modifications should be fairly trivial, as all our stuff is in the latter two.
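If it is just the dtype, a sketch of the fix: cast the ids to long before they reach the embedder (sizes here are made up for illustration):

```python
import torch
import torch.nn as nn

# Hypothetical embedder; TextCCT's real vocab/dim may differ.
embedding = nn.Embedding(10_000, 300)

# If token ids arrive as floats (e.g. loaded via numpy or CSV),
# cast them to long before embedding:
ids_as_float = torch.tensor([[1.0, 42.0, 7.0]])  # float32 token ids
ids = ids_as_float.long()                         # cast to int64
print(ids.dtype)                                  # torch.int64
print(embedding(ids).shape)                       # torch.Size([1, 3, 300])
```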

SethPoulsen commented 1 year ago

I read in the paper that you used GloVe, so I ran the data set through GloVe myself, because I didn't see that happening anywhere in the codebase. The output of that was floats, which doesn't match the longs expected by your embedding layer, as you say.

stevenwalton commented 1 year ago

https://medium.com/mlearning-ai/load-pre-trained-glove-embeddings-in-torch-nn-embedding-layer-in-under-2-minutes-f5af8f57416a
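The approach in that article, roughly: load the GloVe vectors as the *weights* of `nn.Embedding` via `from_pretrained`, then feed the model long token indices rather than the vectors themselves. A sketch (the weight matrix here is a random placeholder; parsing it from a real `glove.*.txt` file is omitted):

```python
import torch
import torch.nn as nn

# Placeholder for a (vocab_size, embed_dim) matrix parsed from a GloVe file;
# sizes are illustrative, not the actual GloVe vocabulary size.
vocab_size, embed_dim = 40_000, 300
glove_matrix = torch.randn(vocab_size, embed_dim)

# GloVe vectors become the embedding weights; freeze=True keeps them fixed.
embedding = nn.Embedding.from_pretrained(glove_matrix, freeze=True)

# The model then receives long indices into the GloVe vocabulary:
token_ids = torch.tensor([[5, 17, 203]])
vectors = embedding(token_ids)
print(vectors.shape)  # torch.Size([1, 3, 300])
```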