abis330 opened 5 months ago
cc @nreimers
I can't say for certain, but my suspicion is that this model is literally just the https://huggingface.co/microsoft/MiniLM-L12-H384-uncased model but with every second layer removed, i.e. no distillation.
Did you arrive at this model by performing "deep self-attention distillation" using "microsoft/MiniLM-L12-H384-uncased" as a teacher assistant (which was itself derived as a student of UniLMv2, as per the paper MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers), or by directly removing every second layer from the already distilled "microsoft/MiniLM-L12-H384-uncased" student model?
It isn't exactly clear to me. Could you please confirm?
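For clarity, here is a minimal sketch (using the Hugging Face transformers API) of what I mean by "removing every second layer" without any further distillation; which six layers are kept (e.g. the odd-indexed ones) is just an assumption for illustration:

```python
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

# Load the 12-layer teacher-assistant model (BERT architecture).
model = AutoModel.from_pretrained("microsoft/MiniLM-L12-H384-uncased")
tokenizer = AutoTokenizer.from_pretrained("microsoft/MiniLM-L12-H384-uncased")

# Assumption: keep every second encoder layer (odd indices), yielding a 6-layer model.
kept = [1, 3, 5, 7, 9, 11]
model.encoder.layer = nn.ModuleList([model.encoder.layer[i] for i in kept])
model.config.num_hidden_layers = len(kept)

print(model.config.num_hidden_layers)  # 6
```

If the 6-layer model was instead produced by a separate distillation run, the resulting weights would of course differ from such a simple pruning.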