Deseram opened this issue 4 years ago
So far it sounds correct.
I also didn't get the best results with ALBERT. So yes, please try BERT (or RoBERTa); in my experience it gives the best sentence representations.
I can also recommend first fine-tuning on NLI and then fine-tuning on STS data.
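In case it helps, here's a rough sketch of that two-stage recipe using the sentence-transformers fit API (the tiny inline examples are just placeholders for your own NLI/STS data loading, and hyperparameters are illustrative):

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses, models

# BERT encoder with mean pooling on top
word_embedding = models.Transformer('bert-base-uncased')
pooling = models.Pooling(word_embedding.get_word_embedding_dimension())
model = SentenceTransformer(modules=[word_embedding, pooling])

# Stage 1: fine-tune on NLI with a softmax classification head
# (replace the single example with your NLI pairs, label in {0, 1, 2})
nli_examples = [InputExample(texts=['A man eats.', 'A person eats.'], label=0)]
nli_loader = DataLoader(nli_examples, shuffle=True, batch_size=16)
nli_loss = losses.SoftmaxLoss(
    model=model,
    sentence_embedding_dimension=model.get_sentence_embedding_dimension(),
    num_labels=3)
model.fit(train_objectives=[(nli_loader, nli_loss)], epochs=1, warmup_steps=100)

# Stage 2: continue fine-tuning on STS with cosine-similarity regression
# (labels are the gold similarity scores rescaled to [0, 1])
sts_examples = [InputExample(texts=['A man eats.', 'A person eats.'], label=0.9)]
sts_loader = DataLoader(sts_examples, shuffle=True, batch_size=16)
sts_loss = losses.CosineSimilarityLoss(model=model)
model.fit(train_objectives=[(sts_loader, sts_loss)], epochs=4, warmup_steps=100)
```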
Wow, appreciate the very quick response. When you tried ALBERT, did you get similar or better results? I'm developing a proof of concept using ALBERT since, as the name suggests, it's a lighter model than BERT, and it performed slightly better than BERT for me. Do you recall the results you achieved with ALBERT?
Hi @Deseram, you can search the issues here; I think there are some concrete numbers posted.
I don't recall the exact numbers, but when trained on NLI, I got worse results than BERT. I think the performance drop was quite large (~10 points), but I might remember wrongly.
If you need a light model, I can recommend DistilBERT, which performs nearly on par with BERT for sentence embeddings but has only half the layers.
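For example, one of the pretrained DistilBERT sentence-embedding checkpoints can be loaded directly (the model name below is the NLI+STS-tuned DistilBERT checkpoint; swap in whichever checkpoint fits your setup):

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('distilbert-base-nli-stsb-mean-tokens')
embeddings = model.encode(['A man is eating food.', 'Someone is eating a meal.'])
print(embeddings.shape)  # (2, 768)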
Hi,
I've been implementing your paper in Keras using ALBERT instead of BERT. I'm quite certain I've followed almost the exact setup presented in the paper and in the code in this repo, but I'm having difficulty reproducing STS-Benchmark results similar to BERT's.
It would be great if you could point out any issues with the configuration.
Model: TensorFlow Hub Keras layer (v2)
Setup Summary:
Input -> ALBERT -> sentence embeddings -> dropout, rate=0.1 (also tried without) -> global average pooling with mask (input attention mask) -> cosine similarity -> MSE (a minimal sketch of this head is below)
Metric: Pearson correlation. Optimizer: ALBERT optimizer with 10% warmup and a learning rate of 2e-5.
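For reference, a minimal TensorFlow sketch of the masked mean pooling + cosine similarity + MSE head described above; the token embeddings would come from whatever ALBERT layer you're using, so the random tensors and toy shapes here are just placeholders:

```python
import tensorflow as tf

def masked_mean_pool(token_embeddings, attention_mask):
    """Average token embeddings, ignoring padded positions."""
    mask = tf.cast(tf.expand_dims(attention_mask, -1), token_embeddings.dtype)  # (batch, seq, 1)
    summed = tf.reduce_sum(token_embeddings * mask, axis=1)
    counts = tf.maximum(tf.reduce_sum(mask, axis=1), 1e-9)
    return summed / counts

def cosine_similarity(u, v):
    u = tf.math.l2_normalize(u, axis=-1)
    v = tf.math.l2_normalize(v, axis=-1)
    return tf.reduce_sum(u * v, axis=-1)

# Toy shapes: batch of 2, sequence length 4, hidden size 8
emb_a = tf.random.normal((2, 4, 8))   # stand-in for ALBERT output, sentence A
emb_b = tf.random.normal((2, 4, 8))   # stand-in for ALBERT output, sentence B
mask = tf.constant([[1, 1, 1, 0], [1, 1, 0, 0]])

sent_a = masked_mean_pool(emb_a, mask)
sent_b = masked_mean_pool(emb_b, mask)
scores = cosine_similarity(sent_a, sent_b)   # predicted similarity
gold = tf.constant([0.9, 0.2])               # gold STS scores rescaled to [0, 1]
loss = tf.keras.losses.MeanSquaredError()(gold, scores)
```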
I will attempt to use the Hugging Face model to see if there's any other issue. Thanks again for any advice you can give.