Are there any updates regarding the ALBERT model? The README says this is the first attempt. Are there any details on how this specific model was trained? I assume it was trained similarly to the original ALBERT, but on the same data as KB/BERT. However, I have not seen anything more concrete than what is here or on the HF hub, and I guess the paper mentioned here really only covers the regular BERT model.