Closed: balvisio closed this 2 years ago
Hi,
yeah, we should probably improve model naming in the future... I absolutely see where the confusion comes from, sorry for that. So in brief:
Original downstream Predictions
and I'll add it soon'ish
If anything remains unclear, feel free to ask!

Final note: in our hands, ProtT5-XXL did not perform as well as ProtT5-XL despite being the larger model (11B vs. 3B parameters). Most likely this is because the larger model saw fewer samples during pre-training (one epoch took much longer for the larger model). So we usually recommend using only the ProtT5-XL-UniRef50 model (and, very specifically, its encoder side). When run in half precision, it gave us fast, reliable predictions for all our downstream tasks, usually on par with or above ESM-1b.
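For anyone who wants to try the encoder-only, half-precision setup, here is a minimal sketch. It assumes the Hugging Face `transformers` library (with `sentencepiece`) and PyTorch are installed, and that the encoder-only checkpoint is published as `Rostlab/prot_t5_xl_half_uniref50-enc`. That checkpoint name and both helper function names are assumptions that are not stated in this thread, so adjust them to your setup.

```python
import re


def prepare_sequence(seq: str) -> str:
    """Map rare/ambiguous residues (U, Z, O, B) to X and space-separate the residues."""
    return " ".join(re.sub(r"[UZOB]", "X", seq))


def embed_per_residue(seqs, model_name="Rostlab/prot_t5_xl_half_uniref50-enc"):
    """Return one (sequence_length x hidden_size) tensor per input protein.

    model_name is an assumed checkpoint identifier, not taken from this thread.
    """
    # Heavy dependencies are imported lazily so prepare_sequence works without them.
    import torch
    from transformers import T5EncoderModel, T5Tokenizer

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    tokenizer = T5Tokenizer.from_pretrained(model_name, do_lower_case=False)
    model = T5EncoderModel.from_pretrained(model_name).to(device)
    # Use half precision on GPU only; fall back to float32 on CPU.
    model = model.half() if device.type == "cuda" else model.float()
    model.eval()

    batch = tokenizer(
        [prepare_sequence(s) for s in seqs],
        padding="longest",
        return_tensors="pt",
    )
    input_ids = batch["input_ids"].to(device)
    attention_mask = batch["attention_mask"].to(device)

    with torch.no_grad():
        hidden = model(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state

    # Keep only the residue positions: drop padding and the trailing end-of-sequence token.
    return [hidden[i, : len(s)].float().cpu() for i, s in enumerate(seqs)]
```

Calling `embed_per_residue(["PRTEINO", "SEQWENCE"])` should give two tensors, one row per residue; averaging over the rows is one common way to get a single per-protein vector.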
Thank you very much for the clarification and updating the stats @mheinzinger ! Regarding your comment, IIUC ProtT5-XL was trained for a greater number of epochs than ProtT5-XXL? Was the training time-limited? Also, I would be interested in the hardware (e.g. #gpus, model) used to train them. Is there any public information about that?
Thanks again!
Yes, ProtT5-XL saw more samples during training, as it could process more samples per second due to its smaller size (still, ProtT5-XL has 3B parameters, so it is not really small; ProtT5-XXL has 11B). We simply used the compute we had available (which was limited to a certain extent), which is why we could not afford to train the XXL version for the same number of steps as the XL version. For hardware details, I would point you towards our manuscript: https://ieeexplore.ieee.org/document/9477085 We trained on TPU Pod v3 (for more information, see Table 2 in the ProtTrans paper). In the SOM you'll also find more information on the exact training/hardware setup.
Thank you for the detailed info. Very interesting work! I couldn't find the SOM at the link above. Is it already published?
Yeah, good point. For some reason IEEE does an incredible job of hiding this. You need to click on "Media" and then you get a link to the SOM: https://ieeexplore.ieee.org/ielx7/34/4359286/9477085/supp1-3095381.pdf?arnumber=9477085
😂 Thank you!
Hi all,

In "Models Availability" the following models are listed: ProtT5-XL-UniRef50, ProtT5-XL-BFD, ProtT5-XXL-UniRef50, ProtT5-XXL-BFD, etc.

In the "Original downstream Predictions" tables, the models are named: ProtT5-XL-UniRef50, ProtT5-XL-BFD, ProtTXL, ProtTXL-BFD, etc.

I am wondering: do ProtTXL and ProtTXL-BFD in the "Original downstream Predictions" table correspond to ProtT5-XXL-UniRef50 and ProtT5-XXL-BFD in "Models Availability"? Or do they correspond to ProtT5-XL-UniRef50 and ProtT5-XL-BFD, respectively?

Thank you!