agemagician / ProtTrans

ProtTrans provides state-of-the-art pretrained language models for proteins. ProtTrans was trained on thousands of GPUs from Summit and hundreds of Google TPUs using transformer models.
Academic Free License v3.0

Using subnets of ProtTrans for protein understanding/classification tasks #89

Closed xinshao-wang closed 1 year ago

xinshao-wang commented 2 years ago

First of all, many thanks to the authors of ProtTrans for releasing these amazing protein language models and code.

Second, for those who may be interested, our recent work trains subnets of ProtTrans (e.g., when GPU memory is limited) for robust protein understanding/classification tasks: https://github.com/XinshaoAmosWang/DeepCriticalLearning#run-experiments.
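To make the subnet idea concrete: one common way to fit fine-tuning into limited GPU memory is to train only a subset of the model's parameters, e.g. the last few encoder layers plus the task head, and freeze the rest. The sketch below is only an illustration of that selection logic; the parameter names mimic a HuggingFace BERT-style encoder (hypothetical here, not taken from the DeepCriticalLearning code), and the exact subnet choice in the linked work may differ.

```python
def trainable_param_names(all_names, top_k_layers=2, num_layers=30):
    """Return the parameter names to keep trainable: the top-k encoder
    layers plus the classification head; everything else stays frozen."""
    keep_prefixes = tuple(
        f"encoder.layer.{i}." for i in range(num_layers - top_k_layers, num_layers)
    ) + ("classifier.",)
    return [n for n in all_names if n.startswith(keep_prefixes)]

# Toy parameter list mimicking a 30-layer ProtBert-style encoder.
names = (
    ["embeddings.word_embeddings.weight"]
    + [f"encoder.layer.{i}.attention.self.query.weight" for i in range(30)]
    + ["classifier.weight"]
)
print(trainable_param_names(names))
# Only layers 28 and 29 plus the classifier remain trainable.
```

With a real model you would set `param.requires_grad = False` for every parameter whose name is not in this list before building the optimizer.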

I am open to and looking forward to feedback and discussion. Thanks in advance.

mheinzinger commented 2 years ago

Interesting, thanks for sharing! I will try to read it in more detail later (though I cannot guarantee it, as I am busy wrapping up some other things before going on vacation). One thing I already noticed: you might rather use subcellular-localization prediction together with the "Hard new test set" that we describe in the LightAttention paper when making a point about (balanced) accuracy: https://academic.oup.com/bioinformaticsadvances/article/1/1/vbab035/6432029
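For readers unfamiliar with the metric being suggested: balanced accuracy is the mean of per-class recalls, so it is not inflated by a dominant class the way plain accuracy is, which matters for the skewed class distribution in subcellular-localization prediction. A minimal stdlib-only sketch (not code from either paper):

```python
from collections import defaultdict

def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall over the classes present in y_true."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)

# Imbalanced toy example: recall is 2/3 for class "A" and 1/1 for "B",
# so balanced accuracy is (2/3 + 1) / 2 ≈ 0.833, while plain accuracy
# would be 3/4 = 0.75.
print(balanced_accuracy(["A", "A", "A", "B"], ["A", "A", "B", "B"]))
```

In practice `sklearn.metrics.balanced_accuracy_score` computes the same quantity.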