agemagician / ProtTrans

ProtTrans provides state-of-the-art pretrained language models for proteins. ProtTrans was trained on thousands of GPUs from Summit and hundreds of Google TPUs using Transformer models.
Academic Free License v3.0

U class in "membrane" vs. "water soluble" dataset #73

Closed ratthachat closed 2 years ago

ratthachat commented 2 years ago

Hi, I have a question regarding the "membrane" vs. "water soluble" dataset: do you think there is any reason not to disregard the U class?

mheinzinger commented 2 years ago

Hi, if you only want to train on membrane vs. water-soluble, I do not see a reason to keep the Unknown (U) class. We only kept it because we wanted to use the exact same dataset as its original authors: https://services.healthtech.dtu.dk/service.php?DeepLoc-1.0 It also made multi-task learning of subcellular localization and membrane/soluble easier. But there is no reason to include those samples if you are not interested in either of those things.
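
For anyone who wants to drop the U samples before binary training, a minimal sketch could look like the one below. It assumes the dataset has already been parsed into `(sequence, label)` pairs and that the labels are the single letters `"M"` (membrane), `"S"` (water-soluble) and `"U"` (unknown). The helper name `drop_unknown`, the toy sequences and the integer mapping are all hypothetical and not taken from the ProtTrans or DeepLoc code.

```python
# Assumption: labels are "M" (membrane), "S" (water-soluble), "U" (unknown).
def drop_unknown(records, unknown_label="U"):
    """Return only the records whose membrane/soluble label is known."""
    return [(seq, label) for seq, label in records if label != unknown_label]


# Hypothetical toy records for illustration, not real dataset entries.
records = [
    ("MKTAYIAKQR", "M"),
    ("GSHMLEDPVA", "S"),
    ("MSDNELQKLA", "U"),
]

binary_records = drop_unknown(records)

# Map the two remaining classes to integer targets for a binary classifier.
# The choice of 0/1 here is arbitrary.
label_to_id = {"M": 0, "S": 1}
targets = [label_to_id[label] for _, label in binary_records]
```

If you later want the multi-task setup (localization plus membrane/soluble), you would keep the U samples and mask them out of the membrane/soluble loss instead of deleting them.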