Closed Joey-Xue closed 1 week ago
Hi @Joey-Xue, thanks for your question! We deliberately do not rescale the experimental scores, since different groups may have different views on the best way to normalize them depending on their objectives. The scale of each DMS score is therefore identical to the scale of the phenotype measured in the original paper it was obtained from (see the "raw_DMS_phenotype_name" column in the reference file). What we do, however, is correct the sign of the scores as needed, such that a positive DMS_score always corresponds to "higher fitness". In prior work, when developing (semi-)supervised models on the DMS data (e.g., in ProteinNPT), we found it helpful to standard-normalize the scores before training. Best, Pascal
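For readers who want to try the standard normalization mentioned above, here is a minimal sketch. It assumes "standard-normalize" means a per-assay z-score (subtract the assay mean, divide by the assay standard deviation); it is not taken from the ProteinGym or ProteinNPT code. The function name `standardize_scores`, the assay names, and the toy values are all hypothetical.

```python
from statistics import fmean, pstdev


def standardize_scores(scores):
    """Return z-scores, (score - mean) / std, computed within a single assay.

    Assumption: population standard deviation is used; a sample standard
    deviation would work just as well for this purpose.
    """
    mean = fmean(scores)
    std = pstdev(scores)
    if std == 0:
        # All scores are identical, so there is no spread to rescale.
        return [0.0 for _ in scores]
    return [(s - mean) / std for s in scores]


# Hypothetical toy values on two very different raw scales.
assays = {
    "assay_small_scale": [-5.0, 0.0, 5.0],
    "assay_large_scale": [589.0, 20294.5, 40000.0],
}

# Standardize each assay separately so that scales become comparable.
standardized = {name: standardize_scores(vals) for name, vals in assays.items()}

for name, vals in standardized.items():
    print(name, [round(v, 3) for v in vals])
```

Standardizing per assay, rather than over the pooled data, keeps an assay with a very large raw range from dominating the others. The sign of each score relative to the assay mean is preserved, so the "higher is fitter" ordering within an assay is unchanged.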
Hello Pascal, thanks for your quick and comprehensive reply! The reference file provides exactly the information I need. For the benchmark on the ProteinGym website, I wonder how the zero-shot regression was performed, given that the DMS scores are on different scales? One more question: the download link for the full ProteinGym substitution dataset is currently not accessible: https://marks.hms.harvard.edu/proteingym/DMS_ProteinGym_substitutions.zip I would like to download the newest version of the dataset. Would you mind checking when it will be available for download again? Thanks again for your kind support.
Hi Pascal, thanks for your kind reply! It was pretty helpful.
Thanks for providing the benchmark. I was recently trying to benchmark some models on the dataset and found that the DMS scores vary widely in scale, which matters for regression. For most of the assays, the DMS scores range from around -5 to 5, but for some assays, such as Q6WV13_9MAXI and D7PM05_CLYGR, they range from 589 to 40000, which is clearly not on the same scale. For B2L11_HUMAN, the activity ranges from 2640756.73 to 100215199.65. It seems that the score is not a normalized metric. I checked the GitHub repository and the ProteinGym paper but could not find how the score is defined. Could you provide a clearer definition of the score? Thanks in advance for any help.