benjaminalbert closed this issue 6 months ago
Hi Benji,
Thank you! To answer your questions:
Best, Pascal
Hi Pascal, thanks for the response. Regarding question 2, I was wondering whether mutant-level UQ estimates are available alongside the mutant-level predicted scores, not for the protein redesign experiments but for the 5-fold random CV scheme. For example, when you release mutant-level scores for question 1, will you also provide mutant-level predicted standard deviations from the hybrid UQ approach?
Hi Benji,
We will not be releasing UQ estimates within the detailed scoring files for question 1 as this is not something we have computed for all baselines & DMS assays in ProteinGym. But we should still have the data we used to create Figure 10 from the ProteinNPT paper, which includes UQ estimates for ProteinNPT for the 3 CV schemes, for the different uncertainty schemes and for ~100 assays. Is that something that would be helpful to you?
Best, Pascal
Yes, that data would be very helpful. I look forward to it, and once again, I appreciate your help.
Hi @pascalnotin, could you also please provide the data used to generate Figure 2 (performance on multiple mutants), and let us know whether UQ values were calculated for it?
Hi @benjaminalbert -- we did not compute UQ values on multiple mutants, but the data used to generate Figure 2 can be found here: https://docs.google.com/spreadsheets/d/1jygsC0CDlxYUY2-YveJ-58yEhMKcwn4JY_D8Toc-5IA/edit?usp=sharing
Thank you very much, Pascal! Lastly, if you have MSEs and metrics calculated per fold (so that others can compare with statistical tests), we would greatly appreciate it.
Hi @benjaminalbert -- I just added the per-fold metrics to the same Google Sheet.
Great, thank you very much!
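For anyone who wants to use the per-fold metrics for a statistical comparison, here is a minimal sketch of one possible paired test: an exact sign-flip (paired permutation) test on per-fold differences. This is not the method used in the paper. The function name `paired_sign_flip_test` and the per-fold values are hypothetical placeholders, not numbers from the Google Sheet.

```python
from itertools import product
from statistics import fmean


def paired_sign_flip_test(metric_a, metric_b):
    """Exact two-sided sign-flip test on paired per-fold differences.

    Under the null hypothesis that both models perform equally, the sign of
    each per-fold difference is arbitrary, so every sign assignment is
    enumerated and compared against the observed mean difference.
    """
    diffs = [a - b for a, b in zip(metric_a, metric_b)]
    observed = abs(fmean(diffs))
    at_least_as_extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        flipped_mean = fmean(s * d for s, d in zip(signs, diffs))
        # Small tolerance guards against floating-point round-off.
        if abs(flipped_mean) >= observed - 1e-12:
            at_least_as_extreme += 1
    return fmean(diffs), at_least_as_extreme / total


# Placeholder per-fold Spearman values for two models (illustration only).
model_a_spearman = [0.61, 0.58, 0.63, 0.60, 0.62]
model_b_spearman = [0.57, 0.55, 0.60, 0.59, 0.58]

mean_diff, p_value = paired_sign_flip_test(model_a_spearman, model_b_spearman)
print(f"mean difference = {mean_diff:.3f}, p = {p_value:.4f}")
```

Note that with only 5 folds the smallest two-sided p-value this test can return is 2/32 = 0.0625, so per-fold comparisons on a single assay have limited power. Pooling differences across assays is one option, though that is a design choice left to the reader.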
Hi @benjaminalbert -- quick note on the latest results I shared: by default we standardize (z-score) target values for modeling, which affects the MSE values we report (it affects all models/baselines we compare against uniformly, though). Spearman performance, being scale-independent, is not affected.
Hi @pascalnotin, thanks for the note. I saw in the ProteinNPT repo that the targets are standardized by the mean and std of the 3 training fold targets for each iteration of 5-fold CV.
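To make the point about standardization concrete, here is a minimal sketch, assuming targets and predictions are z-scored with the mean and standard deviation of the training targets. It uses synthetic data and helper names (`standardize`, `mse`, `spearman`) that are illustrative and not taken from the ProteinNPT repo. Using the population standard deviation is also an assumption.

```python
import random
import statistics


def standardize(values, mean, std):
    """Z-score values with statistics computed on the training targets."""
    return [(v - mean) / std for v in values]


def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(a, b):
    """Spearman correlation: Pearson correlation of the ranks."""
    ra, rb = ranks(a), ranks(b)
    ma, mb = statistics.fmean(ra), statistics.fmean(rb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    var_a = sum((x - ma) ** 2 for x in ra)
    var_b = sum((y - mb) ** 2 for y in rb)
    return cov / (var_a * var_b) ** 0.5


# Synthetic stand-ins for training-fold targets and a held-out test fold.
rng = random.Random(0)
train_targets = [rng.gauss(2.0, 3.0) for _ in range(300)]
test_targets = [rng.gauss(2.0, 3.0) for _ in range(100)]
test_preds = [t + rng.gauss(0.0, 1.0) for t in test_targets]

mu = statistics.fmean(train_targets)
sigma = statistics.pstdev(train_targets)  # assumption: population std

z_targets = standardize(test_targets, mu, sigma)
z_preds = standardize(test_preds, mu, sigma)

mse_raw = mse(test_targets, test_preds)
mse_std = mse(z_targets, z_preds)
rho_raw = spearman(test_targets, test_preds)
rho_std = spearman(z_targets, z_preds)

print(f"MSE raw = {mse_raw:.4f}, MSE standardized = {mse_std:.4f}")
print(f"Spearman raw = {rho_raw:.4f}, Spearman standardized = {rho_std:.4f}")
```

Under these assumptions the standardized MSE equals the raw-scale MSE divided by the squared training standard deviation, so MSE values can only be compared across models that share the same normalization, while Spearman is unchanged.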
Thank you very much for providing this amazing resource!
I would appreciate your help:
By the way, the link provided in the README for downloading all the baseline scores on the DMS substitutions is dead, though I'm not sure if this zip would contain the data I'm looking for.
Thank you in advance, Benji