Open heleensev opened 2 years ago
Great job @heleensev :) I've read the paper and your notes, and I'll add my thoughts here as well. Let's discuss this further at tomorrow's weekly meeting with the others.
Ideas:
Comments:
Specific questions on MHCFlurry training dataset (S3 in the supplemental material)
mass_spec measurement kind: I see several different alleles (A, B, C, E, G, ...).
This issue is about mass spectrometry (MS) data; we have not yet decided whether to include it. We want our affinity predictor to have no bias towards peptides that are processed in the cell (antigen processing, or AP), because we want our predictor (DeepRank) to focus solely on modelling the interaction based on physicochemical features. MS data carries this bias because the pMHC (HLA bound to peptide) was eluted from human cells. Furthermore, the MS technique has the limitation that highly hydrophobic peptides cannot be detected, and cysteine-containing peptides are also harder to measure.
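To make the choice concrete, here is a minimal sketch of how the MS rows could be separated from the other rows and tallied per HLA gene. It assumes the training table has an allele column and a measurement_kind column with a mass_spec label; those column names, the helper functions and the rows below are all hypothetical placeholders, not values taken from S3.

```python
from collections import Counter

# Made-up placeholder rows; column names and values are assumptions,
# not copied from the actual supplemental table.
rows = [
    {"allele": "HLA-A*02:01", "peptide": "AAAAAAAAA", "measurement_kind": "affinity"},
    {"allele": "HLA-A*02:01", "peptide": "AAAAAAAAL", "measurement_kind": "mass_spec"},
    {"allele": "HLA-B*07:02", "peptide": "AAAAAAAAV", "measurement_kind": "mass_spec"},
    {"allele": "HLA-C*07:01", "peptide": "AAAAAAAAK", "measurement_kind": "mass_spec"},
    {"allele": "HLA-E*01:01", "peptide": "AAAAAAAAM", "measurement_kind": "affinity"},
]


def split_by_kind(rows, ms_label="mass_spec"):
    """Return (ms_rows, other_rows) based on the measurement_kind column."""
    ms = [r for r in rows if r["measurement_kind"] == ms_label]
    other = [r for r in rows if r["measurement_kind"] != ms_label]
    return ms, other


def gene_of(allele):
    """Extract the gene letter from a name shaped like 'HLA-A*02:01'."""
    return allele.split("-", 1)[1].split("*", 1)[0]


def gene_counts(rows):
    """Count rows per HLA gene (A, B, C, E, G, ...)."""
    return Counter(gene_of(r["allele"]) for r in rows)


ms_rows, affinity_rows = split_by_kind(rows)
ms_genes = gene_counts(ms_rows)
print(dict(ms_genes))
```

If we decide to exclude MS data, training would use only affinity_rows; the per-gene tally shows how much coverage each gene would lose.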
Here are some sources (and conclusions from these sources) that will help us make a decision about the inclusion of MS data in our experiments: