DeepRank / 3D-Vac

Personalized cancer vaccine design through 3D modelling boosted geometric learning.
Apache License 2.0

Different labels for same peptide-MHC complex #37

Open DanLep97 opened 2 years ago

DanLep97 commented 2 years ago

Some entries in the MHCflurry database have different labels for identical peptide-MHC complexes.

For example, the peptide EAAGIGILTV has four different measurements for the same allele, HLA-A*02:01:

HLA-A*02:01,EAAGIGILTV,2272.0,=,quantitative,affinity,Rosenberg - purified MHC/competitive/radioactivity
HLA-A*02:01,EAAGIGILTV,14560.0,=,quantitative,affinity,Ovaa - purified MHC/competitive/fluorescence
HLA-A*02:01,EAAGIGILTV,500.0,<,qualitative,affinity,Sewell - cellular MHC/direct/fluorescence
HLA-A*02:01,EAAGIGILTV,5000.0,<,qualitative,affinity,Sewell - cellular MHC/direct/fluorescence

There are a lot of cases like this one.
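A minimal sketch of how such cases could be listed, using only the four rows quoted above. The header row and its column names (`allele`, `peptide`, `measurement_value`, and so on) are assumptions added for illustration, and `find_conflicting_pairs` is a hypothetical helper, not something from the 3D-Vac code base.

```python
import csv
import io
from collections import defaultdict

# The four rows quoted in this issue. The header line is an assumption
# added for illustration; it is not part of the quoted data.
SAMPLE = """allele,peptide,measurement_value,measurement_inequality,measurement_type,measurement_kind,measurement_source
HLA-A*02:01,EAAGIGILTV,2272.0,=,quantitative,affinity,Rosenberg - purified MHC/competitive/radioactivity
HLA-A*02:01,EAAGIGILTV,14560.0,=,quantitative,affinity,Ovaa - purified MHC/competitive/fluorescence
HLA-A*02:01,EAAGIGILTV,500.0,<,qualitative,affinity,Sewell - cellular MHC/direct/fluorescence
HLA-A*02:01,EAAGIGILTV,5000.0,<,qualitative,affinity,Sewell - cellular MHC/direct/fluorescence
"""


def find_conflicting_pairs(csv_text):
    """Return {(allele, peptide): [values]} for pairs with more than one distinct value."""
    values = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (row["allele"], row["peptide"])
        values[key].append(float(row["measurement_value"]))
    # Keep only pairs whose measurements disagree.
    return {pair: vals for pair, vals in values.items() if len(set(vals)) > 1}


conflicts = find_conflicting_pairs(SAMPLE)
for (allele, peptide), vals in conflicts.items():
    print(allele, peptide, sorted(vals))
```

On the full database the same grouping would run over every row; the sample here only confirms that the quoted pair is flagged.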

heleensev commented 2 years ago

[Figure: ba_duplicates_pie]

This is the distribution of duplicate labels. There are 187,485 entries in total. "Same" means that all duplicates of a peptide-MHC pair would be classified the same way (all binders or all non-binders) with a 500 nM cutoff. From this you could conclude that 2% of the data is noisy. You could also read it as an indication that 12% (2/18: different/same) of the data could be noisy. In other words, we cannot say that the non-duplicated entries are always correct, because they have no duplicates to corroborate them.
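A rough sketch of how the "same" / "different" split described above could be computed. Several things here are assumptions: values at or below 500 nM count as binders, the inequality column is ignored, and counts are per entry rather than per peptide-MHC pair. `PEPTIDE_A` and `PEPTIDE_B` are made-up placeholders, not real database entries, and `categorize` is a hypothetical helper.

```python
from collections import Counter, defaultdict

CUTOFF_NM = 500.0  # binder/non-binder cutoff mentioned in this thread


def is_binder(value_nm):
    # Assumption: values at or below the cutoff are binders.
    # The measurement inequality ("<", "=") is ignored in this sketch.
    return value_nm <= CUTOFF_NM


def categorize(entries):
    """Count entries per category: 'unique', 'same', or 'different'.

    entries: iterable of (allele, peptide, value_nm) tuples.
    """
    groups = defaultdict(list)
    for allele, peptide, value_nm in entries:
        groups[(allele, peptide)].append(is_binder(value_nm))

    counts = Counter()
    for labels in groups.values():
        if len(labels) == 1:
            counts["unique"] += 1  # no duplicate to compare against
        elif len(set(labels)) == 1:
            counts["same"] += len(labels)  # all duplicates agree
        else:
            counts["different"] += len(labels)  # duplicates disagree
    return counts


entries = [
    # The four measurements quoted at the top of this issue.
    ("HLA-A*02:01", "EAAGIGILTV", 2272.0),
    ("HLA-A*02:01", "EAAGIGILTV", 14560.0),
    ("HLA-A*02:01", "EAAGIGILTV", 500.0),
    ("HLA-A*02:01", "EAAGIGILTV", 5000.0),
    # Hypothetical placeholder entries, for illustration only.
    ("HLA-A*02:01", "PEPTIDE_A", 100.0),
    ("HLA-A*02:01", "PEPTIDE_A", 250.0),
    ("HLA-A*02:01", "PEPTIDE_B", 9000.0),
]

counts = categorize(entries)
total = sum(counts.values())
for category in ("unique", "same", "different"):
    print(category, counts[category], f"{counts[category] / total:.0%}")
```

Under these assumptions the EAAGIGILTV group lands in "different", since the 500.0 entry is a binder and the other three are not. Handling the "<" inequality properly would change how entries such as `5000.0,<` are labelled.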