Extract ascores for all potential PTM sites

RalfG commented 2 years ago

Thanks for this modern implementation of Ascore!

I would like to get scores for all potential PTM sites, but only the best score seems to be reported, even though the alternative sites are listed in the output. For instance:

Scan	LocalizedSequence	PepScore	Ascores	AltSites
3767	ALLSLRS[80]HK	23.64276885986328	12.159486	4

Additionally, when more than one alternate site is present, the indices seem to repeat the first alternate site instead of listing each alternate site correctly:

Scan	LocalizedSequence	PepScore	Ascores	AltSites
4190	ALLSLHS[80]SK	35.06369400024414	7.7721076	4,4

Am I missing an option to report all scores, or could a simple change to the (Python) code allow me to parse all scores?

AnthonyOfSeattle commented 2 years ago

Hey! No problem.

I didn't originally set up the code to report the localization scores for all potential sites, since in my experience the number of usable site-determining peaks tends to drop off pretty quickly. As a lab, we usually focus only on the best localization and then use that to filter for the most confident ones anyway. Making a user-friendly option to get Ascores for all alternatives is likely a bit out of scope for now, but I can check the code real quick to see if the PepScore for all localizations is available. If you had that, you could walk down the list in Python and easily get the Ascores for any localization you want using the scoring methods available in the Python interface. I will report back in a day or so.

On the AltSites thing: this bug was pointed out to me by a lab member last week, and you caught me right before I fixed it. I hope to have that patched up this week too.

Sorry for the delay, but thanks for the feedback!

RalfG commented 2 years ago

Thanks for the quick reply!

I definitely agree. From a user-friendliness perspective, the current approach makes perfect sense. This is also the behavior I have seen in some other localization tools. However, if I could get Ascores for all sites through the Python interface, even if the code needs some small modifications, that would be great!

Looking forward to your reply.

AnthonyOfSeattle commented 2 years ago

Ok, looks like I don't expose the methods that would help you right now. I am happy to link the Python code to the relevant C++ methods, though. I will open a couple of PRs to address this.

First I will go ahead and fix the alternative site problem since that looked relatively easy.

AnthonyOfSeattle commented 2 years ago

Alright, this should do it. After running the initial score method (which is necessary to calculate all the initial information), all the internal PepScore containers can be accessed with the pep_scores attribute. You can iterate through this, decide which permutations of the modification of interest should be compared, and then pass the containers to the calculate_ambiguity function, which compares the permutations based on site-determining peaks (i.e. the Ascore).

Hope this helps!

RalfG commented 2 years ago

Thank you so much for implementing this! I'll try it out tomorrow.

RalfG commented 2 years ago

Hi @AnthonyOfSeattle, I could successfully use the methods you have implemented! I do have three more questions:

Reading up on how the Ascore is calculated, I realized that it is derived from the difference between the scores for the two best sites for a given peptide, similar to how the delta score is calculated from the two best candidate PSMs for a spectrum. As a result, a meaningful Ascore can only be calculated for the best site (or actually, for a peptide in general). Is this correct? To assess individual sites on a peptide, could I just use their weighted PepScores instead?
How is the weighted PepScore for a site calculated from each of the 10 scores? Which weights are used?

I added a new column to the output, called AltPepScores. While validating this new column, I noticed that not all AltSites are always listed. Did I miss something here? See, for instance, the output for these peptides:

Scan	LocalizedSequence	PepScore	Ascores	AltSites	AltPepScores
4139	ALLSLHT[80]NK	24.972949981689453	5.0845594	4	21.58084487915039
4160	ALLSLKS[80]SK	40.81769561767578	15.62454	4	33.91855239868164,30.178926467895508
4190	ALLSLHS[80]SK	35.06369400024414	7.7721076	4,8	25.058517456054688,25.058517456054688
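
To make the mismatch concrete, here is a minimal Python sketch that parses the three rows above and compares the number of AltSites with the number of AltPepScores per scan. AltPepScores is the custom column described above, not part of the standard pyAscore output, and the helper names (split_list, check_alt_columns) are made up for illustration.

```python
import csv
import io

# Rows copied from the output above (tab-separated).
# AltPepScores is the custom column, not a standard pyAscore field.
OUTPUT = (
    "Scan\tLocalizedSequence\tPepScore\tAscores\tAltSites\tAltPepScores\n"
    "4139\tALLSLHT[80]NK\t24.972949981689453\t5.0845594\t4\t21.58084487915039\n"
    "4160\tALLSLKS[80]SK\t40.81769561767578\t15.62454\t4\t"
    "33.91855239868164,30.178926467895508\n"
    "4190\tALLSLHS[80]SK\t35.06369400024414\t7.7721076\t4,8\t"
    "25.058517456054688,25.058517456054688\n"
)


def split_list(field, cast):
    """Split a comma-separated field and convert each entry with `cast`."""
    return [cast(item) for item in field.split(",") if item]


def check_alt_columns(text):
    """Map each scan to its alternate sites, scores, and the count difference."""
    report = {}
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        sites = split_list(row["AltSites"], int)
        scores = split_list(row["AltPepScores"], float)
        report[row["Scan"]] = {
            "alt_sites": sites,
            "alt_pep_scores": scores,
            # Positive when more scores were written than sites were listed.
            "unlisted": len(scores) - len(sites),
        }
    return report


report = check_alt_columns(OUTPUT)
for scan, info in report.items():
    print(scan, info["alt_sites"], info["unlisted"])
```

Only scan 4160 ends up with more alternate scores than listed sites, which is the case being asked about.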

AnthonyOfSeattle commented 2 years ago

Glad to hear it!

I will answer in turn:

As it stands, I think the PepScore is a great score describing the total evidence for a localization, especially since it is focused on the most intense peaks. The Ascore is derived from looking for site-determining peaks (peaks that are different between two localizations) between the top permutation for a spectrum and the next permutation with one difference. It is a bit more involved than just a difference. I think it can be meaningfully derived as long as at least one more possible localization is present. Take the following example, where the last localization gets an Ascore1 of 0 because there are no remaining worse localizations with that site unmodified:
- ASCSDS[80]ES[80]K, PepScore = 10, Ascore1 = 5
- ASCS[80]DSES[80]K, PepScore = 5, Ascore1 = 2
- AS[80]CSDSES[80]K, PepScore = 1, Ascore1 = 0
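
As a rough illustration of the pairing in this example, the sketch below picks, for each localization, the best lower-scoring localization in which the site in question is unmodified and nothing else has moved. It is only one reading of the description above in plain Python, not pyAscore's code: the function name and the data layout (sets of 1-based modified positions paired with PepScores) are assumptions. It does not compute the Ascores themselves, since that needs the spectrum.

```python
def comparison_partner(ranked, index, site):
    """Find the localization that ranked[index] would be compared against.

    `ranked` is a list of (modified_positions, pep_score) sorted by
    descending PepScore. The partner is the first lower-ranked entry that
    lacks `site` and keeps every other modification in place.
    Returns None when no such entry exists (Ascore of 0 in the example).
    """
    ref_sites, _ = ranked[index]
    kept = ref_sites - {site}
    for cand_sites, cand_score in ranked[index + 1:]:
        moved_one = len(cand_sites) == len(ref_sites) and kept <= cand_sites
        if site not in cand_sites and moved_one:
            return cand_sites, cand_score
    return None


# The three localizations from the example, as 1-based modified positions.
ranked = [
    (frozenset({6, 8}), 10.0),  # ASCSDS[80]ES[80]K
    (frozenset({4, 8}), 5.0),   # ASCS[80]DSES[80]K
    (frozenset({2, 8}), 1.0),   # AS[80]CSDSES[80]K
]

partners = []
for i, (sites, _) in enumerate(ranked):
    first_site = min(sites)  # the site that Ascore1 refers to
    match = comparison_partner(ranked, i, first_site)
    partners.append(sorted(match[0]) if match else None)

print(partners)  # [[4, 8], [2, 8], None]
```

The final None corresponds to the last localization, which has nothing left to be compared against.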
Here are the weights: https://github.com/AnthonyOfSeattle/pyAscore/blob/9467276f22d230369b24fd56cd69eccb9e82d51c/pyascore/ptm_scoring/cpp/Ascore.cpp#L16
I have never been entirely sure how they were determined, but they have been with the code since the Gygi lab's original version. I could ask Judit to see if she remembers how they came about.
I apologize that AltSites is not entirely clear, and I hope to document it better in the future. This is a good example for explaining what they are, though. The AltSites come from the next best localization (the one you calculate the Ascore against), and any time there are multiple alternative localizations with the same PepScore, they are also listed as alternatives. Thus, site 8 is listed for scan 4190 above but not for 4160. Scan 4160 is also another case where you can meaningfully derive a second Ascore, for site 4, while site 8 would automatically have an Ascore of 0.
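
The AltSites rule described here can be sketched in a few lines of Python. This is a minimal illustration, not pyAscore's implementation: the function name alt_sites is hypothetical, and for scan 4160 the assignment of 33.92 to site 4 and 30.18 to site 8 is inferred from the discussion rather than stated in the output.

```python
import math


def alt_sites(alternatives):
    """Return the sites that would be listed as AltSites.

    `alternatives` is a list of (site, pep_score) pairs for every
    localization other than the best one. The next best localization is
    listed, together with any others whose PepScore ties with it.
    """
    if not alternatives:
        return []
    top = max(score for _, score in alternatives)
    return sorted(
        site
        for site, score in alternatives
        if math.isclose(score, top, rel_tol=1e-9)
    )


# Alternative localizations for the two scans discussed above.
# The site-to-score mapping for 4160 is an assumption (see lead-in).
scan_4160 = [(4, 33.91855239868164), (8, 30.178926467895508)]
scan_4190 = [(4, 25.058517456054688), (8, 25.058517456054688)]

print(alt_sites(scan_4160))  # [4]
print(alt_sites(scan_4190))  # [4, 8]
```

Under this rule the tied PepScores for scan 4190 put both sites in AltSites, while scan 4160 lists only the next best site.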

Villen-Lab / pyAscore

Extract ascores for all potential PTM sites #4