MStulos filtering - GitHub issues

glormph commented 1 year ago

We want some filtering capabilities for peptide characteristics.

From Rui:

Peptides identified in DIA data
Filter out those peptides that contain Cysteine
Filter out those peptides that contain Methionine
Filter out those peptides that contain an internal Lysine/Arginine (missed cleavage)
Filter redundancy in charge state
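
A minimal Python sketch of how the sequence-based filters in Rui's list could look. The function names and the PSM tuple layout `(sequence, charge, score)` are hypothetical, not kantele code. "Redundancy in charge state" is interpreted here as keeping one PSM per peptide sequence, and picking the highest score is an assumption about score direction.

```python
import re

def passes_sequence_filters(peptide: str) -> bool:
    """Return True if the bare peptide sequence survives the composition filters."""
    # Drop peptides containing cysteine or methionine.
    if "C" in peptide or "M" in peptide:
        return False
    # Drop peptides with a K or R anywhere before the C-terminal residue.
    if re.search(r"[KR]", peptide[:-1]):
        return False
    return True

def collapse_charge_states(psms):
    """Keep one PSM per peptide sequence (assumption: higher score is better)."""
    best = {}
    for sequence, charge, score in psms:
        if sequence not in best or score > best[sequence][2]:
            best[sequence] = (sequence, charge, score)
    return list(best.values())
```

For example, `passes_sequence_filters("ADKGHK")` is rejected because of the internal K, while `ADFGHK` passes.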

I would also add that ragged ends (consecutive cleavage sites, that is KK, RR, RK and KR) are to be avoided. For example, in the protein ADFGHKKEFG we would filter out the peptide ADFGHKK, but also ADFGHK: trypsin will sometimes cut after the first K and sometimes after the second, which makes even the properly tryptic peptide quantitatively less reliable.
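
The ragged-end check needs the protein sequence and the peptide's position in it, which could be sketched as below. The function name and the 0-based `start` argument are hypothetical. Checking the N-terminal side as well is an assumption that goes beyond the ADFGHKKEFG example, which only covers the C-terminal side.

```python
CLEAVAGE_RESIDUES = set("KR")

def has_ragged_end(protein: str, start: int, length: int) -> bool:
    """Return True if the peptide borders consecutive K/R cleavage sites.

    start is the 0-based index of the peptide's first residue in the protein.
    """
    last = start + length - 1  # index of the peptide's C-terminal residue

    def is_site(i: int) -> bool:
        return 0 <= i < len(protein) and protein[i] in CLEAVAGE_RESIDUES

    # C-terminal side: the last residue is K/R and so is one of its neighbours.
    c_ragged = is_site(last) and (is_site(last - 1) or is_site(last + 1))
    # N-terminal side (assumption): the residue before the peptide is K/R
    # and so is one of that residue's neighbours.
    n_ragged = is_site(start - 1) and (is_site(start - 2) or is_site(start))
    return c_ragged or n_ragged
```

With the protein ADFGHKKEFG, both ADFGHKK (`start=0, length=7`) and ADFGHK (`start=0, length=6`) are flagged.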

glormph commented 1 year ago

[x] Acquisition type must be filled in on the dataset, otherwise no data is stored (make a quick method in analysis?)
[ ] Two regex textareas (positive and negative filter), each accepting multiple regular expressions, e.g. .*[M].* and [A-Z]+[KR][A-Z]+
[x] The ADFGHK case is hard: we need protein context for it, so store the proteins matched by each peptide and the peptide's start site in each, and visualize this so users can decide?
[x] Store charge state of PSMs
[x] Make sure we can filter on acquisition type, label-free (LF), isobaric
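
A rough sketch of how the two regex textareas might be applied, assuming one pattern per line. The function names are hypothetical, and the semantics are an assumption: a peptide is kept if it matches at least one positive pattern (when any are given) and no negative pattern. `re.fullmatch` is used because the example patterns are written to cover the whole sequence.

```python
import re

def parse_patterns(textarea: str):
    """Compile one regular expression per non-empty line of a textarea."""
    return [re.compile(line.strip()) for line in textarea.splitlines() if line.strip()]

def apply_regex_filters(peptides, positive_text: str, negative_text: str):
    """Keep peptides matching any positive pattern and no negative pattern."""
    positive = parse_patterns(positive_text)
    negative = parse_patterns(negative_text)
    kept = []
    for peptide in peptides:
        # With no positive patterns, every peptide passes the positive filter.
        if positive and not any(p.fullmatch(peptide) for p in positive):
            continue
        if any(n.fullmatch(peptide) for n in negative):
            continue
        kept.append(peptide)
    return kept
```

Putting the two example patterns in the negative textarea removes peptides that contain methionine or an internal K/R.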

glormph commented 1 month ago

More specs from Henrik:

[ ] Store MS1 of PSMs so you can plot aggregate PSM MS1 per charge state
[ ] Store some kind of score to compare peptides: we have the MSGF score, but maybe a peptide-level PEP?
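
Once MS1 values are stored per PSM, the per-charge-state aggregate could be computed along these lines. The tuple layout `(sequence, charge, ms1_intensity)` and the function name are hypothetical, and summing is only one possible reading of "aggregate".

```python
from collections import defaultdict

def aggregate_ms1_by_charge(psms):
    """Sum the MS1 intensity of PSMs per (peptide sequence, charge state)."""
    totals = defaultdict(float)
    for sequence, charge, ms1_intensity in psms:
        totals[(sequence, charge)] += ms1_intensity
    return dict(totals)
```

The resulting dictionary can be fed to any plotting library to compare charge states of the same peptide.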

Need to talk with H, Rui, Georgios to make sure we have stored the correct data.

glormph / kantele

MStulos filtering #36