broadinstitute / ssGSEA2.0

Single sample Gene Set Enrichment analysis (ssGSEA) and PTM Enrichment Analysis (PTM-SEA)

Output only for 11 signatures #24

Closed: julia-aguade closed this issue 1 year ago

julia-aguade commented 2 years ago

Hi,

I'm using PTM-SEA to determine pathway enrichment from phosphoproteomics data. When I run my data, I only get output (plots and gct files) for 11 signatures, whereas running the example gives output for 96 signatures. Is there some kind of filtering that determines which signatures produce output, or is there something wrong with my analysis? I am using the default parameters in the gui.R file.

Thank you

karstenkrug commented 2 years ago

Hi,

Sorry for the slow response. There might be several reasons why your output only contains 11 signatures.

1) The minimum number of phosphosites required to score a signature (parameter min.overlap) may not be reached. For PTM-SEA we typically require a minimum of 5 sites to be detected in the data; you can lower this value.

2) Not all sites in your data can be mapped to sites in PTMsigDB. UniProt-centric site identifiers (e.g. Q06609;Y315-p) often cause problems with site mapping, since UniProt accession numbers might get updated and residue numbers might change as well. We recommend using the flanking sequences as site identifiers (e.g. ETRICKIYDSPCLPE-p).

3) Limited depth of the phosphoproteomic data. The likelihood of being able to score a signature in PTMsigDB increases with the number of sites in your input data. If your dataset comprises only a few thousand phosphosites, you are likely sampling only the most abundant sites and missing many lower-abundance sites.
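To see whether points 1 and 2 explain a short signature list, it can help to count how many sites of each signature are actually present in the input. The sketch below is not the ssGSEA2.0 implementation; it only illustrates the effect of an overlap threshold such as min.overlap. The function name `scorable_signatures`, the signature names, and every site identifier except ETRICKIYDSPCLPE-p are made up for illustration.

```python
def scorable_signatures(data_sites, signatures, min_overlap=5):
    """Return {signature name: overlap count} for signatures that reach min_overlap.

    data_sites  : iterable of site identifiers present in the input data
    signatures  : dict mapping a signature name to its list of site identifiers
    min_overlap : minimum number of signature sites that must be in the data
    """
    detected = set(data_sites)
    overlaps = {
        name: len(detected & set(sites)) for name, sites in signatures.items()
    }
    return {name: n for name, n in overlaps.items() if n >= min_overlap}


# Hypothetical toy input: identifiers must match the database exactly to count.
toy_data = ["ETRICKIYDSPCLPE-p", "AAAAAAASAAAAAAA-p", "CCCCCCCTCCCCCCC-p"]
toy_signatures = {
    "SIG_A": ["ETRICKIYDSPCLPE-p", "AAAAAAASAAAAAAA-p", "CCCCCCCTCCCCCCC-p"],
    "SIG_B": ["ETRICKIYDSPCLPE-p", "DDDDDDDYDDDDDDD-p"],
}

for threshold in (1, 3):
    print(threshold, scorable_signatures(toy_data, toy_signatures, threshold))
```

Signatures whose overlap stays low even with a small threshold usually point to an identifier mismatch (point 2) or to limited depth (point 3) rather than to the threshold itself.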

I hope that helps.

Best, K

julia-aguade commented 2 years ago

Thanks Karsten. I've been trying for some time, but I could not find out how to get my data into the right format (flanking sequences of ±7 aa). I use ArtMS and have sequences that are not centered on the phosphorylated amino acid and that have no fixed number of residues on each side. For example:

AAALQALQAQAPT(ph)SPPPPPPPLKAEQEEEGLPLPLANIK or AAALQALQAQAPTSPPPPPPPLKAEQEEEGLPLPLANIK_

Which kind of analysis do you perform to obtain the site identifiers as flanking sequences that are compatible with PTM-SEA?

Thank you for your help
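One possible way to build such identifiers, sketched under assumptions: if the full protein sequence is available (for example from the FASTA file used for the database search), the modified peptide can be located in the protein and a window of 7 residues on each side of the phosphorylated residue cut out, followed by the "-p" suffix used in the example ETRICKIYDSPCLPE-p. The function name `flanking_ids`, the `(ph)` marker handling, and the `_` padding near protein termini are assumptions for illustration, not the documented PTM-SEA preprocessing; check the padding convention against PTMsigDB before use.

```python
def flanking_ids(protein_seq, modified_peptide, flank=7, marker="(ph)", pad="_"):
    """Return one flanking-sequence identifier per marked residue in the peptide.

    The marker is expected directly AFTER the modified residue, e.g. "APT(ph)SPP".
    """
    plain = modified_peptide.replace(marker, "")
    start = protein_seq.find(plain)
    if start < 0:
        raise ValueError("peptide not found in protein sequence")

    # Each chunk before a marker ends with a modified residue.
    positions, consumed = [], 0
    for chunk in modified_peptide.split(marker)[:-1]:
        consumed += len(chunk)
        if consumed == 0:
            raise ValueError("marker must follow a residue")
        positions.append(consumed - 1)  # index within the unmodified peptide

    # Pad both ends so windows near the termini keep a fixed length (assumption).
    padded = pad * flank + protein_seq + pad * flank
    ids = []
    for pos in positions:
        site = start + pos  # 0-based index in the protein
        window = padded[site : site + 2 * flank + 1]
        ids.append(window + "-p")
    return ids


# Hypothetical protein: the peptide from this thread with made-up residues around it.
peptide = "AAALQALQAQAPT(ph)SPPPPPPPLKAEQEEEGLPLPLANIK"
toy_protein = "MKT" + peptide.replace("(ph)", "") + "GGH"
print(flanking_ids(toy_protein, peptide))
```

A peptide with several marked residues yields one identifier per site, which matches the one-row-per-site layout that site-centric input needs.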