cytoscape / network-ptm-integration

This repo tracks the work of data-driven addition of phosphorylation sites to proteins in networks (and pathways).
MIT License

PTM annotations #2

Closed AlexanderPico closed 6 months ago

AlexanderPico commented 1 year ago

Presuming we'll use PhosphoSitePlus (PSP), let's take a moment to catalog the download/API options they provide and see what would be most useful. @khanspers has probably already done this, but it would be good to capture this knowledge in a document.

For example, in this repo, we should collect the annotation files we think will be useful into a folder called annotations and add a README with a brief description of the files.

khanspers commented 1 year ago

I will upload a README based on the information that's available from PSP, with some comments added. Downloading files from PSP requires an account (and agreement with their terms), so they might not like it if we put the files on a public GitHub repo. At the top of the README I added a statement to conform with their terms (point 3). We would also need to add a column to the data. But it would be great to hear what others have to say before adding the files to this repo.

khanspers commented 1 year ago

The PSP annotation files are available under Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0). It does say that we are free to "Share — copy and redistribute the material in any medium or format", so maybe it's OK to add the files to this repo?

However, what about this ShareAlike statement in regard to incorporating sites on pathways:

"ShareAlike — If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original."

-> Is this a problem in terms of our CC0 license for pathways?

AlexanderPico commented 7 months ago

> However, what about this ShareAlike statement in regard to incorporating sites on pathways [...] Is this a problem in terms of our CC0 license for pathways?

Not a problem

AlexanderPico commented 7 months ago

Let's catalog the download/API options they provide and see what would be most useful and document it here. We don't need to actually download everything and host it here; that's not the plan.

khanspers commented 6 months ago

Files from PSP download:

  1. Kinase_Substrate_Dataset
  2. Phosphorylation_site_dataset
  3. Regulatory_sites

Kinase-substrate dataset

Information from PSP: experimentally determined substrates, sequences, cognate kinases, and metadata curated from the literature.

=> Relevant columns: all. Includes domain info and two columns, IN_VIVO_RXN and IN_VITRO_RXN, that designate evidence type; these (mostly) match "Putative in vivo kinases" and "Kinase, in vitro" from the Upstream tab. Example: filtering for "SUBSTRATE=Akt1", "KIN_ORGANISM=human" returns 41 entries.
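For reference, the kind of filter described above could be scripted roughly like this. It is only a sketch: the rows are made-up placeholders (not PSP data), and the tab-delimited layout, the KINASE column name, and the "X" evidence marker are assumptions that should be checked against the actual download, which may also carry preamble lines before the header row.

```python
import csv
import io

# Made-up placeholder rows, NOT real PSP data.
# Assumed: tab-delimited layout, a KINASE column, "X" as the evidence marker.
SAMPLE = (
    "KINASE\tKIN_ORGANISM\tSUBSTRATE\tIN_VIVO_RXN\tIN_VITRO_RXN\n"
    "KinA\thuman\tAkt1\tX\t\n"
    "KinB\tmouse\tAkt1\tX\tX\n"
    "KinC\thuman\tOtherSub\t\tX\n"
    "KinD\thuman\tAkt1\t\tX\n"
)


def filter_rows(handle, **criteria):
    """Return rows whose named columns exactly equal the given values."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [
        row for row in reader
        if all(row.get(col) == val for col, val in criteria.items())
    ]


hits = filter_rows(io.StringIO(SAMPLE), SUBSTRATE="Akt1", KIN_ORGANISM="human")
print(len(hits), "matching rows")

# Report which evidence columns are filled in for each hit.
for row in hits:
    evidence = [
        label
        for label, col in (("in vivo", "IN_VIVO_RXN"), ("in vitro", "IN_VITRO_RXN"))
        if row[col]
    ]
    print(row["KINASE"], evidence)
```

To run it against the real file, an open file handle would replace the `io.StringIO(SAMPLE)` argument.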

Phosphorylation_site_dataset

Information from PSP: sequences surrounding the PTMs (+/- 7 AAs), gene symbols, accession IDs, site group IDs, molecular weights, number of associated records, and Cell Signaling Technology catalog numbers where available. Separate files for each modification type including phosphorylation, acetylation, ubiquitination, methylation, O-glycosylation, sumoylation, etc.

=> This is the same information as the Site Table (for example https://www.phosphosite.org/proteinAction.action?id=570&showAllSites=true).
Relevant columns: GENE, PROTEIN, GENE_ID, ACC_ID, MOD-RSD, SITE_GRP_ID, ORGANISM, DOMAIN, SITE.
Example: filtering for "GENE=AKT1", "ORGANISM=human", "MOD-RSD includes -p" returns 35 entries.

Regulatory_sites

Information from PSP: information curated from the literature about modification sites shown to regulate molecular functions, biological processes, and molecular interactions, including protein-protein interactions.

=> Subset of sites from the Site Table (for example https://www.phosphosite.org/proteinAction.action?id=570&showAllSites=true).
Relevant columns: GENE, PROTEIN, ACC_ID, GENE_ID, MOD-RSD, SITE_GRP_ID, ORGANISM, DOMAIN, SITE.
Example: filtering for "GENE=AKT1", "ORGANISM=human", "MOD-RSD includes -p" returns 17 entries.
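A rough sketch of how the two site filters could be compared, to confirm that the regulatory sites for a gene are a subset of its phosphorylation sites. The rows below are made-up placeholders (not PSP data), only a few of the listed columns are included, and the tab-delimited layout is an assumption.

```python
import csv
import io

# Made-up placeholder rows, NOT real PSP data; tab-delimited layout is assumed.
SITE_TABLE = (
    "GENE\tORGANISM\tMOD-RSD\tSITE_GRP_ID\n"
    "AKT1\thuman\tS1-p\t1001\n"
    "AKT1\thuman\tK2-ac\t1002\n"
    "AKT1\tmouse\tS1-p\t1001\n"
    "AKT1\thuman\tT3-p\t1003\n"
    "OTHER\thuman\tY4-p\t1004\n"
)
REGULATORY = (
    "GENE\tORGANISM\tMOD-RSD\tSITE_GRP_ID\n"
    "AKT1\thuman\tT3-p\t1003\n"
)


def phospho_sites(text, gene, organism):
    """Collect MOD-RSD values containing '-p' for one gene and organism."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {
        row["MOD-RSD"]
        for row in reader
        if row["GENE"] == gene
        and row["ORGANISM"] == organism
        and "-p" in row["MOD-RSD"]
    }


all_sites = phospho_sites(SITE_TABLE, "AKT1", "human")
reg_sites = phospho_sites(REGULATORY, "AKT1", "human")

print(sorted(all_sites))
print(sorted(reg_sites))
# True when every regulatory site also appears in the full site table.
print(reg_sites <= all_sites)
```

The substring test mirrors the "MOD-RSD includes -p" filter used above; a stricter suffix match might be preferable if other modification codes turn out to start with "p".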