Collaboration Nuno Bandeira

ReneRanzinger commented 4 weeks ago

On Wed, Aug 30, 2017 at 11:10 PM, Nuno Bandeira [bandeira@ucsd.edu](mailto:bandeira@ucsd.edu) wrote:

Hi Raja,

It was great to touch base today - I'm attaching the Nature Biotech paper I mentioned today for our GNPS metabolomics/natural products resource and you can also check out our MassIVE ProteomeXchange proteomics repository (nearly 8,000 datasets, >320 million peptide-spectrum matches) if you'd like a sense for the sorts of things we provide for this (e.g., CPTAC colorectal cancer reanalyses at MSV000079852). We've also done some preliminary work on glycopeptide spectral networks that might be relevant for some of the connections we were discussing today.

Best, Nuno

ReneRanzinger commented 4 weeks ago

Hi Nuno, Thank you for sending us the links to the papers. I would like to start with an initial, focused data set that you have. That does not mean we are not interested in the other work, and we will follow up on it. Would you be able to point me to a table on your site or send us a CSV file with one of the following sets of columns?

UniProt AC | Cancer term | wild type amino acid | mutated amino acid | PTM site gained | patient ID

or

UniProt AC | Cancer term | amino acid | PTM observed | patient ID

We may need several emails to better understand what you have and what we can immediately start looking into. Hayley or Jeet will follow up with you on this and schedule a call as we move forward. Best regards, Raja

ReneRanzinger commented 4 weeks ago

On Thu, Aug 31, 2017 at 12:54 PM, Nuno Bandeira wrote:

Hi Raja,

Sure thing - the data is already available in a format that allows you to extract the information you're mentioning. Focusing specifically on blind modification discovery in CPTAC's colorectal cancer, you can find the raw search results (1% spectrum-level FDR) as follows:

All patient results in mzTab format

All results for patient TCGA-AA-3518-01A-11 (can download results by clicking on Download at the top of the results view, bulk FTP downloads at ftp://massive.ucsd.edu/RMSV000000004/2016-09-05_ccms_3dd00bc6/ccms_result/CPTAC_MODA)
-- UniProt Acc in the "Protein" column
-- Amino acid coordinates can be determined from the peptide sequence in the "Peptide" column
-- Modification site and mass are in the "Modifications" column
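
As a rough illustration of the extraction described above, here is a minimal Python sketch that pulls accession, modified residue, and modification identifier out of the PSM section of an mzTab file. It assumes the mzTab 1.0 column names (`accession`, `sequence`, `modifications`, `start`), which may differ from the "Protein", "Peptide", and "Modifications" labels shown in the results view. The helper names and the example rows are invented for illustration and are not taken from the CPTAC results. Ambiguous positions such as `3|4-UNIMOD:21` and terminal modifications are skipped rather than handled.

```python
def read_psm_rows(lines):
    """Yield one dict per PSM row, keyed by the column names of the PSH header."""
    header = None
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "PSH":
            header = fields[1:]
        elif fields[0] == "PSM" and header is not None:
            yield dict(zip(header, fields[1:]))


def parse_modifications(mod_field):
    """Split a modifications cell into (peptide_position, identifier) pairs.

    Simplification: only plain "position-identifier" entries are kept.
    """
    if mod_field in ("", "null", "0"):
        return []
    mods = []
    for entry in mod_field.split(","):
        position, _, identifier = entry.partition("-")
        if position.isdigit() and identifier:
            mods.append((int(position), identifier))
    return mods


def modification_sites(lines):
    """Return (accession, residue, protein_position, identifier) tuples.

    protein_position is None when the row has no usable "start" value.
    """
    sites = []
    for row in read_psm_rows(lines):
        sequence = row["sequence"]
        start = row.get("start", "null")
        for position, identifier in parse_modifications(row["modifications"]):
            if not 1 <= position <= len(sequence):
                continue  # position 0 or len+1 marks a terminal modification
            protein_position = int(start) + position - 1 if start.isdigit() else None
            sites.append(
                (row["accession"], sequence[position - 1], protein_position, identifier)
            )
    return sites


# Hypothetical rows, made up for this example only.
EXAMPLE = [
    "PSH\tsequence\tPSM_ID\taccession\tmodifications\tstart\tend",
    "PSM\tAMSPEPTIDEK\t1\tP12345\t2-UNIMOD:35\t101\t111",
    "PSM\tLNGSK\t2\tQ99999\t2-CHEMMOD:+0.984\tnull\tnull",
]

for site in modification_sites(EXAMPLE):
    print(site)
```

When the `start` column is empty, the protein coordinate would have to be recovered by locating the peptide in the protein sequence, which this sketch does not attempt.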

Please do keep in mind that these results are only significant at the single-patient spectrum-level FDR; this illustrates how the resources could communicate but the actual set of results that would be transferred should be filtered differently.

Best,

Nuno

ReneRanzinger commented 4 weeks ago

@rajamazumder @jeet-vora

ReneRanzinger commented 3 weeks ago

@edwardsnj will set up a meeting

ReneRanzinger commented 3 weeks ago

Hi Nuno,

I've heard on the grapevine about conversations between Raja and Mike and you about ways that GlyGen and MassIVE might promote common interests.

My understanding of our (GlyGen's) primary interest is to be able to find MS datasets that can be mined for glycobiology knowledge.

The lowest hanging fruit, clearly, would be PNGaseF treated proteins analyzed by MS/MS, which provide information on N-glycosylated asparagine sites.
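
To make the PNGaseF idea concrete: the enzyme converts a formerly glycosylated asparagine to aspartate, which appears as a mass shift of about +0.984 Da on Asn. A minimal sketch of how such sites might be flagged is below, assuming modifications are already available as (position, mass delta) pairs and using the usual N-X-S/T sequon (X not proline). The function names and the tolerance are illustrative choices, not part of any existing MassIVE or GlyGen workflow.

```python
import re

DEAMIDATION_DELTA = 0.984016  # Da, Asn -> Asp mass shift
# Lookahead so that overlapping sequons (e.g. "NNTS") are all reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def sequon_positions(sequence):
    """Return 1-based positions of Asn residues inside an N-X-S/T sequon."""
    return [match.start() + 1 for match in SEQUON.finditer(sequence)]


def candidate_glycosites(sequence, mods, tolerance=0.01):
    """Return positions that are deamidated AND sit in a sequon.

    mods is a list of (1-based position, mass delta in Da) pairs.
    """
    in_sequon = set(sequon_positions(sequence))
    return [
        position
        for position, delta in mods
        if position in in_sequon and abs(delta - DEAMIDATION_DELTA) <= tolerance
    ]


print(candidate_glycosites("LNGSK", [(2, 0.984), (5, 42.0106)]))
```

Two caveats: a sequon that runs past the end of the peptide can only be checked against the full protein sequence, and spontaneous deamidation can produce the same mass shift, so hits from this kind of filter are candidates rather than confirmed sites.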

In some cases, I think small O-glycans (O-GlcNAc, O-GalNAc-core, etc.) might also be findable as peptide modifications. (I think Hui Zhang's group demonstrated this at one point using an open-search approach.) It might also be possible to look for characteristic oxonium ion m/z values in MS/MS spectra as an indication that an intact N-glycosylated glycopeptide was selected as the precursor.
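
The oxonium ion check could look something like the following sketch. It assumes each spectrum is a list of (m/z, intensity) pairs and uses commonly cited singly charged oxonium ion masses. The ppm tolerance, the relative intensity cutoff, and the "at least two ions" rule are arbitrary illustrative thresholds that would need tuning on real data.

```python
# Commonly cited singly charged oxonium ion m/z values.
OXONIUM_IONS = {
    "HexNAc fragment": 138.0550,
    "Hex": 163.0601,
    "HexNAc - 2 H2O": 168.0655,
    "HexNAc - H2O": 186.0761,
    "HexNAc": 204.0867,
    "NeuAc - H2O": 274.0921,
    "NeuAc": 292.1027,
    "HexHexNAc": 366.1395,
}


def matched_oxonium_ions(peaks, tol_ppm=20.0, min_rel_intensity=0.05):
    """Return names of oxonium ions found among peaks.

    A peak counts as a match if it lies within tol_ppm of the target m/z
    and reaches min_rel_intensity of the most intense peak.
    """
    if not peaks:
        return []
    base_intensity = max(intensity for _, intensity in peaks)
    hits = []
    for name, target in OXONIUM_IONS.items():
        tolerance = target * tol_ppm / 1e6
        for mz, intensity in peaks:
            if abs(mz - target) <= tolerance and intensity >= min_rel_intensity * base_intensity:
                hits.append(name)
                break
    return hits


def looks_like_glycopeptide(peaks, min_hits=2):
    """Heuristic flag: at least min_hits distinct oxonium ions observed."""
    return len(matched_oxonium_ions(peaks)) >= min_hits


# Hypothetical spectrum, made up for this example only.
spectrum = [(138.0551, 500.0), (204.0869, 1000.0), (366.1391, 300.0), (500.25, 800.0)]
print(matched_oxonium_ions(spectrum), looks_like_glycopeptide(spectrum))
```

Counting how many spectra per file pass such a check would give a simple per-dataset score for ranking datasets likely to contain intact glycopeptides.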

I have attempted to find PNGaseF treated MS/MS datasets in MassIVE (and PRIDE and ProteomeXchange) and had a very difficult time. So I think this would provide a concrete starting point for a discussion of collaborative work.

What do you think? Can we set up a Zoom call to discuss in a couple of weeks?

n

ReneRanzinger commented 3 weeks ago

Hi Nathan,

Great to hear from you and the rest of the group on this thread. We are always looking for more ways to make the deposited data more valuable for the community, and this is an obvious direction of reanalysis that can also help pose interesting research questions. There are some things that we might be able to do using existing infrastructure (such as looking for spectra/files/datasets with a high frequency of peaks at specific MS/MS m/z values), but we could also set up workflows to run collaborative research code on all datasets without requiring you to download 1 TB of data.

I can move things around on Tuesdays or Wednesdays for the next two weeks - would some times on these days work for you for us to set up a zoom meeting?

Best,

Nuno

glygener / glygen-issues

Collaboration Nuno Bandeira #1300