Integration and visualization of mass-spec proteomics data

jjgao commented 8 years ago

Background: The CPTAC consortium has generated mass-spec proteomics data for TCGA samples. cBioPortal supports integrative analysis and visualization of multi-platform data sets including mutations, copy number alterations, gene expression, and methylation. We also support reverse phase protein array (RPPA) data, but currently don't support mass-spec proteomics data.

Goal: Provide support of mass-spec proteomics data.

Integrate mass-spec data as another data type, including general protein levels and PTMs (phosphoproteins, glycoproteins, etc.).
Support query of protein levels and PTM events.
Support correlation analysis of mutation, CNA and protein data.
Visualize proteomics data in patient view.
(Optional) Visualize peptides.

Approach:

General protein levels can be modeled the same way as RPPA data.
PTM data need to be modeled separately.
Extend Oncoprint to support PTM data.
Extend Plots tab to support proteomics data.
Extend Enrichments tab to support proteomics data.
Visualize the proteomic consequences of mutations and CNAs in patient view.

Skills needed: Java, JavaScript, understanding of proteomics data.
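
As an illustration of the first two points of the approach, here is a minimal sketch of why PTM data need their own model: a protein level is identified by gene and sample alone, while a PTM measurement also needs the modification type and the modified site. The class names, field names, sample ID and values below are hypothetical and are not taken from the cBioPortal code base or schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProteinLevel:
    """One protein abundance value per gene and sample (RPPA-like shape)."""
    gene: str
    sample_id: str
    value: float


@dataclass(frozen=True)
class PtmEvent:
    """One PTM measurement; the site fields are what a protein level lacks."""
    gene: str
    sample_id: str
    ptm_type: str   # e.g. "phospho" or "glyco" (hypothetical vocabulary)
    residue: str    # one-letter amino acid code of the modified residue
    position: int   # 1-based position in the protein sequence
    value: float

    def label(self) -> str:
        # Human-readable site label, e.g. for a track name in a plot.
        return f"{self.gene} {self.residue}{self.position} ({self.ptm_type})"


# Made-up values, for illustration only.
level = ProteinLevel(gene="AKT1", sample_id="SAMPLE-01", value=0.8)
event = PtmEvent(gene="AKT1", sample_id="SAMPLE-01", ptm_type="phospho",
                 residue="S", position=473, value=1.2)
print(event.label())
```

Several PTM events can share the same gene and sample, so a query by gene alone is not enough to address a single PTM measurement.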

Possible mentors: JJ Gao, Zack Heins

Tcheutchoua-Steve commented 8 years ago

Greetings. Is there a particular document, or any other related material, that one can follow to get acquainted with proteomics data?

zheins commented 8 years ago

Hi @Tcheutchoua-Steve, thanks for your interest. You can check this paper out: http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1002277

In shotgun proteomics, the sample is broken down by a protease, splitting the proteins up into pieces (peptides). They are run through the spectrometer, which measures the mass-to-charge ratio, the intensity, and the time at which these pieces eluted. The MS will select ions of sufficient intensity to further fragment, breaking down the peptide into even smaller pieces, ideally allowing you to reconstruct the sequence of that peptide based on the mass-to-charge ratios observed. Then, from the peptides identified, it is possible to reconstruct what proteins were in the sample.

Peptide quantification can be done as well, using the intensity and time dimensions of the MS1, and from the peptides the protein-level quantification can be inferred.

There are tools that do most of these computations. The final output is usually a list of proteins that are likely in the sample, with a list of peptides that were identified, each of which has a mass/charge, an intensity, and the time at which it was detected in the run.
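
To make the last step concrete, here is a minimal sketch of inferring a protein-level value from peptide-level output. It assumes each identified peptide has already been assigned to exactly one protein, and it simply sums peptide intensities per protein. Real tools handle shared peptides, normalization and missing values in more involved ways. The field names, sequences, protein IDs and numbers are made up for illustration.

```python
from collections import defaultdict


def rollup_protein_intensity(peptides):
    """Sum peptide intensities per protein (one simple roll-up strategy)."""
    totals = defaultdict(float)
    for pep in peptides:
        totals[pep["protein"]] += pep["intensity"]
    return dict(totals)


# Toy peptide-level records: mass/charge, intensity and retention time,
# plus the protein each peptide was assigned to (hypothetical values).
peptides = [
    {"sequence": "LSDGVAVLK", "protein": "PROT_A", "mz": 436.8,
     "intensity": 1000.0, "rt_min": 21.4},
    {"sequence": "TAENFR", "protein": "PROT_A", "mz": 369.2,
     "intensity": 3000.0, "rt_min": 9.7},
    {"sequence": "VGDANPALQK", "protein": "PROT_B", "mz": 492.3,
     "intensity": 500.0, "rt_min": 15.2},
]

protein_levels = rollup_protein_intensity(peptides)
print(protein_levels)
```

The resulting per-protein values are the kind of number that could then be stored as a general protein level per sample.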

Tcheutchoua-Steve commented 8 years ago

OK. Thanks for the explanation and the resources. I will look at them very soon.

Best Regards

nv23 commented 8 years ago

Hi! I am very interested in this project and would like to ask if it is possible to learn JavaScript while working on it? I do have a biology background in genetics and bioinformatics, but no experience with JS. Thank you in advance!

jjgao commented 8 years ago

@nv23 Yes, it is possible. The first step of the project, integrating the proteomics data, relies more on Java and database skills.

nv23 commented 8 years ago

@jjgao Thank you for the response! Unfortunately, I do not have experience with databases either. Would it also be possible to pick that up as the project goes along? Thanks again!

jjgao commented 8 years ago

@nv23 It won't be too hard to learn databases, but there is only limited time in the summer. If you understand proteomics data really well, it should work.

mariusdotspinu commented 8 years ago

Hello! I would like to contribute to this project, as I have a big interest in bioinformatics. However, I have no background in this topic. I have had some contact with genetic programming techniques, but I would still have to read up on proteomics using the document above and some other resources. My question is whether I can be of help, given that I know Java and database handling (PL/SQL). Thank you!

vertata commented 8 years ago

Hello, @jjgao! I am interested in this project. I'm a 4th-year student at the Faculty of Applied Mathematics, Ukraine. I have skills in Java programming, some experience with databases (like MySQL, PostgreSQL, H2) and basic knowledge of JavaScript. But proteomics data is new for me. What level of knowledge is required? Can I learn it in the process?

jjgao commented 8 years ago

@mariusdotspinu @SugarCat: we are looking for someone who understands mass-spec proteomics data for this project.

singh-arpit commented 8 years ago

Hello @jjgao, I am interested in this. I have experience in Java and JS and have a Master's degree in Bioinformatics. I am currently working with mass-spec data for Thermo Fisher Scientific. I can take it up on weekends.

cBioPortal / GSoC

Integration and visualization of mass-spec proteomics data #4