Skyline DIA: Bad Peaks - Githubissues

grace-ac commented 6 years ago

Notebook post detailing what I did in Walnut and changes I made to settings: 2018-07-27-Skyline-DIA.md

Nick from Skyline recommended trying the Advanced Peak Picking Model for Step 5b: Spot-checking Peptides. DIA instructions are not at all easy to follow - need help.

Latest Skyline document: http://owl.fish.washington.edu/scaphapoda/grace/2015-oysterseed-project/20180727-2015-Cgseed.sky.zip

Screenshots of some "peaks" in the document: 20180813-bad-peaks-01 20180813-bad-peaks-2

grace-ac commented 6 years ago

updated zip: 20180831-Cgseed

emmats commented 6 years ago

Don't get too excited, but I think I'll have time to work on this next week. I'm just going to run through making the Skyline document and see if I get something different from/better than (?) what you got. I'm assuming the updated zip file you posted above is from the Walnut workflow?

grace-ac commented 6 years ago

Okay!!

Both zip files are from the Walnut workflow, but the "updated" one is just the most recent attempt. I tried it out again last Friday and went through it step by step.

Here's a notebook post related to the 08-31 attempt: here

emmats commented 6 years ago

Which fasta file did you use in Walnut and in Skyline?

grace-ac commented 6 years ago

I've been using this fasta for everything: http://owl.fish.washington.edu/scaphapoda/grace/2015-oysterseed-project/2015-DIA/Cg_Giga_cont_prtc_AA.fasta

grace-ac commented 6 years ago

The above fasta came from Step 2c in the protocol (Steven did this for me).

$ cd Desktop/

srlab@swan MINGW64 ~/Desktop
$ cd grace/

srlab@swan MINGW64 ~/Desktop/grace
$ head Cg_Giga_cont_prtc_AA_digested_Mass400to6000.txt
Protein_Name    Sequence        Unique_ID       Monoisotopic_Mass       Predicted_NET   Tryptic_Name
CHOYP_043R.5.5|m.64252  SPSEDPDAPIENILQTNSVYKPK 1       2541.2598016    0.3655  t2.1
CHOYP_043R.5.5|m.64252  SPSEDPDAPIENILQTNSVYKPKK        2       2669.3547598    0.3414  t2.2
CHOYP_043R.5.5|m.64252  SPSEDPDAPIENILQTNSVYKPKKEPTYDENVVVK     3       3942.973762     0.3449  t2.3
CHOYP_043R.5.5|m.64252  SPSEDPDAPIENILQTNSVYKPKKEPTYDENVVVKIISQDTPTILR  4       5180.676764     0.5144  t2.4
CHOYP_043R.5.5|m.64252  KEPTYDENVVVK    5       1419.7245246    0.2186  t3.2
CHOYP_043R.5.5|m.64252  KEPTYDENVVVKIISQDTPTILR 6       2657.4275266    0.4593  t3.3
CHOYP_043R.5.5|m.64252  KEPTYDENVVVKIISQDTPTILRVSFTVNR  7       3460.8564928    0.5658  t3.4
CHOYP_043R.5.5|m.64252  EPTYDENVVVK     8       1291.6295664    0.2301  t4.1
CHOYP_043R.5.5|m.64252  EPTYDENVVVKIISQDTPTILR  9       2529.3325684    0.4402  t4.2

srlab@swan MINGW64 ~/Desktop/grace
$ awk '{print $1,$2}' Cg_Giga_cont_prtc_AA_digested_Mass400to6000.txt | head
Protein_Name Sequence
CHOYP_043R.5.5|m.64252 SPSEDPDAPIENILQTNSVYKPK
CHOYP_043R.5.5|m.64252 SPSEDPDAPIENILQTNSVYKPKK
CHOYP_043R.5.5|m.64252 SPSEDPDAPIENILQTNSVYKPKKEPTYDENVVVK
CHOYP_043R.5.5|m.64252 SPSEDPDAPIENILQTNSVYKPKKEPTYDENVVVKIISQDTPTILR
CHOYP_043R.5.5|m.64252 KEPTYDENVVVK
CHOYP_043R.5.5|m.64252 KEPTYDENVVVKIISQDTPTILR
CHOYP_043R.5.5|m.64252 KEPTYDENVVVKIISQDTPTILRVSFTVNR
CHOYP_043R.5.5|m.64252 EPTYDENVVVK
CHOYP_043R.5.5|m.64252 EPTYDENVVVKIISQDTPTILR

srlab@swan MINGW64 ~/Desktop/grace
$ awk '{print $1,$2}' Cg_Giga_cont_prtc_AA_digested_Mass400to6000.txt \
> > Cg_Giga_cont_prtc_AA_M400-6000-2c.txt

And then I think we just converted it to a .fasta file.
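The conversion step itself isn't recorded in this thread. A minimal sketch of one way it could be done, assuming the two-column file has a header row and whitespace-separated columns, and that each peptide becomes its own FASTA record. The `table_to_fasta` function name and the `_pepN` header suffix are made up for illustration and are not from the original workflow:

```shell
# Hypothetical sketch: turn a "Protein_Name Sequence" table into FASTA.
# Assumes a header row (skipped) and whitespace-separated columns.
# The _pepN suffix keeps headers unique, since one protein has many peptides.
table_to_fasta() {
  awk 'NR > 1 && NF >= 2 { printf(">%s_pep%d\n%s\n", $1, NR - 1, $2) }'
}

# Demo on two rows taken from the awk output above.
printf '%s\n' \
  'Protein_Name Sequence' \
  'CHOYP_043R.5.5|m.64252 SPSEDPDAPIENILQTNSVYKPK' \
  'CHOYP_043R.5.5|m.64252 KEPTYDENVVVK' | table_to_fasta
```

On the real data this would be run as `table_to_fasta < Cg_Giga_cont_prtc_AA_M400-6000-2c.txt > some_output.fasta`, where the output name is a placeholder.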

emmats commented 6 years ago

The first time you tried this, that could have been an issue, since I had originally used a different fasta for pecan. But since we are re-making the blib file now, it shouldn't matter. I've already found a difference in a setting between your Skyline document and the suggested settings in the MS1 extraction tutorial on the Skyline website. I'm going to go through the whole thing and see if I can find anything else.

grace-ac commented 6 years ago

Thank you Emma!

emmats commented 6 years ago

I haven't done a complete comparison of your Skyline document and the tutorial. Instead, I made a new one! Maybe not the most efficient thing to do, but it is done. I followed 2 tutorials: Data Independent Acquisition and iRT Retention Time Prediction. The latter was to use the PRTC peptides to do a better job of coordinating peak IDs across replicates. I think it did what it was supposed to do, but since this is the first time I have used iRT, I'm going to run it by Nick to make sure. You can see what I did in my Evernote entry. The new Skyline document is here. I'll let you know what Nick says about my skills implementing iRT. If this is enough of an improvement to move forward with the entire dataset, great. If it's not, then a couple of ways to move forward would be to choose a protein-based or pathway-based analysis. This would narrow your target list down to targets of specific interest, which you could then manually curate to make sure your peaks are well chosen. That would allow a more accurate comparison of protein abundance than pure spectral counting.
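For readers new to iRT: the core idea is a linear fit between the retention times measured for a set of standard peptides (here the PRTC peptides) in a run and their fixed reference iRT values, so that other peptides can be placed on a common scale across replicates. A rough sketch of that fit in awk follows. It is not Skyline's implementation, the `fit_irt` name is hypothetical, and the labels and numbers are invented purely for illustration:

```shell
# Illustrative only: least-squares fit of iRT = slope * RT + intercept.
# Input columns: label, measured_RT_minutes, reference_iRT.
# All values below are made up; they are not PRTC measurements.
fit_irt() {
  awk '{ n++; sx += $2; sy += $3; sxx += $2 * $2; sxy += $2 * $3 }
       END {
         slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
         intercept = (sy - slope * sx) / n
         printf("slope=%.3f intercept=%.3f\n", slope, intercept)
       }'
}

printf '%s\n' \
  'std1 10 0' \
  'std2 20 25' \
  'std3 30 50' \
  'std4 40 75' | fit_irt
```

With a fit like this per run, a peptide's measured retention time can be converted to the iRT scale and compared between replicates even when the gradient drifts slightly.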

grace-ac commented 6 years ago

Thank you so much! I'll look at this today and Thursday

sr320 commented 5 years ago

feels like @emmats has tackled this.

grace-ac commented 5 years ago

@emmats did Nick get back to you on your iRT implementation?

emmats commented 5 years ago

He said I did it correctly.

RobertsLab / resources

Skyline DIA: Bad Peaks #341