compomics / searchgui

Highly adaptable common interface for proteomics search and de novo engines
http://compomics.github.io/projects/searchgui.html

Adding modification to SearchGUI #195

Closed chrishuges closed 6 years ago

chrishuges commented 6 years ago

Hi,

Is it possible to add the modification below to SearchGUI in a future release? It is the result of using TMT and SILAC together in the same experiment (hyperplexing). Just the lysine +8 modification plus the TMT 6-plex mass.

Lysine8-TMT6 - 237.177131 (I imagine a better name could be thought of).

Thanks, Chris

chrishuges commented 6 years ago

In theory you could have the same thing with lysine +4 or +6, so those could be viable adds as well.

hbarsnes commented 6 years ago

Hi Chris,

Yes, should be straightforward. If you can provide me with the chemical formulas for the three new modifications I can add them to the next release? BTW, I assume the TMT reporter ions would stay the same right?

Best regards, Harald

chrishuges commented 6 years ago

Thanks Harald. Yes the TMT reporters remain the same.

  1. TMT 6-plex + Lysine 4 = C(8)13C(4)H(16)2H(4)N15NO(2), 233.188039 Da
  2. TMT 6-plex + Lysine 6 = C(2)13C(10)H(20)N15NO(2), 235.183061 Da
  3. TMT 6-plex + Lysine 8 = C(2)13C(10)H(20)15N(3)O(2) -N, 237.177131 Da
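
As a cross-check on the three formulas above, the masses can be recomputed from the isotopic compositions. This is a minimal sketch, not SearchGUI code: the isotope masses in `ISOTOPE_MASS` are standard monoisotopic values that are not given in this thread, and `monoisotopic_mass` is a hypothetical helper name. The "-N" in the third formula is written as a count of -1 for `N`.

```python
# Monoisotopic isotope masses in Da (assumed standard values, not from the thread).
ISOTOPE_MASS = {
    "C": 12.0,
    "13C": 13.0033548378,
    "H": 1.00782503207,
    "2H": 2.0141017778,
    "N": 14.0030740048,
    "15N": 15.0001088982,
    "O": 15.99491461956,
}


def monoisotopic_mass(composition):
    """Sum isotope mass times count; a negative count removes atoms."""
    return sum(ISOTOPE_MASS[isotope] * count for isotope, count in composition.items())


# Compositions transcribed from the three formulas listed above.
COMBINED = {
    "TMT 6-plex + Lysine 4": {"C": 8, "13C": 4, "H": 16, "2H": 4, "N": 1, "15N": 1, "O": 2},
    "TMT 6-plex + Lysine 6": {"C": 2, "13C": 10, "H": 20, "N": 1, "15N": 1, "O": 2},
    "TMT 6-plex + Lysine 8": {"C": 2, "13C": 10, "H": 20, "N": -1, "15N": 3, "O": 2},
}

if __name__ == "__main__":
    for name, composition in COMBINED.items():
        print(f"{name}: {monoisotopic_mass(composition):.6f} Da")
```

With these isotope masses the computed values should agree with the masses quoted above to within about 0.00001 Da; the last printed digit can differ slightly depending on how the inputs were rounded.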

Let me know if you need more details.

Cheers, Chris

hbarsnes commented 6 years ago

Hi Chris,

Thanks for the chemical formulas.

The only additional question is why 6-plex and not 10-plex (or 11-plex)? The only thing that changes is the number of reporter ions to look for? I guess we can always add three x three new modifications, but that starts to look a bit messy?

Actually, given that this is basically a combination of two modifications, does it not already work to simply select both at the same time when doing the search? Or does this result in issues for the search engines? I should know this, but I assume you have already tried this option?

Best regards, Harald

chrishuges commented 6 years ago

Yes 11-plex would be the best choice as this will cover all bases in terms of reporters. Good point.

In my experience, most search engines will not assign two variable modifications to a single amino acid.

hbarsnes commented 6 years ago

Hi Chris,

> Yes 11-plex would be the best choice as this will cover all bases in terms of reporters. Good point.

I'm just not sure it's a good idea to also look for the extra 11-plex reporters when only using 6-plex. So for clarity I think I'll end up adding them separately after all. We already do this for the normal TMT modifications, and it's best to be consistent.

> In my experience, most search engines will not assign two variable modifications to a single amino acid.

Yes, I'm pretty sure you are correct. And if a search engine still ends up doing this, I think we filter out such peptides when loading the data in PeptideShaker anyway, as many combinations will not be chemically possible I guess.

I'll add the new combined modifications tomorrow and hopefully also release new versions of SearchGUI and PeptideShaker. Just waiting for one other unrelated fix to be completed.

Best regards, Harald

chrishuges commented 6 years ago

Perfect, thanks Harald!

hbarsnes commented 6 years ago

Hi Chris,

I've now added the new modifications, but I will not be able to deploy new versions today as the server is down.

Instead I've uploaded some snapshots to Dropbox. Would be great if you could give these a go to see if the new modifications work as wanted?

You'll find them here:

  1. SearchGUI for Windows: https://www.dropbox.com/s/wrv6hrd5tnbxjla/SearchGUI-3.3.9-SNAPSHOT-windows.zip?dl=0
  2. SearchGUI for Linux/Mac: https://www.dropbox.com/s/xqhsst74immfs0p/SearchGUI-3.3.9-SNAPSHOT-mac_and_linux.tar.gz?dl=0
  3. PeptideShaker: https://www.dropbox.com/s/kn8inw6b8rwu96r/PeptideShaker-1.16.35-SNAPSHOT.zip?dl=0

I decided on the following naming scheme: TMT 6-plex of K+4, TMT 6-plex of K+6, etc. I hope this makes sense? If not, it's not too late to change the names.

Best regards, Harald

chrishuges commented 6 years ago

Awesome, sounds good. I will try these out today and report back.

Thanks!

chrishuges commented 6 years ago

Harald,

The search seemed to work ok, and PeptideShaker was going ok until it hit an error:

Thu Nov 15 18:06:56 PST 2018 Scoring Peptide PTMs. Please Wait...
<CompomicsError>PeptideShaker processing failed. See the PeptideShaker log for details.</CompomicsError>
Thu Nov 15 18:06:57 PST 2018 An error occurred while loading the identification files:
Thu Nov 15 18:06:57 PST 2018 Attempting to create duplicate peptide key: FYKCDMCCK_229.16293213472008-ATAA-3 from peptide FYKCDMCCK_229.16293213472008-ATAA-3_237.17713094832013-ATAA-3.
Thu Nov 15 18:06:59 PST 2018 PeptideShaker Processing Canceled.
<CompomicsError>PeptideShaker processing canceled. See the PeptideShaker log for details.</CompomicsError>

chrishuges commented 6 years ago

Just for completeness, these are the commands I used to process the data. It is a set of 24 fractions.

Config

java -cp /projects/ptx_analysis/chughes/software/SearchGUI-3.3.9/SearchGUI-3.3.9-SNAPSHOT.jar eu.isas.searchgui.cmd.IdentificationParametersCLI -out /projects/ptx_analysis/chughes/parameter-files/current/ch_nov2018_OT-MS1_HCD-OT-MS2_human-trypsin_StdMods-TMT10plex-SILACkr.par -db /projects/ptx_analysis/chughes/databases/current/uniprot_human-crap_oct2018_FWD_concatenated_target_decoy.fasta -prec_tol 20 -frag_tol 0.05 -fixed_mods "Carbamidomethylation of C, TMT 10-plex of peptide N-term" -variable_mods "Oxidation of M, TMT 10-plex of K, TMT 10-plex of K+8, Arginine 13C(6) 15N(4)" -db_pi /projects/ptx_analysis/chughes/databases/current/uniprot_human-crap_oct2018_FWD_concatenated_target_decoy.fasta -msgf_fragmentation 3 -msgf_instrument 1 -msgf_protocol 4

Search

for i in *.mgf; do java -Xmx100g -cp /projects/ptx_analysis/chughes/software/SearchGUI-3.3.9/SearchGUI-3.3.9-SNAPSHOT.jar eu.isas.searchgui.cmd.SearchCLI -spectrum_files /projects/ptx_analysis/chughes/projects-current/search/$i -output_folder /projects/ptx_analysis/chughes/projects-current/search/ -id_params /projects/ptx_analysis/chughes/parameter-files/current/ch_nov2018_OT-MS1_HCD-OT-MS2_human-trypsin_StdMods-TMT10plex-SILACkr.par -output_option 1 -xtandem 1 -msgf 1 -tide 1; done

Validation

java -Xmx400g -cp /projects/ptx_analysis/chughes/software/PeptideShaker-1.16.35/PeptideShaker-1.16.35-SNAPSHOT.jar eu.isas.peptideshaker.cmd.PeptideShakerCLI -experiment polysomes -sample all-fractions -replicate 1 -identification_files /projects/ptx_analysis/chughes/projects-current/search/ -spectrum_files /projects/ptx_analysis/chughes/projects-current/search/ -id_params /projects/ptx_analysis/chughes/parameter-files/current/ch_nov2018_OT-MS1_HCD-OT-MS2_human-trypsin_StdMods-TMT10plex-SILACkr.par -out /projects/ptx_analysis/chughes/projects-current/search/ch_HEK293-EV_polysomes-tmt-silac_all-combined.cpsx

chrishuges commented 6 years ago

Looking through the PeptideShaker GitHub issues, it looks like this is a known error that occurs due to stacking of modifications on the same amino acid. It also seems like this will be fixed in the next major release? I don't mind waiting, as I can just split the data into two searches for now (one with light TMT, the other with heavy TMT) and work with the results downstream.

hbarsnes commented 6 years ago

Hi Chris,

Yes, I'm afraid this is a known error. Not really related to the modifications though but rather an issue with the memory handling. And yes, this should be fixed in the next major release.

Perhaps you could try loading a smaller subset of the data in PeptideShaker just to see that the new modification is handled correctly?

Best regards, Harald

chrishuges commented 6 years ago

When you say memory handling, do you mean memory amount? Because I can easily give it more. The machine I am using has around 1.5 TB of RAM.

Yes I will do a smaller test run today and see what happens.

hbarsnes commented 6 years ago

> When you say memory handling, do you mean memory amount?

Giving it more memory might provide you with a way around the problem.

The underlying issue is that we use a database to store the results which we figured out is not very fast. We therefore started adding various caches in order to avoid writing to the database. This worked fine until we started multithreading, which instead can result in the cache and the database getting out of sync, and result in errors like the one you got above.
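
To illustrate the kind of failure described above, here is a minimal sketch of how an unsynchronized cache in front of a store with unique keys can produce a duplicate-key error once two threads are involved. It is not PeptideShaker's code: `Store`, `DuplicateKeyError`, and `load_peptide` are hypothetical names, and the barrier exists only to force the unlucky interleaving so that the example behaves the same on every run.

```python
import threading


class DuplicateKeyError(Exception):
    """Raised when the backing store already holds the key."""


class Store:
    """Stand-in for the results database: every key must be unique."""

    def __init__(self):
        self._rows = {}
        self._lock = threading.Lock()

    def insert(self, key, value):
        with self._lock:
            if key in self._rows:
                raise DuplicateKeyError(f"Attempting to create duplicate key: {key}")
            self._rows[key] = value


def load_peptide(key, synchronized):
    """Let two threads store the same key; return the error messages raised."""
    store, cache = Store(), set()
    cache_lock = threading.Lock()
    barrier = threading.Barrier(2)
    errors = []

    def worker():
        try:
            if synchronized:
                # Check and insert happen as one step under a lock.
                with cache_lock:
                    if key not in cache:
                        store.insert(key, "peptide")
                        cache.add(key)
            else:
                # Check-then-act: both threads read the cache before either writes.
                missing = key not in cache
                barrier.wait()  # forces both threads past the check first
                if missing:
                    store.insert(key, "peptide")
                    cache.add(key)
        except DuplicateKeyError as exc:
            errors.append(str(exc))

    threads = [threading.Thread(target=worker) for _ in range(2)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return errors


if __name__ == "__main__":
    print("unsynchronized:", load_peptide("EXAMPLE_KEY", synchronized=False))
    print("synchronized:", load_peptide("EXAMPLE_KEY", synchronized=True))
```

In the unsynchronized run both threads see a cache miss, so the second insert is rejected by the store; holding a lock across the check and the insert removes the error in this sketch.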

So we have now spent a long time trying to switch to a faster database backend while at the same time maintaining the current release.

hbarsnes commented 6 years ago

Hi Chris,

I just properly released the new versions of SearchGUI and PeptideShaker. There are no changes compared to the snapshots I made available at Dropbox.

Given that the original request was for the addition of the now supported TMT + SILAC modification, I will now close this issue. So please open a new one if you experience other problems than the memory issues when using the new modifications.

I will let you know as soon as a beta version of the new major release becomes available for testing.

Best regards, Harald

chrishuges commented 5 years ago

Hi Harald,

Just wanted to update on this regarding the error I experienced above. I fed it more memory and it worked fine.

Cheers, Chris