Nesvilab / PTM-Shepherd

A tool for summarizing open search results
http://ptmshepherd.nesvilab.org
Apache License 2.0

Fail after localization annotation #5

Closed StSchulze closed 4 years ago

StSchulze commented 4 years ago

Hi,

I have tested PTM-Shepherd so far on results from single MS runs and it worked fine. However, I now wanted to perform a combined analysis on results from multiple MS runs (multiple fractions) and it fails after the localization annotation step.

Here is the printout:

PTM-Shepherd version 0.3.4(c) University of Michigan

Using Java 1.8.0_221 on 10923MB memory

Deleted file: E:\open_mod_search\human2\peaks.tsv
Deleted file: E:\open_mod_search\human2\peaksummary.annotated.tsv
Deleted file: E:\open_mod_search\human2\peaksummary.tsv
Deleted file: E:\open_mod_search\human2\combined.tsv
Deleted file: E:\open_mod_search\human2\01.histo
Deleted file: E:\open_mod_search\human2\01.ms2counts
Deleted file: E:\open_mod_search\human2\01.rawlocalize
Deleted file: E:\open_mod_search\human2\combined.histo
Deleted file: E:\open_mod_search\human2\global.modsummary.tsv
Counting MS2 scans for dataset 01
20150708_QE3_UPLC8_DBJ_QC_HELA_39frac_GluC_24 - 46401 scans
20150708_QE3_UPLC8_DBJ_QC_HELA_39frac_GluC_23 - 45340 scans
20150708_QE3_UPLC8_DBJ_QC_HELA_39frac_GluC_26 - 46487 scans
20150708_QE3_UPLC8_DBJ_QC_HELA_39frac_GluC_25 - 44845 scans
20150708_QE3_UPLC8_DBJ_QC_HELA_39frac_GluC_20 - 47533 scans
20150708_QE3_UPLC8_DBJ_QC_HELA_39frac_GluC_22 - 46714 scans
20150708_QE3_UPLC8_DBJ_QC_HELA_39frac_GluC_21 - 47609 scans
20150708_QE3_UPLC8_DBJ_QC_HELA_39frac_GluC_39 - 48111 scans
20150708_QE3_UPLC8_DBJ_QC_HELA_39frac_GluC_17 - 26631 scans
20150708_QE3_UPLC8_DBJ_QC_HELA_39frac_GluC_16 - 26348 scans
20150708_QE3_UPLC8_DBJ_QC_HELA_39frac_GluC_38 - 7685 scans
20150708_QE3_UPLC8_DBJ_QC_HELA_39frac_GluC_19 - 45239 scans
20150708_QE3_UPLC8_DBJ_QC_HELA_39frac_GluC_18 - 26392 scans
20150708_QE3_UPLC8_DBJ_QC_HELA_39frac_GluC_35 - 45483 scans
20150708_QE3_UPLC8_DBJ_QC_HELA_39frac_GluC_13 - 25589 scans
20150708_QE3_UPLC8_DBJ_QC_HELA_39frac_GluC_2 - 10384 scans
20150708_QE3_UPLC8_DBJ_QC_HELA_39frac_GluC_34 - 43143 scans
20150708_QE3_UPLC8_DBJ_QC_HELA_39frac_GluC_12 - 24647 scans
20150708_QE3_UPLC8_DBJ_QC_HELA_39frac_GluC_1 - 3824 scans
20150708_QE3_UPLC8_DBJ_QC_HELA_39frac_GluC_37 - 44751 scans
20150708_QE3_UPLC8_DBJ_QC_HELA_39frac_GluC_15 - 24356 scans
20150708_QE3_UPLC8_DBJ_QC_HELA_39frac_GluC_36 - 44019 scans
20150708_QE3_UPLC8_DBJ_QC_HELA_39frac_GluC_14 - 25118 scans
20150708_QE3_UPLC8_DBJ_QC_HELA_39frac_GluC_31 - 41125 scans
20150708_QE3_UPLC8_DBJ_QC_HELA_39frac_GluC_30 - 42996 scans
20150708_QE3_UPLC8_DBJ_QC_HELA_39frac_GluC_11 - 23286 scans
20150708_QE3_UPLC8_DBJ_QC_HELA_39frac_GluC_33 - 42358 scans
20150708_QE3_UPLC8_DBJ_QC_HELA_39frac_GluC_32 - 42562 scans
20150708_QE3_UPLC8_DBJ_QC_HELA_39frac_GluC_10 - 23545 scans
20150708_QE3_UPLC8_DBJ_QC_HELA_39frac_GluC_9 - 23939 scans
20150708_QE3_UPLC8_DBJ_QC_HELA_39frac_GluC_8 - 21002 scans
20150708_QE3_UPLC8_DBJ_QC_HELA_39frac_GluC_7 - 16392 scans
20150708_QE3_UPLC8_DBJ_QC_HELA_39frac_GluC_6 - 19472 scans
20150708_QE3_UPLC8_DBJ_QC_HELA_39frac_GluC_5 - 18688 scans
20150708_QE3_UPLC8_DBJ_QC_HELA_39frac_GluC_4 - 14027 scans
20150708_QE3_UPLC8_DBJ_QC_HELA_39frac_GluC_3 - 2353 scans
20150708_QE3_UPLC8_DBJ_QC_HELA_39frac_GluC_28 - 39125 scans
20150708_QE3_UPLC8_DBJ_QC_HELA_39frac_GluC_27 - 44084 scans
20150708_QE3_UPLC8_DBJ_QC_HELA_39frac_GluC_29 - 41001 scans
1252604 MS2 scans present in dataset 01

Creating combined histogram
Generated histogram file for dataset 01 [-359 - 7964]
Created combined histogram!

Running peak picking

Picked top 5000 peaks

created summary table

annotated summary table

created modification summary
Begin localization annotation
20150708_QE3_UPLC8_DBJ_QC_HELA_39frac_GluC_24 - 1732 (5683 ms, 530 ms)
20150708_QE3_UPLC8_DBJ_QC_HELA_39frac_GluC_23 - 1458 (5984 ms, 236 ms)
20150708_QE3_UPLC8_DBJ_QC_HELA_39frac_GluC_26 - 1728 (5445 ms, 238 ms)
20150708_QE3_UPLC8_DBJ_QC_HELA_39frac_GluC_25 - 1345 (5401 ms, 183 ms)
20150708_QE3_UPLC8_DBJ_QC_HELA_39frac_GluC_20 - 1232 (5440 ms, 172 ms)
20150708_QE3_UPLC8_DBJ_QC_HELA_39frac_GluC_22 - 1273 (5455 ms, 174 ms)
20150708_QE3_UPLC8_DBJ_QC_HELA_39frac_GluC_21 - 1169 (5319 ms, 137 ms)
20150708_QE3_UPLC8_DBJ_QC_HELA_39frac_GluC_39 - 2461 (6374 ms, 483 ms)
20150708_QE3_UPLC8_DBJ_QC_HELA_39frac_GluC_17 - 793 (3055 ms, 100 ms)
20150708_QE3_UPLC8_DBJ_QC_HELA_39frac_GluC_16 - 699 (3092 ms, 91 ms)
20150708_QE3_UPLC8_DBJ_QC_HELA_39frac_GluC_38 - 152 (2361 ms, 19 ms)
20150708_QE3_UPLC8_DBJ_QC_HELA_39frac_GluC_19 - 1114 (4657 ms, 121 ms)
20150708_QE3_UPLC8_DBJ_QC_HELA_39frac_GluC_18 - 791 (3150 ms, 100 ms)
20150708_QE3_UPLC8_DBJ_QC_HELA_39frac_GluC_35 - 1656 (5298 ms, 349 ms)
20150708_QE3_UPLC8_DBJ_QC_HELA_39frac_GluC_13 - 394 (3055 ms, 50 ms)
20150708_QE3_UPLC8_DBJ_QC_HELA_39frac_GluC_2 - 21 (2706 ms, 3 ms)
20150708_QE3_UPLC8_DBJ_QC_HELA_39frac_GluC_34 - 1721 (5505 ms, 242 ms)
20150708_QE3_UPLC8_DBJ_QC_HELA_39frac_GluC_12 - 328 (3034 ms, 40 ms)
20150708_QE3_UPLC8_DBJ_QC_HELA_39frac_GluC_1 - 1 (2087 ms, 0 ms)
20150708_QE3_UPLC8_DBJ_QC_HELA_39frac_GluC_37 - 1866 (5500 ms, 277 ms)
20150708_QE3_UPLC8_DBJ_QC_HELA_39frac_GluC_15 - 487 (3248 ms, 69 ms)
20150708_QE3_UPLC8_DBJ_QC_HELA_39frac_GluC_36 - 1865 (5325 ms, 271 ms)
20150708_QE3_UPLC8_DBJ_QC_HELA_39frac_GluC_14 - 434 (2909 ms, 54 ms)
20150708_QE3_UPLC8_DBJ_QC_HELA_39frac_GluC_31 - 1318 (5174 ms, 181 ms)
20150708_QE3_UPLC8_DBJ_QC_HELA_39frac_GluC_30 - 1876 (5420 ms, 383 ms)
20150708_QE3_UPLC8_DBJ_QC_HELA_39frac_GluC_11 - 303 (2953 ms, 33 ms)
20150708_QE3_UPLC8_DBJ_QC_HELA_39frac_GluC_33 - 1884 (5548 ms, 260 ms)
20150708_QE3_UPLC8_DBJ_QC_HELA_39frac_GluC_32 - 1521 (5291 ms, 227 ms)
20150708_QE3_UPLC8_DBJ_QC_HELA_39frac_GluC_10 - 226 (3082 ms, 29 ms)
20150708_QE3_UPLC8_DBJ_QC_HELA_39frac_GluC_9 - 275 (2895 ms, 32 ms)
20150708_QE3_UPLC8_DBJ_QC_HELA_39frac_GluC_8 - 162 (2686 ms, 17 ms)
20150708_QE3_UPLC8_DBJ_QC_HELA_39frac_GluC_7 - 127 (2376 ms, 13 ms)
20150708_QE3_UPLC8_DBJ_QC_HELA_39frac_GluC_6 - 113 (2558 ms, 14 ms)
20150708_QE3_UPLC8_DBJ_QC_HELA_39frac_GluC_5 - 45 (2685 ms, 5 ms)
20150708_QE3_UPLC8_DBJ_QC_HELA_39frac_GluC_4 - 35 (2498 ms, 4 ms)
20150708_QE3_UPLC8_DBJ_QC_HELA_39frac_GluC_3 - 5 (2003 ms, 1 ms)
20150708_QE3_UPLC8_DBJ_QC_HELA_39frac_GluC_28 - 1686 (5050 ms, 236 ms)
20150708_QE3_UPLC8_DBJ_QC_HELA_39frac_GluC_27 - 1679 (5273 ms, 204 ms)
20150708_QE3_UPLC8_DBJ_QC_HELA_39frac_GluC_29 - 1726 (4965 ms, 231 ms)
done

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: -1
    at edu.umich.andykong.ptmshepherd.core.FastLocator.getIndex(FastLocator.java:61)
    at edu.umich.andykong.ptmshepherd.localization.SiteLocalization.updateLocalizationProfiles(SiteLocalization.java:219)
    at edu.umich.andykong.ptmshepherd.PTMShepherd.main(PTMShepherd.java:378)

I attached the config file and the input .tsv. Let me know if you need the mzMLs as well; I can upload them to a file-sharing platform. ptmshephered.zip

danielgeiszler commented 4 years ago

Hi @StSchulze

PTM-Shepherd does not anticipate anyone doing searches with a mass window as large as -350 to 8000 Da.

I can fix this and reupload tonight, but just to confirm before I do: are you trying to search with a mass window this large? Only 0.5% of your spectra are above 4000 Da, so they're likely to be noise.

If you reduce the mass range to be < 4950 Da in length (-350 to 4000 Da or so), it should work with the current version.
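For anyone checking their own settings before a run, the constraint in the reply above can be sketched as a simple span check. This is only an illustration: the 4950 Da limit is taken from the comment above and is assumed to apply to version 0.3.4, and the function names are hypothetical, not part of PTM-Shepherd.

```python
# Limit quoted in the reply above; assumed to apply to PTM-Shepherd 0.3.4.
MAX_SPAN_DA = 4950.0


def mass_window_span(low_da: float, high_da: float) -> float:
    """Return the length of the mass window in Da."""
    if high_da < low_da:
        raise ValueError("upper bound must not be below lower bound")
    return high_da - low_da


def window_fits(low_da: float, high_da: float,
                max_span_da: float = MAX_SPAN_DA) -> bool:
    """True if the window is shorter than the assumed limit."""
    return mass_window_span(low_da, high_da) < max_span_da


if __name__ == "__main__":
    # Histogram range reported in the log: [-359 - 7964]
    print(window_fits(-359, 7964))   # span of 8323 Da
    # Range suggested in the reply: -350 to 4000 Da
    print(window_fits(-350, 4000))   # span of 4350 Da
```

The histogram range printed in the log (`[-359 - 7964]`) spans 8323 Da, which is over the stated limit, while the suggested -350 to 4000 Da window spans 4350 Da.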

StSchulze commented 4 years ago

Hi @danielgeiszler,

thanks for the quick response. Yes, that mass range is higher than usual since I was testing a few things, but it is not really useful, so don't worry about it. It works with the reduced mass range, so it's perfectly fine.

Quick question that is not really related: where do I actually find the position of the localized mods? If I understand it correctly, in the "rawlocalize" file, if MaxPeaks_Loc > MaxPeaks_Unloc then it counts as localized PSM, but I am not sure where the position/amino acid of that localization is given? Is it any of the uppercase letters in "Localized_Pep"?

danielgeiszler commented 4 years ago

Perfect!

Yes, the uppercase positions all scored equally when placing the modification on them.

The algorithm we use for localization is essentially the same as the one used by MSFragger, so if you turned on the localize function there the data can be found in {RAW_FILE}.tsv. It’s better annotated there since those files are intended to be end-user products, whereas the PTM-S localize/simrt raw files are primarily intended as intermediate files (but we make them available).
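As a rough illustration of reading these values, here is a minimal sketch that pulls the uppercase positions out of a `Localized_Pep` string. It assumes that lowercase letters mark residues that were not candidate sites; the example peptide is made up, the function names are hypothetical, and the `MaxPeaks_Loc > MaxPeaks_Unloc` rule is the asker's reading of the rawlocalize file rather than a documented criterion.

```python
from typing import List, Tuple


def candidate_sites(localized_pep: str) -> List[Tuple[int, str]]:
    """Return (1-based position, residue) for every uppercase letter.

    Assumption: uppercase letters mark the equally scored candidate
    sites, lowercase letters mark all other residues.
    """
    return [(pos, aa)
            for pos, aa in enumerate(localized_pep, start=1)
            if aa.isupper()]


def is_localized(max_peaks_loc: int, max_peaks_unloc: int) -> bool:
    """Asker's reading: a PSM counts as localized if Loc beats Unloc."""
    return max_peaks_loc > max_peaks_unloc


if __name__ == "__main__":
    # Hypothetical peptide, not taken from the attached data.
    print(candidate_sites("aaSTpepk"))
    print(is_localized(12, 9))
```

If several letters are uppercase, the sketch returns all of them, which matches the statement above that those positions scored equally.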


StSchulze commented 4 years ago

Great, thanks again for the help! I'll close this here.