SimpleNumber / aa_stat

AA_stat is a tool for uncovering unexpected modifications of amino acid residues in protein sequences, as well as possible artifacts of data acquisition or processing, in the results of proteome analyses.

Error with AA_Stats #8

Closed BenSamy2020 closed 2 years ago

BenSamy2020 commented 2 years ago

Greetings @levitsky,

I am currently analysing open search results for the proteomics files from PXD008443. I am consistently getting the error shown below. Please advise me on how to troubleshoot it. Thank you.

Error Message:

C:\Users\parth>AA_stat --dir D:\MSFragger_Temp\PXD008443\AA_STATS\aa_stats_output --pepxml D:\MSFragger_Temp\PXD008443\AA_STATS\pepxml --mgf D:\MSFragger_Temp\PXD008443\AA_STATS\mgf
INFO: [11:18:55] Starting...
INFO: [11:18:55] Using default parameters for AA_stat.
INFO: [11:18:56] Using fixed modifications: +57.0215 @ C.
INFO: [11:18:56] Variable modifications in search results: 15.9949 @ M, 42.0106 @ Protein N-term.
INFO: [11:18:56] Reading input files...
Skipping mass calibration: not enough peptides near zero mass shift.
[the line above repeats many more times; duplicates omitted]
INFO: [11:20:49] Starting analysis...
INFO: [11:20:53] Performing Gaussian fit...
c:\users\parth\appdata\local\programs\python\python39\lib\site-packages\AA_stat\stats.py:222: RuntimeWarning: invalid value encountered in sqrt
  perr = np.sqrt(np.diag(pcov))
c:\users\parth\appdata\local\programs\python\python39\lib\site-packages\scipy\optimize\minpack.py:833: OptimizeWarning: Covariance of the parameters could not be estimated
  warnings.warn('Covariance of the parameters could not be estimated',
c:\users\parth\appdata\local\programs\python\python39\lib\site-packages\scipy\optimize\minpack.py:833: OptimizeWarning: Covariance of the parameters could not be estimated
  warnings.warn('Covariance of the parameters could not be estimated',
INFO: [11:23:09] Discarding bad peaks...
INFO: [11:23:09] Joined mass shifts ['-112.9794', '-112.9682', '-112.9560']
INFO: [11:23:09] Joined mass shifts ['-99.0757', '-99.0633']
INFO: [11:23:09] Joined mass shifts ['-59.1536', '-59.1530']
INFO: [11:23:09] Joined mass shifts ['-57.0261', '-57.0116']
INFO: [11:23:09] Joined mass shifts ['-56.1408', '-56.1409']
INFO: [11:23:09] Joined mass shifts ['16.8959', '16.8963']
INFO: [11:23:09] Joined mass shifts ['36.0410', '36.0499', '36.0637']
INFO: [11:23:09] Joined mass shifts ['70.1016', '70.1022']
INFO: [11:23:09] Joined mass shifts ['72.0228', '72.0352']
INFO: [11:23:09] Joined mass shifts ['76.2075', '76.2082']
INFO: [11:23:09] Joined mass shifts ['77.9175', '77.9185']
INFO: [11:23:09] Joined mass shifts ['147.2085', '147.2086']
INFO: [11:23:09] Joined mass shifts ['176.1005', '176.1078']
INFO: [11:23:09] Joined mass shifts ['293.1580', '293.1667']
INFO: [11:23:09] Joined mass shifts ['305.1773', '305.1870']
INFO: [11:23:09] Peaks for subsequent analysis: 534
INFO: [11:23:09] Performing group-wise FDR filtering...
INFO: [11:23:36] # of filtered mass shifts = 13
INFO: [11:23:36] Systematic mass shift equals -0.0022
INFO: [11:23:36] Calculating distributions...
INFO: [11:23:36] Mass shifts:
INFO: [11:23:36] -11.0329 Da
INFO: [11:23:36] +0.0000 Da
INFO: [11:23:36] +26.0162 Da
INFO: [11:23:36] +100.0173 Da
INFO: [11:23:36] +176.1101 Da
INFO: [11:23:36] +213.1337 Da
INFO: [11:23:36] +293.1601 Da
INFO: [11:23:36] +302.1089 Da
INFO: [11:23:36] +304.1622 Da
INFO: [11:23:37] +304.2010 Da
INFO: [11:23:37] +305.1796 Da
INFO: [11:23:37] +320.1960 Da
INFO: [11:23:37] +404.2112 Da
INFO: [11:23:37] Summary histogram saved.
INFO: [11:24:20] Starting Localization using MS/MS spectra...
INFO: [11:24:20] Reference mass shift +0.0000
INFO: [11:24:20] Localizing -11.0329...
Traceback (most recent call last):
  File "c:\users\parth\appdata\local\programs\python\python39\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "c:\users\parth\appdata\local\programs\python\python39\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\parth\AppData\Local\Programs\Python\Python39\Scripts\AA_stat.exe\__main__.py", line 7, in <module>
  File "c:\users\parth\appdata\local\programs\python\python39\lib\site-packages\AA_stat\main.py", line 43, in main
    AA_stat.AA_stat(params_dict, args)
  File "c:\users\parth\appdata\local\programs\python\python39\lib\site-packages\AA_stat\AA_stat.py", line 417, in AA_stat
    localization_dict.update(localization.localization(
  File "c:\users\parth\appdata\local\programs\python\python39\lib\site-packages\AA_stat\localization.py", line 416, in localization
    z = list(zip(df.apply(lambda x: localization_of_modification(
  File "c:\users\parth\appdata\local\programs\python\python39\lib\site-packages\pandas\core\frame.py", line 8740, in apply
    return op.apply()
  File "c:\users\parth\appdata\local\programs\python\python39\lib\site-packages\pandas\core\apply.py", line 688, in apply
    return self.apply_standard()
  File "c:\users\parth\appdata\local\programs\python\python39\lib\site-packages\pandas\core\apply.py", line 812, in apply_standard
    results, res_index = self.apply_series_generator()
  File "c:\users\parth\appdata\local\programs\python\python39\lib\site-packages\pandas\core\apply.py", line 828, in apply_series_generator
    results[i] = self.f(v)
  File "c:\users\parth\appdata\local\programs\python\python39\lib\site-packages\AA_stat\localization.py", line 416, in <lambda>
    z = list(zip(df.apply(lambda x: localization_of_modification(
  File "c:\users\parth\appdata\local\programs\python\python39\lib\site-packages\AA_stat\localization.py", line 295, in localization_of_modification
    exp_dict = preprocess_spectrum(spectra_dict[row['file']], spectrum_id, {}, acc=params_dict['frag_acc'],)
KeyError: '31OCT13_POSH_seg01_C8_42-5'

Regards, Ben

BenSamy2020 commented 2 years ago

Greetings @levitsky,

An update: once I changed my input files from mgf to mzML, I was able to complete the AA_stat analysis successfully. I guess the issue might be the uncalibrated mgf file that MSFragger writes when converting .RAW to .mgf, which is what I was using.

Maybe you could put a note in your tutorial not to use the uncalibrated mgf file provided by MSFragger, and to convert the .RAW file to .mzML with MSConvert instead.

Regards, Ben
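The KeyError in the traceback above is raised while looking up `spectra_dict[row['file']]`, so one plausible reading (an assumption; the thread does not confirm the root cause) is that a run name recorded in the pepXML results had no spectrum file with the same base name. A minimal sketch for checking that before running the tool is given below. The helper names `run_name` and `unmatched_runs`, the suffix list, and the `_uncalibrated` file name in the usage note are all hypothetical and not part of AA_stat.

```python
from pathlib import Path

# Suffixes stripped to derive a run name; this list is an assumption.
KNOWN_SUFFIXES = (".pep.xml", ".pepxml", ".mgf", ".mzml")


def run_name(path):
    """Return the file name without a known search-result or spectrum suffix."""
    name = Path(path).name
    lower = name.lower()
    for suffix in KNOWN_SUFFIXES:
        if lower.endswith(suffix):
            return name[: -len(suffix)]
    return Path(path).stem


def unmatched_runs(pepxml_files, spectrum_files):
    """List pepXML run names that have no spectrum file with the same run name."""
    available = {run_name(p) for p in spectrum_files}
    return sorted(run_name(p) for p in pepxml_files if run_name(p) not in available)
```

For example, `unmatched_runs(["31OCT13_POSH_seg01_C8_42-5.pepXML"], ["31OCT13_POSH_seg01_C8_42-5_uncalibrated.mgf"])` would report the run as unmatched, while pairing it with `31OCT13_POSH_seg01_C8_42-5.mzML` would return an empty list.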

levitsky commented 2 years ago

Ah, thank you for the heads-up @BenSamy2020 ! If I understand correctly, you were applying MSFragger directly on RAW files? Indeed, this case is not covered in our docs, so we will update them.

Thank you and best regards, Lev

BenSamy2020 commented 2 years ago

Greetings @levitsky,

Yep, I was running MSFragger directly on the .RAW files. Additionally, I am looking at modifications enriched in the serum proteomics dataset from https://www.ebi.ac.uk/pride/archive/projects/PXD008443. This dataset was labelled with iTRAQ 8-plex (+144.102 mass shift on lysine [K] and the N-terminus).

Unfortunately, AA_stat is not able to pick up this modification. I am attaching a snapshot of the generated AA_stat report for your reference. The +304.2010 mass shift is recommended as a variable modification at the N-terminus and lysine [K]. Also, +293.1601 is recommended as a fixed modification for cysteine [C].

Please let me know whether anything here is unusual or whether this is what is expected.

Regards, Ben

image

BenSamy2020 commented 2 years ago

Greetings @levitsky,

Additionally, I have sent the HTML report to your email. The GitHub interface is preventing me from attaching the full report here.

Regards, Ben

levitsky commented 2 years ago

I may be missing something (not used to working with iTRAQ at all), but I think the result is mostly as expected. +144.102 is the mass shift for iTRAQ 4-plex, while AA_stat picked up the correct mass shift for iTRAQ 8-plex, as mentioned in the report (see the Possible interpretations column for +304 mass shift). I think 304.205360 is the correct mass to set for N-terminus and K. The fact that AA_stat also recommends it for S and T is not ideal, I will look into that. Also, you should be seeing figures in the HTML report if you open it locally in the AA_stat result folder, so that the adjacent PNG files can be loaded in your browser. I hope it works as expected for you, but the image below the table is missing in your screenshot. Let me know if you're having trouble seeing those images.
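The comparison made in the comment above can be sketched as a nearest-mass lookup. The two label masses are the ones quoted in this thread; the function name `closest_label` and the 0.01 Da tolerance are assumptions chosen for illustration, not AA_stat settings.

```python
# Label masses quoted in this thread (Da).
CANDIDATES = {
    "iTRAQ 4-plex": 144.102,
    "iTRAQ 8-plex": 304.205360,
}


def closest_label(observed, tolerance=0.01):
    """Return (name, difference) for the nearest candidate, or None if outside tolerance."""
    name, mass = min(CANDIDATES.items(), key=lambda item: abs(item[1] - observed))
    diff = observed - mass
    # tolerance is an assumed value, not taken from the tool
    return (name, round(diff, 4)) if abs(diff) <= tolerance else None
```

With the observed shift of +304.2010, `closest_label(304.2010)` picks iTRAQ 8-plex with a difference of about -0.0044 Da, whereas `closest_label(293.1601)` returns `None` because neither label mass is close.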

BenSamy2020 commented 2 years ago

Greetings @levitsky,

Yes, you are absolutely right about the +304.205360 mass shift on N-term and K. My apologies, I thought the mass shift for 8-plex was +144.102.

How about the +293.1601 fixed modification recommended on C? Also, is this the image you are talking about?

image

Regards, Ben

BenSamy2020 commented 2 years ago

For the +293.1601 mass shift, is it an amino acid substitution? Does that mean I do not need to set +57.0215 for C?

levitsky commented 2 years ago

Great! Yes, this is an example of the images I was talking about.

+293 on C is a bit weird, I can see from your initial report screenshot that cysteine's frequency is through the roof for +293 mass shift. However, this is not a very reliable estimate, because apparently there are very few peptides with cysteine in the reference mass shift. You should see a large error bar on the green bar for cysteine if you look at +293 mass shift (a large error bar is also seen for +304 for cysteine in the pic above). The automatic interpretation suggested by AA_stat for +293 does not seem sensible to me, I would ignore that.

To get the most information out of these data, I suggest that you try running the search with a variable modification for iTRAQ 8-plex on N-term and K. In this way, we should get more peptides with a zero mass shift, which will improve the results. Statistical significance of the calculated frequencies will be better, and interpretation of results will be easier. Plus, chances are you'll identify more peptides.

However, some preliminary guesses can be made by looking at the orange bars for zero mass shift and +293 mass shift, for example. Orange bars show the percentage of peptides, e.g. if cysteine's orange bar is at 10% for mass shift zero, it means that out of 353 unmodified peptides ~35 contain cysteine. This way you can compare raw counts of cysteine-containing peptides between mass shift zero and +293 and decide for yourself whether this is just noise or something meaningful. You can also click on peptide counts in the table and see the actual peptides; sometimes this helps find something interesting. But I would really suggest you try to search with the iTRAQ modification enabled before diving into this.
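The conversion from an orange-bar percentage to a raw peptide count can be written out as a one-line calculation. The function name `peptides_with_residue` is hypothetical; the numbers are the ones used in the comment above.

```python
def peptides_with_residue(percent, total_peptides):
    """Convert an orange-bar percentage into an approximate peptide count."""
    return round(total_peptides * percent / 100)


# 10% of 353 unmodified peptides
print(peptides_with_residue(10, 353))  # 35
```

Applying the same conversion to the bar for another mass shift, using that shift's own peptide total, gives counts that can be compared directly.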

BenSamy2020 commented 2 years ago

Greetings @levitsky,

When you said, "I would really suggest you try to search with iTRAQ modification enabled before diving into this", do you mean performing another open search with a variable modification for iTRAQ 8-plex on N-term and K, and thereafter performing another AA_stat analysis?

If that is what you mean, I am actually in the process of doing it =)

Regards, Ben

levitsky commented 2 years ago

Yes, that's exactly what I mean!

BenSamy2020 commented 2 years ago

Greetings @levitsky,

I just performed the AA_stat analysis on an open search with a variable modification for iTRAQ 8-plex on N-term and K only. I have attached the data here for your reference. Would you be able to look at the AA_stat results and give me brief feedback on which fixed and variable modifications you would use?

I would like to compare my interpretation with yours. Lastly, thank you for providing this amazing open source tool!

Regards, Ben aa_stats_output.zip

BenSamy2020 commented 2 years ago

Actually, I am a little confused by the AA_stat report being generated. AA_stat should be recommending a variable modification of +304.2010 for the N-term and K. Instead, it recommends two modifications for the N-term (+304.2007 and +26.0163) and does not recommend any modification for K.

image

levitsky commented 2 years ago

Hi Ben, I think what happened is that you enabled +304 on the protein N-term instead of the peptide N-term. Hence a very abundant +304 mass shift is still observed. Can you fix the search setting and try again?

Also, when I suggested including this modification, I meant that you enable it in addition to what was already set. That's up to you, though: +16 on Met is definitely seen in the results now, but it's OK either way.

BenSamy2020 commented 2 years ago

Greetings @levitsky,

Yep, I keyed in the wrong command. I am currently running the default MSFragger open search with +304.205360 added as a variable modification on the peptide N-term and K. Will update you soon.

BenSamy2020 commented 2 years ago

Greetings @levitsky,

I managed to search the data successfully, but I am facing an error while processing the input files with AA_stat. Please refer to the attached image for the error message. I think AA_stat is not able to generate a summary file. Please advise me on how to troubleshoot this.

Regards, Ben image

levitsky commented 2 years ago

Hi @BenSamy2020, I'm sorry that you're having this trouble.

Would you be able to share the search results with me? I don't know if all of them are needed or if you can reproduce the error with just one or a few of them; either is possible. I would like to be able to reproduce this error locally, though. mzML files I can probably generate myself from the public RAW files, so only pepXML files are needed.

If that's not something you can do, I can try to figure out what's happening from the detailed logs. This would require re-running AA_stat with --verbosity=3 and redirecting all output to a file with something like > aastat_log.txt 2>&1.

BenSamy2020 commented 2 years ago

Greetings @levitsky,

I am using the command below, but nothing is happening. I am not sure if I am typing the right command. My apologies.

AA_stat --dir D:\MSFragger_Temp\PXD008443\MSFragger_OpenSearch_Output\aa_stats_output --pepxml D:\MSFragger_Temp\PXD008443\MSFragger_OpenSearch_Output\pepxml --mzml F:\PRIDE_FTP\PXD008443\Without_AA_Stat_Refinement\AA_STATS\mzML --verbosity 3 D:\MSFragger_Temp\PXD008443\MSFragger_OpenSearch_Output\aa_stats_output > AA_Stat_Verbosity.txt 2>&1

levitsky commented 2 years ago

@BenSamy2020 this is close, but I think there's an extra argument in your command: between --verbosity 3 and the redirection there is an extra string, D:\MSFragger_Temp\PXD008443\MSFragger_OpenSearch_Output\aa_stats_output, which shouldn't be there. Any error messages related to this should be found in AA_Stat_Verbosity.txt, though.
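For anyone who prefers to script this rather than type the redirection by hand, a rough Python equivalent is sketched below. It uses only the options that appear in this thread (`--dir`, `--pepxml`, `--mzml`, `--verbosity`), passed in the same form as in the command above; the helper names `build_command` and `run_with_log` are hypothetical.

```python
import subprocess


def build_command(out_dir, pepxml_dir, mzml_dir, verbosity=3):
    """Assemble the AA_stat arguments used in this thread, with no stray positional argument."""
    return [
        "AA_stat",
        "--dir", out_dir,
        "--pepxml", pepxml_dir,
        "--mzml", mzml_dir,
        "--verbosity", str(verbosity),
    ]


def run_with_log(command, log_path):
    """Run the command, writing stdout and stderr to one file (like `> file 2>&1`)."""
    with open(log_path, "w") as log:
        return subprocess.run(command, stdout=log, stderr=subprocess.STDOUT).returncode
```

Calling `run_with_log(build_command(out_dir, pepxml_dir, mzml_dir), "AA_Stat_Verbosity.txt")` would then leave the full log, including any error messages, in that file.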

BenSamy2020 commented 2 years ago

Greetings @levitsky,

The error still is present. image

BenSamy2020 commented 2 years ago

Also I just shared with you via email the pepXML files from my google drive. Please do advise me if you have access.

levitsky commented 2 years ago

Yes, I downloaded the files and will try to reproduce the error now, thank you!

P.S. The problem with your command now is that you deleted the > before AA_Stat_Verbosity.txt. But it's OK, hopefully I will reproduce the error on my machine and track it down. Thanks again for your help.

levitsky commented 2 years ago

Wow, that's a lot of data. I have not been able to run the whole set, but I tried with just a subset and I got the same error. I will look into fixing this.

levitsky commented 2 years ago

Thank you for your patience @BenSamy2020, I think I've identified and fixed the issue. Can you please try installing the latest version directly from Github? To do this, you can run something along the lines of:

pip install -U git+https://github.com/SimpleNumber/aa_stat

I looked at the results on the subset that I was using, and I noticed that +57.02 on cysteine was not the right setting. You will see that AA_stat recommends a fixed modification of -11.0315 on Cys. Combined with the +57.0215 that's already set, this means that the correct fixed modification for Cys is actually about +45.99. I looked it up and it's probably MMTS, which is included in the iTRAQ kits as the cysteine blocking reagent. So I suggest you run the search with +45.9877 on Cys instead of +57.02.
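The arithmetic behind this suggestion can be checked with a short sketch. The two shifts are the ones quoted above. The small lookup table is an assumption added for illustration and holds Unimod monoisotopic masses for carbamidomethyl and methylthio (the MMTS product). The function name closest_known_mod is hypothetical.

```python
# Mass shifts quoted in this thread (Da).
CURRENT_FIXED_CYS = 57.0215    # fixed modification used in the original search
RECOMMENDED_SHIFT = -11.0315   # additional fixed shift on Cys recommended by AA_stat

# Assumed reference table: Unimod monoisotopic masses (Da).
KNOWN_CYS_MODS = {
    "Carbamidomethyl": 57.021464,
    "Methylthio (MMTS)": 45.987721,
}


def closest_known_mod(mass, table, tolerance=0.01):
    """Return (name, delta) for the nearest table entry, or None if it is outside tolerance."""
    name, ref = min(table.items(), key=lambda item: abs(item[1] - mass))
    delta = mass - ref
    return (name, delta) if abs(delta) <= tolerance else None


# The effective Cys modification is the sum of the existing and recommended shifts.
effective = CURRENT_FIXED_CYS + RECOMMENDED_SHIFT
print(f"effective Cys modification: {effective:+.4f} Da")
print(closest_known_mod(effective, KNOWN_CYS_MODS))
```

With these numbers the sum is about +45.99 Da, which lies within a few millidaltons of the methylthio mass. The 0.01 Da tolerance is an arbitrary choice for this sketch.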

BenSamy2020 commented 2 years ago

Greetings @levitsky,

Thank you for your support. I have managed to update AA_stat and perform the analysis. Based on its output, AA_stat is recommending the modifications below (I have shared the file with you via Google Drive).

So for my closed search settings, should I incorporate:

1) Fixed modification of +45.9879 on C
2) Variable modification of +15.9949 on M
3) Variable modification of +304.2054 on K
4) Variable modification of +42.0106 on Protein N-term
5) Variable modification of +404.2222 on Peptide N-term
6) Variable modification of +304.2040 on S
7) Variable modification of +0.9844 on N

Based on the graph observed for +0.0351, I have decided not to include the variable modification of +0.0351 on Q.

Regards, Ben

BenSamy2020 commented 2 years ago

Also please do advise me if you are able to access the aa_stat_output file via google drive. Thank you.

levitsky commented 2 years ago

Yep, I can see your output. One thing to note: AA_stat assumes by default that you can set multiple modifications on the same site, as indicated by this line in the log:

Recommending multiple modifications on same residue.

So, its recommendation to set +100 on N-term doesn't mean that you need to change +304 to +404 on N-term. Rather, it means that you can set both +304 and +100 as variable modifications on N-term. I definitely recommend keeping +304 on N-term. As for +100, it's up to you: it's only localized on ~100 peptides and it's not obvious how it affects overall results.
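To make the distinction concrete, here is a minimal sketch of the arithmetic, using the values quoted earlier in this thread. The (site, mass) tuple layout is only an illustrative assumption and is not a format used by AA_stat or any search engine.

```python
LABEL_SHIFT = 304.2054      # the +304 shift on the peptide N-term
COMBINED_SHIFT = 404.2222   # the single combined value proposed in the list above

# The additional shift is whatever remains after subtracting the label.
extra_shift = round(COMBINED_SHIFT - LABEL_SHIFT, 4)

# Hypothetical representation: one (site, mass) pair per variable modification.
separate_entries = [
    ("Peptide N-term", LABEL_SHIFT),
    ("Peptide N-term", extra_shift),
]

for site, mass in separate_entries:
    print(f"variable modification {mass:+.4f} on {site}")
```

Keeping the two shifts as separate entries lets a search match peptides that carry only the label as well as peptides that carry both shifts, whereas a single +404.2222 entry would match only the latter.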

As for the +304 mass shift that is still abundant in the results, I'm not sure. It looks like +304 is often localized near the N-term. Could it be that the label is attached twice somehow? Perhaps it makes sense to consider a double modification of +608 at peptide N-term? I don't really know. You can click on the number of peptides in the report and check the list of peptides to decide for yourself.

You're right that the ±0.03 shifts are probably artefacts. They may be due to different systematic errors in different files, which AA_stat mass calibration cannot fix because there are too few unmodified peptides per file. This can possibly be addressed by tweaking AA_stat parameters, but if your main objective is to select the variable modifications, this is another story and it takes some time to dig into.

All in all, I can't recommend the best list of modifications for sure, but with guidance from AA_stat, I think something like this is a good start:

Keep in mind that each added variable modification slightly decreases overall specificity and reduces the number of all other reliable identifications. Ultimately the decision is yours and depends on your objective and situation.

BenSamy2020 commented 2 years ago

Greetings @levitsky,

With your recommendation and AA_stat combined, I have been able to discover an additional 200 proteins from serum samples. This is actually a big achievement. I would really like to thank you for such an amazing tool. Just to let you know, I have incorporated your tool into a core facility pipeline to enhance protein detection and quantification for select facility projects.

One suggestion to make things better: AA_stat could print the corresponding entry below each modification recommendation, for example n^ 304.2054 or K 304.2054. Users could then copy and paste the modifications directly into FragPipe and run the searches.

Also, I just realized that you have made a GUI too. Inexperienced command-line users like me will definitely benefit from this. Lastly, without meaning to pressure you: are you intending to publish this tool? I would like to cite it in my upcoming papers.

Regards, Ben

levitsky commented 2 years ago

Thank you @BenSamy2020,

I'm happy that you find AA_stat useful. We will consider your feedback and everything that has come up in this discussion to see what we can improve.

We actually recently published a paper describing AA_stat in Journal of Proteomics: https://doi.org/10.1016/j.jprot.2021.104350. There is also a preprint on bioRxiv: https://www.biorxiv.org/content/10.1101/2020.09.07.286161v2.full. It wasn't mentioned on GitHub, so thank you for asking, I've added this information.