SimpleNumber / aa_stat

AA_stat is a tool for uncovering unexpected modifications of amino acid residues in protein sequences, as well as possible artifacts of data acquisition or processing, in the results of proteome analyses.

StopIteration at Plotting mass shift figures... step #13

Closed alessandro-vai closed 1 year ago

alessandro-vai commented 1 year ago

Hi,

I am running AA_stat on open search results from a histone dataset. Because of this, I have fewer PSMs than I would get from a total proteome experiment, so the warning is expected. However, the tool fails at the "Plotting mass shift figures" step, after producing the PNG plot files up to the mass shift -4.0308. Let me know if you need the input files for debugging.

Thank you in advance.

 INFO: [17:41:09] Starting...
    INFO: [17:41:09] Using fixed modifications: +119.0371 @ N-term.
    INFO: [17:41:09] Variable modifications in search results: 56.026215 @ K, 10.008269 @ R, 42.010565 @ K, 70.041865 @ K, 14.01565 @ R, 28.0313 @ K, 28.0313 @ R, 42.04695 @ K.
    INFO: [17:41:09] Reading input files...
 WARNING: [17:41:09] Skipping mass calibration: not enough peptides near zero mass shift.
 WARNING: [17:41:10] Skipping mass calibration: not enough peptides near zero mass shift.
 WARNING: [17:41:10] Skipping mass calibration: not enough peptides near zero mass shift.
 WARNING: [17:41:10] Skipping mass calibration: not enough peptides near zero mass shift.
 WARNING: [17:41:10] Skipping mass calibration: not enough peptides near zero mass shift.
 WARNING: [17:41:10] Skipping mass calibration: not enough peptides near zero mass shift.
 WARNING: [17:41:10] Skipping mass calibration: not enough peptides near zero mass shift.
 WARNING: [17:41:10] Skipping mass calibration: not enough peptides near zero mass shift.
 WARNING: [17:41:10] Skipping mass calibration: not enough peptides near zero mass shift.
 WARNING: [17:41:10] Skipping mass calibration: not enough peptides near zero mass shift.
 WARNING: [17:41:10] Skipping mass calibration: not enough peptides near zero mass shift.
 WARNING: [17:41:10] Skipping mass calibration: not enough peptides near zero mass shift.
 WARNING: [17:41:10] Skipping mass calibration: not enough peptides near zero mass shift.
 WARNING: [17:41:10] Skipping mass calibration: not enough peptides near zero mass shift.
 WARNING: [17:41:10] Skipping mass calibration: not enough peptides near zero mass shift.
 WARNING: [17:41:10] Skipping mass calibration: not enough peptides near zero mass shift.
 WARNING: [17:41:10] Skipping mass calibration: not enough peptides near zero mass shift.
 WARNING: [17:41:10] Skipping mass calibration: not enough peptides near zero mass shift.
 WARNING: [17:41:10] Skipping mass calibration: not enough peptides near zero mass shift.
 WARNING: [17:41:10] Skipping mass calibration: not enough peptides near zero mass shift.
 WARNING: [17:41:10] Skipping mass calibration: not enough peptides near zero mass shift.
 WARNING: [17:41:10] Skipping mass calibration: not enough peptides near zero mass shift.
 WARNING: [17:41:10] Skipping mass calibration: not enough peptides near zero mass shift.
 WARNING: [17:41:10] Skipping mass calibration: not enough peptides near zero mass shift.
 WARNING: [17:41:10] Skipping mass calibration: not enough peptides near zero mass shift.
 WARNING: [17:41:10] Skipping mass calibration: not enough peptides near zero mass shift.
    INFO: [17:41:10] Starting analysis...
    INFO: [17:41:10] Performing Gaussian fit...
    INFO: [17:41:24] Discarding bad peaks...
    INFO: [17:41:24] Joined mass shifts ['1.0000', '1.0007']
    INFO: [17:41:24] Peaks for subsequent analysis: 58
    INFO: [17:41:24] Performing group-wise FDR filtering...
    INFO: [17:41:25] # of filtered mass shifts = 58
    INFO: [17:41:25] Systematic mass shift equals -0.0008
    INFO: [17:41:25] Calculating distributions...
    INFO: [17:41:25] Mass shifts:
    INFO: [17:41:25] -94.0413 Da
    INFO: [17:41:25] -89.0269 Da
    INFO: [17:41:25] -48.0178 Da
    INFO: [17:41:25] -48.0015 Da
    INFO: [17:41:25] -43.0523 Da
    INFO: [17:41:25] -32.0058 Da
    INFO: [17:41:25] -30.0103 Da
    INFO: [17:41:25] -20.0058 Da
    INFO: [17:41:26] -18.0243 Da
    INFO: [17:41:26] -18.0099 Da
    INFO: [17:41:26] -10.0306 Da
    INFO: [17:41:26] -6.9840 Da
    INFO: [17:41:26] -6.0282 Da
    INFO: [17:41:26] -5.9887 Da
    INFO: [17:41:26] -4.0308 Da
    INFO: [17:41:26] -2.0722 Da
    INFO: [17:41:26] -2.0143 Da
    INFO: [17:41:26] -1.9921 Da
    INFO: [17:41:26] -0.9841 Da
    INFO: [17:41:26] +0.0000 Da
    INFO: [17:41:27] +0.9847 Da
    INFO: [17:41:27] +1.0008 Da
    INFO: [17:41:27] +4.9798 Da
    INFO: [17:41:27] +7.0313 Da
    INFO: [17:41:27] +12.0010 Da
    INFO: [17:41:27] +13.9810 Da
    INFO: [17:41:27] +14.0155 Da
    INFO: [17:41:27] +14.9637 Da
    INFO: [17:41:27] +15.9955 Da
    INFO: [17:41:27] +17.0236 Da
    INFO: [17:41:27] +21.9792 Da
    INFO: [17:41:28] +26.0174 Da
    INFO: [17:41:28] +27.9950 Da
    INFO: [17:41:28] +28.0316 Da
    INFO: [17:41:28] +31.9905 Da
    INFO: [17:41:28] +37.0608 Da
    INFO: [17:41:28] +37.9441 Da
    INFO: [17:41:28] +42.0121 Da
    INFO: [17:41:28] +43.9916 Da
    INFO: [17:41:28] +44.0043 Da
    INFO: [17:41:28] +51.0796 Da
    INFO: [17:41:28] +53.9157 Da
    INFO: [17:41:28] +56.0265 Da
    INFO: [17:41:29] +58.0068 Da
    INFO: [17:41:29] +61.9925 Da
    INFO: [17:41:29] +68.0608 Da
    INFO: [17:41:29] +70.0063 Da
    INFO: [17:41:29] +70.0421 Da
    INFO: [17:41:29] +72.0221 Da
    INFO: [17:41:29] +79.9576 Da
    INFO: [17:41:29] +86.0013 Da
    INFO: [17:41:29] +93.9720 Da
    INFO: [17:41:29] +107.0708 Da
    INFO: [17:41:29] +109.0491 Da
    INFO: [17:41:30] +112.0528 Da
    INFO: [17:41:30] +119.0376 Da
    INFO: [17:41:30] +119.0576 Da
    INFO: [17:41:30] +156.1020 Da
    INFO: [17:41:31] Summary histogram saved.
    INFO: [17:41:50] Starting Localization using MS/MS spectra...
    INFO: [17:41:50] Reference mass shift +0.0000
    INFO: [17:41:50] Localizing -94.0413...
    INFO: [17:41:50] Localizing -89.0269...
    INFO: [17:41:50] Localizing -48.0178...
    INFO: [17:41:50] Localizing -48.0015...
    INFO: [17:41:50] Localizing -43.0523...
    INFO: [17:41:51] Localizing -32.0058...
    INFO: [17:41:51] Localizing -30.0103...
    INFO: [17:41:51] Localizing -20.0058...
    INFO: [17:41:51] Localizing -18.0243...
    INFO: [17:41:51] Localizing -18.0099...
    INFO: [17:41:51] Localizing -10.0306...
    INFO: [17:41:51] Localizing -6.9840...
    INFO: [17:41:51] Localizing -6.0282...
    INFO: [17:41:51] Localizing -5.9887...
    INFO: [17:41:51] Localizing -4.0308...
    INFO: [17:41:52] Localizing -2.0722...
    INFO: [17:41:52] Localizing -2.0143...
    INFO: [17:41:52] Localizing -1.9921...
    INFO: [17:41:52] Localizing -0.9841...
    INFO: [17:41:52] Localizing +0.0000...
    INFO: [17:41:52] Localizing +0.9847...
    INFO: [17:41:52] Localizing +1.0008...
    INFO: [17:41:52] Localizing +4.9798...
    INFO: [17:41:52] Localizing +7.0313...
    INFO: [17:41:53] Localizing +12.0010...
    INFO: [17:41:53] Localizing +13.9810...
    INFO: [17:41:54] Localizing +14.0155...
    INFO: [17:41:54] Localizing +14.9637...
    INFO: [17:41:54] Localizing +15.9955...
    INFO: [17:41:54] Localizing +17.0236...
    INFO: [17:41:54] Localizing +21.9792...
    INFO: [17:41:54] Localizing +26.0174...
    INFO: [17:41:55] Localizing +27.9950...
    INFO: [17:41:56] Localizing +28.0316...
    INFO: [17:41:57] Localizing +31.9905...
    INFO: [17:41:57] Localizing +37.0608...
    INFO: [17:41:57] Localizing +37.9441...
    INFO: [17:41:57] Localizing +42.0121...
    INFO: [17:41:58] Localizing +43.9916...
    INFO: [17:41:59] Localizing +44.0043...
    INFO: [17:41:59] Localizing +51.0796...
    INFO: [17:41:59] Localizing +53.9157...
    INFO: [17:41:59] Localizing +56.0265...
    INFO: [17:41:59] Localizing +58.0068...
    INFO: [17:42:00] Localizing +61.9925...
    INFO: [17:42:00] Localizing +68.0608...
    INFO: [17:42:00] Localizing +70.0063...
    INFO: [17:42:01] Localizing +70.0421...
    INFO: [17:42:01] Localizing +72.0221...
    INFO: [17:42:02] Localizing +79.9576...
    INFO: [17:42:02] Localizing +86.0013...
    INFO: [17:42:02] Localizing +93.9720...
    INFO: [17:42:02] Localizing +107.0708...
    INFO: [17:42:02] Localizing +109.0491...
    INFO: [17:42:03] Localizing +112.0528...
    INFO: [17:42:03] Localizing +119.0376...
    INFO: [17:42:03] Localizing +119.0576...
    INFO: [17:42:03] Localizing +156.1020...
    INFO: [17:42:03] Plotting mass shift figures...
Traceback (most recent call last):
  File "/micromamba/envs/aa_stat/bin/AA_stat", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/micromamba/envs/aa_stat/lib/python3.11/site-packages/AA_stat/main.py", line 55, in main
    AA_stat.AA_stat(params_dict, args)
  File "/micromamba/envs/aa_stat/lib/python3.11/site-packages/AA_stat/AA_stat.py", line 444, in AA_stat
    stats.plot_figure(ms_label, *data, params_dict, save_directory, localizations, sumof)
  File "/micromamba/envs/aa_stat/lib/python3.11/site-packages/AA_stat/stats.py", line 391, in plot_figure
    ax_left.bar(x - b, distributions.loc[labels],
  File "/micromamba/envs/aa_stat/lib/python3.11/site-packages/matplotlib/__init__.py", line 1442, in inner
    return func(ax, *map(sanitize_sequence, args), **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/micromamba/envs/aa_stat/lib/python3.11/site-packages/matplotlib/axes/_axes.py", line 2510, in bar
    errorbar = self.errorbar(ex, ey,
               ^^^^^^^^^^^^^^^^^^^^^
  File "/micromamba/envs/aa_stat/lib/python3.11/site-packages/matplotlib/__init__.py", line 1442, in inner
    return func(ax, *map(sanitize_sequence, args), **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/micromamba/envs/aa_stat/lib/python3.11/site-packages/matplotlib/axes/_axes.py", line 3534, in errorbar
    yerr = _upcast_err(yerr)
           ^^^^^^^^^^^^^^^^^
  File "/micromamba/envs/aa_stat/lib/python3.11/site-packages/matplotlib/axes/_axes.py", line 3516, in _upcast_err
    isinstance(cbook._safe_first_finite(err), np.ndarray)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/micromamba/envs/aa_stat/lib/python3.11/site-packages/matplotlib/cbook/__init__.py", line 1715, in _safe_first_finite
    return next(val for val in obj if safe_isfinite(val))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
StopIteration

levitsky commented 1 year ago

Hi @alessandro-vai, thank you for reporting. The point at which the error is happening suggests that there is an anomaly in the frequency distributions for that mass shift, most probably resulting in an abnormal value in error estimation. It's hard to guess what it is, and I can imagine that sharing all input data may not be very convenient, so I added a debug output at the right place which could help.

If you can update to the latest commit and run AA_stat again with -v 2 added to the command line, the debug lines before the error (let's say for -4.0308 Da and -2.0722 Da figures) would be really helpful.

alessandro-vai commented 1 year ago

Thanks for the prompt reply. Here is the log: log_AA_stat.txt

levitsky commented 1 year ago

Thank you! It looks like the error is happening when all of the error estimates for amino acid frequencies are undefined. When only some are, it's fine. I made a change to replace all NaN values with zero, which should not matter for the other cases. Can you please check the latest master?
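
For readers following along, here is a minimal sketch of why "all undefined" fails while "some undefined" does not. It mimics the `next(val for val in obj if safe_isfinite(val))` line from the traceback using only the standard library. The helper names `first_finite` and `nan_to_zero` are hypothetical and are not part of AA_stat or matplotlib; the actual change in AA_stat may look different, and the matplotlib behavior shown in the traceback may vary between versions.

```python
import math


def first_finite(values):
    """Hypothetical stand-in for the pattern in the traceback:
    next() over a generator with no default value."""
    return next(v for v in values if math.isfinite(v))


def nan_to_zero(values):
    """Hypothetical helper: replace NaN entries with 0.0, keep the rest."""
    return [0.0 if math.isnan(v) else v for v in values]


# Made-up error estimates, for illustration only.
some_nan = [float("nan"), 0.12, 0.05]
all_nan = [float("nan"), float("nan"), float("nan")]

# With at least one finite value, the lookup succeeds.
print(first_finite(some_nan))

# With no finite value, the generator is exhausted and next() raises.
try:
    first_finite(all_nan)
except StopIteration:
    print("StopIteration: no finite error estimate found")

# After replacing NaN with zero, a finite value always exists.
print(first_finite(nan_to_zero(all_nan)))
```

Replacing NaN with zero leaves finite error estimates unchanged, which is why the change should not affect mass shifts that already plotted correctly.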

Also, I notice that you seem to have very few decoys in the input. Is it filtered? AA_stat is designed to work with unfiltered data, so if possible, you can try that and see if the results improve.

alessandro-vai commented 1 year ago

It works! I guess the fact that histones yield small datasets causes trouble for some of the fitting procedures. I noticed the very low number of decoys, but it is probably correct. I will double-check the input files. Cheers