dhmay / param-medic

Param-Medic breathes new life into MS/MS database searches by optimizing parameter settings to your data.

oddities in output #6

Open kevinkovalchik opened 4 years ago

kevinkovalchik commented 4 years ago

Hello,

Thanks for making this tool! I am finding it useful and am planning to use it in a large-scale reanalysis of published data to avoid difficulties with missing/incomplete information on acquisition parameters.

I noticed something that seems odd about the output and am wondering if you can help clarify it. Here are the details of an analysis of some data from a SCIEX TripleTOF:

Details:
  Charge 0
Spectra in same averagine bin as another: 1768
    ... and also within m/z tolerance: 1267
    ... and also within scan range: 557
    ... and also with sufficient in-common fragments: 189
  Charge 2
Spectra in same averagine bin as another: 19037
    ... and also within m/z tolerance: 13767
    ... and also within scan range: 11484
    ... and also with sufficient in-common fragments: 189
  Charge 3
Spectra in same averagine bin as another: 1912
    ... and also within m/z tolerance: 1528
    ... and also within scan range: 1232
    ... and also with sufficient in-common fragments: 189
  Charge 4
Spectra in same averagine bin as another: 489
    ... and also within m/z tolerance: 414
    ... and also within scan range: 350
    ... and also with sufficient in-common fragments: 189

All these numbers make sense to me except "... and also with sufficient in-common fragments", which is exactly the same for each charge state. Is this expected?

Also, when I run the same file and specify --charges 2, this is the output:

Details:
  Charge 2
Spectra in same averagine bin as another: 19037
    ... and also within m/z tolerance: 13767
    ... and also within scan range: 11484
    ... and also with sufficient in-common fragments: 170

The numbers match charge 2 from above, except that "sufficient in-common fragments" is now different. Is this expected?

Also, I'm aware that I'm seeing these detail reports because there are not enough paired spectra to do the analysis. But I would still like to understand the output here.

Best, Kevin

dhmay commented 4 years ago

Looks like you found a bug in errorcalc.py, on line 254. As you noted, it appears to be giving you the same number of spectra that it's able to use for every charge. What it's actually reporting is the total number of usable spectra across all charges.

I believe I could fix the bug very easily by changing line 254 to report len(percharge_calculator.paired_fragment_peaks) instead of len(precursor_distances_ppm).

However, it's been a long time since I looked at this code, and I'm a little nervous about screwing it up. So, two options for you:

  1. As you noticed, if you restrict to a single charge, you'll get a different number than if you run all charges. That number is, in fact, correct for that charge. So, if you want those numbers, you can run them separately for each charge and sum them up.
  2. You could try implementing the fix I suggested above. If you do, please make a pull request!

I'll try to get around to fixing it, but verifying the fix would take me far longer than making it. If I made the fix on a branch, would you be willing to check out the branch and verify it for me? If so, I'll update this issue when it's done on a branch.

kevinkovalchik commented 4 years ago

Thanks for the quick response. Hm... that might be the fix. I changed that line and here is the output:

2020-04-22 14:42:10,086 INFO: Need >= 200 peak pairs to fit mixed distribution. Got only 189.
Details:
  Charge 0
Spectra in same averagine bin as another: 1768
    ... and also within m/z tolerance: 1267
    ... and also within scan range: 557
    ... and also with sufficient in-common fragments: 20
  Charge 2
Spectra in same averagine bin as another: 19037
    ... and also within m/z tolerance: 13767
    ... and also within scan range: 11484
    ... and also with sufficient in-common fragments: 850
  Charge 3
Spectra in same averagine bin as another: 1912
    ... and also within m/z tolerance: 1528
    ... and also within scan range: 1232
    ... and also with sufficient in-common fragments: 85
  Charge 4
Spectra in same averagine bin as another: 489
    ... and also within m/z tolerance: 414
    ... and also within scan range: 350
    ... and also with sufficient in-common fragments: 10

which looks more reasonable. But the largest reported number there is 850, which does not match the number of peak pairs (189). Is that because 850 represents the total number of paired spectra, not the number of peak pairs?

dhmay commented 4 years ago

Ha, that's what I get for trying to barge back into code I haven't looked at in years. I gave you the wrong variable to plug in there. Try it with len(percharge_calculator.paired_precursor_mzs).
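
For readers following along, here is a minimal sketch of the kind of bug discussed in this thread: a per-charge report that prints the length of a list pooled across all charges instead of the list held by each per-charge object. This is not the code in errorcalc.py. The class and both report functions are hypothetical stand-ins, and only the attribute name paired_precursor_mzs and the counts (170, 17, 2) are taken from this thread.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PerChargeCalculator:
    """Hypothetical stand-in for a per-charge calculator object."""
    paired_precursor_mzs: List[float] = field(default_factory=list)


def report_buggy(calculators: Dict[int, PerChargeCalculator]) -> Dict[int, int]:
    # Pool every charge's values into one list, then report the pooled
    # length for each charge: every charge shows the same total.
    pooled = [mz for calc in calculators.values() for mz in calc.paired_precursor_mzs]
    return {charge: len(pooled) for charge in calculators}


def report_fixed(calculators: Dict[int, PerChargeCalculator]) -> Dict[int, int]:
    # Report the length of each charge's own list.
    return {charge: len(calc.paired_precursor_mzs) for charge, calc in calculators.items()}


# Placeholder m/z values; only the list lengths matter for this illustration.
counts = {2: 170, 3: 17, 4: 2}
calculators = {charge: PerChargeCalculator([500.0] * n) for charge, n in counts.items()}

print(report_buggy(calculators))  # {2: 189, 3: 189, 4: 189}
print(report_fixed(calculators))  # {2: 170, 3: 17, 4: 2}
```

Charge 0 is left out of the sketch on purpose, since the thread does not settle how it is handled.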

kevinkovalchik commented 4 years ago

Haha. At least the code is nice and readable! I'll give that a try.

kevinkovalchik commented 4 years ago

Okay, this looks good now! Here is the output this time:

2020-04-23 09:04:26,381 INFO: Need >= 200 peak pairs to fit mixed distribution. Got only 189.
Details:
  Charge 0
Spectra in same averagine bin as another: 1768
    ... and also within m/z tolerance: 1267
    ... and also within scan range: 557
    ... and also with sufficient in-common fragments: 4
  Charge 2
Spectra in same averagine bin as another: 19037
    ... and also within m/z tolerance: 13767
    ... and also within scan range: 11484
    ... and also with sufficient in-common fragments: 170
  Charge 3
Spectra in same averagine bin as another: 1912
    ... and also within m/z tolerance: 1528
    ... and also within scan range: 1232
    ... and also with sufficient in-common fragments: 17
  Charge 4
Spectra in same averagine bin as another: 489
    ... and also within m/z tolerance: 414
    ... and also within scan range: 350
    ... and also with sufficient in-common fragments: 2

The numbers for charges 2, 3, and 4 add up to the reported number of peak pairs (189). Charge 0 doesn't seem to be contributing to the number of peak pairs. Are unknown charges not used?
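
The arithmetic behind that observation can be checked directly from the per-charge counts pasted in this thread. The sketch below only sums numbers copied from the two outputs. The variable names are made up for illustration, and the constant ratio it prints is an observation about these particular numbers, not a confirmed description of how Param-Medic selects fragment peaks.

```python
# Per-charge counts copied from the two outputs pasted in this thread.
fragment_peak_counts = {0: 20, 2: 850, 3: 85, 4: 10}  # using paired_fragment_peaks
precursor_mz_counts = {0: 4, 2: 170, 3: 17, 4: 2}     # using paired_precursor_mzs
reported_total = 189  # from the log line "Got only 189"

# Sum over the known charges only (2, 3, 4), then over everything.
known_charge_total = sum(n for charge, n in precursor_mz_counts.items() if charge != 0)
all_charge_total = sum(precursor_mz_counts.values())

# Ratio between the two reported counts for each charge.
ratios = {c: fragment_peak_counts[c] / precursor_mz_counts[c] for c in precursor_mz_counts}

print(known_charge_total)  # 189
print(all_charge_total)    # 193
print(ratios)              # {0: 5.0, 2: 5.0, 3: 5.0, 4: 5.0}
```

Only the sum without charge 0 matches the logged total, which is what prompts the question above. The ratio of exactly 5 for every charge would be consistent with each retained pair contributing five fragment peaks, though nothing in this thread confirms that.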