UWPR / Comet

A tandem mass spectrometry (MS/MS) sequence database search tool.
https://uwpr.github.io/Comet/
Apache License 2.0

New versions of Comet run significantly slower #53

Closed: Wang-kaifei closed this issue 7 months ago

Wang-kaifei commented 7 months ago

Dear developers,

I am a long-time Comet user and had been using the 2021 version. I recently updated to the latest version and found that, with the same search parameters, the run time increased 15-fold or more. Is this behavior expected?

Any reply will be appreciated.

Best, Kaifei Wang

jke000 commented 7 months ago

Hi. What you're describing is not normal. The latest Comet has been out for a while and you're the first to report such an issue with it. I just ran a quick test, a basic tryptic search, comparing Comet 2023.01.2 (the latest version) with 2021.02.0, and both completed the search in the same amount of time.

To start debugging this mystery, can you send me the search parameters (comet.params) that you used for both your 2021 and 2023 searches? I'll see if I can replicate what you're observing. Go ahead and email those to me at engj@uw.edu.

Jimmy

Wang-kaifei commented 7 months ago

Dear Jimmy,

Thank you for your quick reply! I have emailed the relevant files; please take a look.

Best, Kaifei Wang

jke000 commented 7 months ago

Hi Kaifei. I was able to take a look at your files this morning, and the issue is associated with the variable modification entry:

variable_mod01 = 42.010565 nABCDEFGHIJKLMNOPQRSTUVWXYZ 0 3 -1 0 0 0.0

There was a bug in Comet prior to the 2023 release: it did not correctly parse a modification's residues if the residue list was longer than 19 characters, and your list is 27 characters long. So the issue isn't that the latest Comet is slower; rather, prior versions of Comet were not processing this extensive variable modification and thus ran much faster. If you reduce that modification string to 19 characters or fewer, say nACDEFGHIKLMNPQRSTV, you'll see the correct run time for Comet 2021 (which is much slower than if this modification were not present).
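To see why the fully parsed entry is so costly, it helps to roughly count the modified forms a single peptide can take. The sketch below is a simplified illustration, not Comet's actual enumeration: it assumes every listed residue letter in the peptide is a candidate site, treats a lowercase n in the residue string as one extra N-terminal site, and ignores the distance, terminus, and required fields. The function count_forms and the peptide SAMPLEPEPTIDEK are hypothetical.

```python
from math import comb


def count_forms(peptide: str, residues: str, max_mods: int) -> int:
    """Rough count of modified forms of one peptide for one variable_modXX entry.

    Hypothetical helper with simplifying assumptions (not Comet's algorithm):
      * every peptide residue whose letter appears in `residues` is a site;
      * a lowercase 'n' in `residues` adds one extra site (the N-terminus);
      * the distance/terminus/required fields are ignored.
    """
    sites = sum(1 for aa in peptide if aa in residues)
    if "n" in residues:
        sites += 1
    # Unmodified form plus every way of placing 1..max_mods modifications.
    return sum(comb(sites, k) for k in range(max_mods + 1))


peptide = "SAMPLEPEPTIDEK"  # made-up 14-residue peptide
print(count_forms(peptide, "nABCDEFGHIJKLMNOPQRSTUVWXYZ", 3))  # 576
print(count_forms(peptide, "n", 1))  # 2
```

Under these assumptions, the 27-character entry with up to 3 mods per peptide gives 576 forms for this one 14-residue peptide, versus 2 for an n-only entry with 1 mod, and the gap widens as peptides get longer.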

This is noted in the release notes for release 2023.01.0: "Fix bug where Comet fails to analyze a variable modification if more than 19 residues are specified for that mod. Thanks to D. Tabb for reporting the issue." I'm going to close this issue with this comment but feel free to reply if this doesn't adequately address your question. Thanks again for doing science!
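One way to find entries affected by this fix is to scan a comet.params file for variable_modXX residue strings longer than 19 characters. The following is a minimal sketch that assumes entries follow the "variable_modXX = mass residues ..." layout of the entry quoted above and that '#' starts a comment; the function name long_residue_entries is hypothetical. It counts every character of the residue string, including n or c, which matches the 19-character suggestion in this thread.

```python
import re

# Matches lines like "variable_mod01 = 42.010565 nACDE... 0 3 -1 0 0 0.0";
# group 1 is the parameter name, group 2 the mass, group 3 the residue string.
VARMOD_RE = re.compile(r"^\s*(variable_mod\d+)\s*=\s*(\S+)\s+(\S+)")

# Limit described in the 2023.01.0 release note for older Comet versions.
PRE_2023_RESIDUE_LIMIT = 19


def long_residue_entries(params_text: str) -> list[tuple[str, str]]:
    """Return (name, residues) for entries whose residue string exceeds the limit."""
    hits = []
    for line in params_text.splitlines():
        line = line.split("#", 1)[0]  # assumption: '#' begins a comment
        m = VARMOD_RE.match(line)
        if m and len(m.group(3)) > PRE_2023_RESIDUE_LIMIT:
            hits.append((m.group(1), m.group(3)))
    return hits


sample = """# example comet.params excerpt
variable_mod01 = 42.010565 nABCDEFGHIJKLMNOPQRSTUVWXYZ 0 3 -1 0 0 0.0
variable_mod02 = 42.010565 nACDEFGHIKLMNPQRSTV 0 3 -1 0 0 0.0
"""
print(long_residue_entries(sample))
```

Only variable_mod01 is reported here; the 19-character string in variable_mod02 stays within the limit.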

Wang-kaifei commented 7 months ago

Dear Jimmy,

Thank you so much for the detailed answer! I now understand the reason for this behavior.

All the best to you.

Wang-kaifei commented 6 months ago

Dear Jimmy,

Sorry to bother you again. This modification adds a lot of analysis time, and I'm not sure I have set it up correctly. Could you please review this entry?

variable_mod01 = 42.010565 nABCDEFGHIJKLMNOPQRSTUVWXYZ 0 3 -1 0 0 0.0

In fact, the modification I want to set is "protein N-terminal acetylation", i.e., a modification of mass 42.010565 on the N-terminal amino acid of the protein.

Looking forward to your reply very much, thanks!

-Kaifei Wang
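The 42.010565 in this entry is the monoisotopic mass shift of acetylation, which adds C2H2O. A quick check from the standard monoisotopic masses of 12C, 1H, and 16O; the helper name composition_mass is hypothetical:

```python
# Monoisotopic masses (in u) of the most abundant isotopes of C, H, and O.
MONO = {"C": 12.0, "H": 1.00782503207, "O": 15.99491461956}


def composition_mass(comp: dict[str, int]) -> float:
    """Sum monoisotopic masses for an elemental composition like {"C": 2, ...}."""
    return sum(MONO[element] * count for element, count in comp.items())


# Acetylation replaces an H with an acetyl group (C2H3O): net change C2H2O.
acetyl = composition_mass({"C": 2, "H": 2, "O": 1})
print(f"{acetyl:.6f}")  # 42.010565
```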

jke000 commented 6 months ago

For protein N-terminal acetylation, use:

variable_mod01 = 42.010565 n 0 1 0 0 0 0.0

The variable modification help/usage page hopefully explains the awful parameter encoding. But if there are any questions, feel free to follow up here.
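As a reading aid for the encoding, the sketch below splits a variable_modXX value into eight fields and describes it. The field meanings are assumed from Comet's variable_modXX documentation for the 2023 releases (mass, residues, binary flag, max mods per peptide, terminal distance with -1 meaning no constraint, which terminus with 0 to 3 meaning protein N-term, protein C-term, peptide N-term, peptide C-term, required flag, neutral loss). Check the help page for your version. VariableMod and describe() are hypothetical names.

```python
from dataclasses import dataclass

# Assumed meaning of the sixth field (which terminus the distance refers to).
TERMINUS = {0: "protein N-term", 1: "protein C-term", 2: "peptide N-term", 3: "peptide C-term"}


@dataclass
class VariableMod:
    mass: float
    residues: str
    binary: bool  # assumed: nonzero means binary rather than variable
    max_per_peptide: int
    term_distance: int
    terminus: int
    required: bool  # assumed: nonzero means the mod must be present
    neutral_loss: float

    @classmethod
    def parse(cls, value: str) -> "VariableMod":
        """Parse the text after 'variable_modXX =' (assumes the 8-field layout)."""
        f = value.split()
        if len(f) < 8:
            raise ValueError(f"expected 8 fields, got {len(f)}: {value!r}")
        return cls(float(f[0]), f[1], f[2] != "0", int(f[3]),
                   int(f[4]), int(f[5]), f[6] != "0", float(f[7]))

    def describe(self) -> str:
        term = TERMINUS.get(self.terminus, f"terminus code {self.terminus}")
        if self.term_distance == -1:
            where = "anywhere in the peptide (no distance constraint)"
        elif self.term_distance == 0:
            where = f"only at the {term}"
        elif self.term_distance > 0:
            where = f"within {self.term_distance} residue(s) of the {term}"
        else:
            where = f"distance code {self.term_distance} (not handled in this sketch)"
        kind = "binary" if self.binary else "variable"
        req = ", required" if self.required else ""
        return (f"{kind} mod {self.mass:+.6f} on '{self.residues}', "
                f"max {self.max_per_peptide}/peptide, {where}{req}")


original = VariableMod.parse("42.010565 nABCDEFGHIJKLMNOPQRSTUVWXYZ 0 3 -1 0 0 0.0")
fixed = VariableMod.parse("42.010565 n 0 1 0 0 0 0.0")
print(original.describe())
print(fixed.describe())  # variable mod +42.010565 on 'n', max 1/peptide, only at the protein N-term
```

Read this way, the original entry allows up to three acetylations per peptide on any residue, anywhere, while the suggested entry allows one, only at the protein N-terminus.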

Wang-kaifei commented 6 months ago

Thank you very much, I will try it!