Closed elrubio closed 9 months ago
Can you have a go at this beta version of PeptideShaker to see it it solves the problem?
Thanks for your reply. Unfortunately, the error is also present in the beta version.
If you can share one of the files (plus the corresponding FASTA and spectrum files) I can try to reproduce the issue on my end and see if I can find the source of the issue.
I am no Java expert, but I am quite confident that I identified the responsible code. As visible from the stack trace, it's actually located in the compomics.util
library, so you could move the issue over there.
In MascotIdfileReader.getAllSpectrumMatches
, while iterating over PeptideAssumption
objects, the underlying collection is modified, resulting in a concurrency error (immediately and reliably, at least in my case). The only condition is that a peptide sequence contains non-standard amino acids, like X.
I uploaded my files anyways, just in case: Download
I've fixed the issue and updated the beta version: https://genesis.ugent.be/maven2/eu/isas/peptideshaker/PeptideShaker/3.0.0-beta/PeptideShaker-3.0.0-beta.zip
However, I came across a second issue which is the use of the "+" in the name of the mgf file. This causes problems later on when parsing the spectrum titles (as they include the file name). Would it be possible for you to rename the mgf files to remove the "+" (and redo the Mascot search)?
Thanks for the quick fix for the concurrency issue. After pushing my .dat
and .mgf
files through sed
to remove any plus characters in the spectrum titles, they are now correctly parsed. All looks good now. 👍
But since "+" is a perfectly valid character for both, file names and to (my knowledge) spectrum titles in mgf files, I actually consider this as a bug.
But since "+" is a perfectly valid character for both, file names and to (my knowledge) spectrum titles in mgf files, I actually consider this as a bug.
I agree that it is not optimal, but I think we added this decoding of the spectrum titles (from utf-8) as there were some cases where other Unicode characters were encoded in the titles which caused issues for some of the search engines. However, the "+" is sort of a special case that ideally ought to be ignored, i.e. not decoded to space, but not sure how easy that would be or whether this would be a solution that potentially results in other problems.
Anyway, great that you were able to work around this final issue.
Thank you for your feedback.
Unfortunately we have many existing MGF files. All of them potentially contain "+" characters in the TITLE line, many of them actually do. The current behaviour requires me to copy and pre-process all data files before using them in PeptideShaker. A permanent change would break linkage to existing ident file.
Considering the fact that url-endoded titles should be rather an exception than the rule, I opened this as an issue in the underlying library: https://github.com/compomics/compomics-utilities/issues/61. Maybe decoding could be made optional?
When trying to import Mascot results (without automatic decoy search) from Mascot DAT files, a java.util.ConcurrentModificationException is thrown consistently for all files.
Setting the number cores to 1 doesn't help.
Mascot version: 2.8.2
I can provide a sample file upon request.