compomics / peptide-shaker

Interpretation of proteomics identification results
http://compomics.github.io/projects/peptide-shaker.html

make SG/PS deal better with non standard headers #452

Closed (bernt-matthias closed this issue 3 years ago)

bernt-matthias commented 3 years ago

@hbarsnes Since I keep seeing users run into https://github.com/compomics/peptide-shaker/issues/447 (wrong header format), I'm wondering whether there is a better solution.

For instance, one could provide a flag that makes SG/PS (SearchGUI/PeptideShaker) use the complete header as the ID, or allow the user to provide a regex or format string for parsing the header.
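To make the two proposed options concrete, here is a minimal sketch of what they could look like from the parser's side. It is not SearchGUI/PeptideShaker code: the helper `extract_protein_id`, its `use_full_header` and `id_regex` parameters, and the `UNIPROT_LIKE` pattern are all hypothetical, and the pattern is a heavily simplified stand-in for a UniProt-style header, not the regular expression used in compomics-utilities.

```python
import re
from typing import Optional

# Hypothetical, simplified UniProt-style pattern, e.g.
# "sp|P12345|NAME_HUMAN Description". NOT the compomics-utilities regex.
UNIPROT_LIKE = re.compile(r"^(?:sp|tr)\|([^|\s]+)\|")


def extract_protein_id(header: str,
                       use_full_header: bool = False,
                       id_regex: Optional[str] = None) -> str:
    """Return the protein ID for a FASTA header (hypothetical helper).

    header: header line, with or without the leading '>'
    use_full_header: hypothetical flag, skip parsing and use the whole header
    id_regex: hypothetical user pattern whose first capture group is the ID
    """
    header = header.lstrip(">").strip()
    if use_full_header:
        return header
    if id_regex is not None:
        match = re.search(id_regex, header)
        if match is None:
            raise ValueError(f"header does not match {id_regex!r}: {header}")
        return match.group(1)
    match = UNIPROT_LIKE.match(header)
    # Unrecognised format: fall back to the whole header as the ID.
    return match.group(1) if match else header
```

For example, `extract_protein_id(">sp|P12345|ALBU_HUMAN Serum albumin")` gives `"P12345"`, the same call with `use_full_header=True` gives the whole header text, and `extract_protein_id(">lab_db:PROT-0042 some protein", id_regex=r"^lab_db:(\S+)")` gives `"PROT-0042"` (the `lab_db` format is made up for illustration).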

Alternatively, this could also be done automatically, i.e. by using the complete header whenever its format is not recognized.

hbarsnes commented 3 years ago

The second option (i.e. using the whole header if the format is not recognized) is already implemented and in use (see https://github.com/compomics/compomics-utilities/blob/master/src/main/java/com/compomics/util/experiment/io/biology/protein/Header.java). The problem occurs when custom headers have almost the same format as a standard header but do not contain the exact same content. I guess this could be further improved by working on the regular expressions we use to parse the headers, but I'm not sure how easy it would be to guarantee that such issues cannot occur.
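The failure mode described above can be illustrated with a small sketch. It assumes a simplified UniProt-like pattern (again, not the actual compomics-utilities regex) and made-up headers: a header that does not resemble any known format falls back safely to the whole header, but a custom header that merely looks like a UniProt header is accepted, and if its second field holds a sample name rather than a unique accession, several proteins collapse onto one ID. The `parse_id` and `duplicated_ids` helpers are hypothetical; the duplicate check is just one possible sanity test, not something SG/PS is known to do.

```python
import re
from collections import Counter

# Simplified UniProt-like pattern (assumption, not the compomics regex).
UNIPROT_LIKE = re.compile(r"^(?:sp|tr)\|([^|\s]+)\|")


def parse_id(header: str) -> str:
    """Use the UniProt-like accession if the header matches, else the whole header."""
    header = header.lstrip(">").strip()
    match = UNIPROT_LIKE.match(header)
    return match.group(1) if match else header


def duplicated_ids(headers):
    """Return the IDs (sorted) that more than one header was parsed to."""
    counts = Counter(parse_id(h) for h in headers)
    return sorted(i for i, n in counts.items() if n > 1)


custom = [
    ">tr|sampleA|contig_1 predicted protein",  # looks standard, is not
    ">tr|sampleA|contig_2 predicted protein",  # collides with the line above
    ">contig_3 predicted protein",             # unrecognised: whole header used
]
print(duplicated_ids(custom))
```

Here the first two headers are both parsed to `sampleA`, while the third keeps its full header as the ID.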

Adding a flag to override the parsing and instead always use the complete header should be possible to implement, as should a more advanced option allowing users to provide a regular expression. Or perhaps the two can simply be combined into one option. We initially stayed away from this since most of our users would struggle to provide a correct regular expression. At least that is my experience from when we used to have a Mascot server. But perhaps it makes sense to add it as an advanced option.
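One way the two options could be folded into a single advanced setting is sketched below, under the assumption that the setting is a regular expression whose first capture group is the ID. Supplying `(.*)` then behaves like an "always use the complete header" flag. Because users may get the regex wrong, the sketch validates the pattern up front and refuses headers that yield no ID or a duplicate ID, rather than silently misparsing them. The names `FULL_HEADER`, `compile_id_pattern` and `ids_from_headers` are hypothetical and not part of SearchGUI, PeptideShaker or compomics-utilities.

```python
import re
from typing import Dict, Iterable

# Hypothetical preset: capture everything, i.e. use the complete header as ID.
FULL_HEADER = r"(.*)"


def compile_id_pattern(pattern: str) -> "re.Pattern[str]":
    """Check a user-supplied pattern early so mistakes fail with a clear message."""
    try:
        compiled = re.compile(pattern)
    except re.error as err:
        raise ValueError(f"invalid regular expression {pattern!r}: {err}") from err
    if compiled.groups < 1:
        raise ValueError(f"pattern {pattern!r} needs a capture group for the ID")
    return compiled


def ids_from_headers(headers: Iterable[str], pattern: str) -> Dict[str, str]:
    """Map each ID to its header, rejecting unmatched headers and duplicate IDs."""
    compiled = compile_id_pattern(pattern)
    result: Dict[str, str] = {}
    for raw in headers:
        header = raw.lstrip(">").strip()
        match = compiled.search(header)
        if match is None or not match.group(1):
            raise ValueError(f"no ID found in header: {header!r}")
        protein_id = match.group(1)
        if protein_id in result:
            raise ValueError(f"duplicate ID {protein_id!r} in {header!r}")
        result[protein_id] = header
    return result
```

Failing loudly on duplicates is a design choice here: it turns the "almost standard" header problem into an explicit error instead of merged proteins.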

bernt-matthias commented 3 years ago

Wonderful, I forwarded the info to the user. I guess the easiest fix is then to replace or remove the '|' characters in the headers.
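A user-side workaround along those lines could be a small preprocessing step on the FASTA file. The sketch below assumes the goal is only to keep custom headers from being mistaken for a standard format, so that the whole header ends up being used as the ID. It touches header lines only; note that it would also rewrite genuine UniProt headers if the file mixes both kinds. The function name and the file names in the usage note are hypothetical.

```python
from typing import Iterable, Iterator


def sanitize_fasta_headers(lines: Iterable[str], replacement: str = "_") -> Iterator[str]:
    """Replace '|' in FASTA header lines; sequence lines pass through unchanged.

    Use replacement="" to remove the pipes instead of replacing them.
    """
    for line in lines:
        if line.startswith(">"):
            yield line.replace("|", replacement)
        else:
            yield line
```

Usage on a file could look like `with open("in.fasta") as src, open("out.fasta", "w") as dst: dst.writelines(sanitize_fasta_headers(src))`; line endings are preserved because only the `|` characters are changed.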