ProteomicsLFQ: Search settings are not matching across IdentificationRuns

JustinGibbons commented 1 year ago

Hi,

I'm trying to re-create this workflow using Snakemake and OpenMS version 2.9.1: https://hub.knime.com/openms-team/spaces/Blog%20workflows/latest/Protein%20label-free%20quantification~Q8TgAl_OAuerfhES

It appears to work until the ProteomicsLFQ step, where I get this error:

Searchengine settings or modifications from IDRun OpenMS/ConsensusID_PEPMatrix_2023-05-01T15:07:15_11148416635320453930_2F1 do not match with the others. You probably do not want to merge the results with this tool. For merging searches with different engines/settings please use ConsensusID or PercolatorAdapter to create a comparable score.

I'm using Comet and MSGFPlus for peptide-spectrum identification, then running those results through Percolator to get the PEP and through ConsensusID to merge them. Are there specific settings I have to use in PercolatorAdapter or ConsensusID to get this to work?

I've attached the snakemake file if that's helpful.

Is this something you can help me with?

Thank you

copy_snakefile.txt

jpfeuffer commented 1 year ago

Hi, you would need to check the beginning of the ConsensusID output files (idXML). They should have an XML element for the search engine settings near the top. If these are not the same across files, PLFQ will fail. This might happen if the input file order to ConsensusID changes for different raw files and the settings in the search engines are not exactly the same.
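To make that check less manual, here is a minimal sketch that compares the recorded settings across several idXML files. It assumes the settings are stored in `SearchParameters` elements whose attributes (for example `db`) and modification child elements carry the values; please confirm that against your own files. The function names and the `index:attribute` key format are made up for this example, and the whole file is loaded into memory, which may be slow for large idXML files.

```python
import xml.etree.ElementTree as ET


def search_settings(idxml_path):
    """Collect the search settings recorded in one idXML file.

    Assumption: settings live in <SearchParameters> elements. Keys are
    "<element index>:<attribute name>" so files with several such
    elements can still be compared position by position.
    """
    settings = {}
    root = ET.parse(idxml_path).getroot()
    for i, sp in enumerate(root.iter("SearchParameters")):
        for name, value in sp.attrib.items():
            if name != "id":  # the id is only a file-local handle
                settings[f"{i}:{name}"] = value
        # Child elements whose tag ends in "Modification" are compared by name.
        mods = sorted(
            (child.tag, child.get("name", ""))
            for child in sp
            if child.tag.endswith("Modification")
        )
        settings[f"{i}:modifications"] = tuple(mods)
    return settings


def differing_settings(paths):
    """Return {setting: {path: value}} for settings that differ across files."""
    per_file = {p: search_settings(p) for p in paths}
    keys = set().union(*(s.keys() for s in per_file.values()))
    diffs = {}
    for key in sorted(keys):
        values = {p: s.get(key) for p, s in per_file.items()}
        if len(set(values.values())) > 1:
            diffs[key] = values
    return diffs
```

Calling `differing_settings([...])` with the ConsensusID outputs that go into ProteomicsLFQ should return an empty dict when the recorded settings agree, and otherwise show which setting differs in which file.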

JustinGibbons commented 1 year ago

Thank you @jpfeuffer.

The issue is that the reference database is different for each sample. I'm running proteomics on fecal samples that should be a mix of human and microbial proteins. I have the metagenome corresponding to each sample, so I was able to create a unique reference for each sample.

I was able to get ProteomicsLFQ to run by manually changing the reference database value so that it looks the same for all samples. Is there a way to automate this using OpenMS, or is this inadvisable? I expect most of the proteins to be human and shared between the samples.

jpfeuffer commented 1 year ago

Ah yes, that explains it. You can probably avoid it by renaming the database to a common name just before it goes into the search engines.
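A rough sketch of that renaming step, which could be called from a Snakemake rule that runs before the search rules. The name `reference.fasta` and the helper `stage_database` are hypothetical, and I am assuming the search adapters record the database path as it is passed to them, so the staged path has to look identical for every sample.

```python
import shutil
from pathlib import Path

COMMON_DB_NAME = "reference.fasta"  # hypothetical shared file name


def stage_database(sample_fasta, sample_dir, common_name=COMMON_DB_NAME):
    """Copy a per-sample FASTA into sample_dir under a shared file name.

    Returns the path of the staged copy. The original file is left untouched.
    """
    sample_dir = Path(sample_dir)
    sample_dir.mkdir(parents=True, exist_ok=True)
    target = sample_dir / common_name
    shutil.copyfile(sample_fasta, target)
    return target
```

If the full path ends up in the idXML, the per-sample directories would still differ, so it is probably safest to run each search from inside its sample directory and pass only the relative name. Checking the resulting idXML files will show whether the recorded value is now the same.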

The differences in proteins between samples can have some implications during protein inference. I think our algorithms make the assumption that a peptide always comes from the same set of proteins in every sample.
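To get a feeling for how often that assumption is violated in your data, something like the following could help. It is only an illustration in plain Python and does not use any OpenMS API: the input is assumed to be a mapping you build yourself from each sample's identifications, and the function name is made up.

```python
from collections import defaultdict


def inconsistent_peptides(sample_maps):
    """Find peptides whose protein sets differ between samples.

    sample_maps: {sample: {peptide: iterable of protein accessions}}
    Returns {peptide: {sample: frozenset of accessions}} restricted to
    peptides that map to different protein sets in at least two samples.
    """
    seen = defaultdict(dict)
    for sample, peptide_map in sample_maps.items():
        for peptide, proteins in peptide_map.items():
            seen[peptide][sample] = frozenset(proteins)
    return {
        peptide: by_sample
        for peptide, by_sample in seen.items()
        if len(set(by_sample.values())) > 1
    }
```

Peptides reported here are the ones where per-sample databases disagree about the parent proteins, so they are the cases to look at if the inference results seem odd.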

OpenMS / OpenMS

ProteomicsLFQ: Search settings are not matching across IdentificationRuns #6846