Nesvilab / MSFragger

Ultrafast, comprehensive peptide identification for mass spectrometry–based proteomics
https://msfragger.nesvilab.org

_rankN pepXML files #334

Open chambm opened 2 weeks ago

chambm commented 2 weeks ago

Why does MSFragger in DIA mode write pepXML files with rank suffixes instead of using the hit_rank="N" attribute in a single pepXML? Is there an option to always write the DDA way?

fcyu commented 2 weeks ago

Two major reasons:

  1. AFAIK, each spectrum_query can have only one precursor_neutral_mass and one assumed_charge, which does not suit DIA, where a single spectrum can contain fragments from several precursors.
  2. Some downstream tools, such as PeptideProphet and Philosopher, don't read a search_hit with hit_rank > 1.

Best,

Fengchao

chambm commented 2 weeks ago

True. Maybe multiple spectrum_query elements in that case? That is the way it's supposed to work for assumed_charge in the chimeric DDA case.

fcyu commented 2 weeks ago

Yes, multiple spectrum_query elements will work if the spectrum, spectrumNativeID, and start_scan attributes can be non-unique.

But then, how would the ranks within the same spectrum be specified?

Best,

Fengchao

chambm commented 2 weeks ago

Yes, because of the assumed_charge thing they don't have to be unique. The ranks would be specific to a hypothetical precursor (theoretical mass and charge).

fcyu commented 2 weeks ago

Sorry that my previous question was not clear. How would I specify rank 1, 2, 3 for the search_hit elements of the same spectrum if they are listed in separate spectrum_query elements? Can hit_rank start at a value > 1?

Thanks,

Fengchao

chambm commented 2 weeks ago

Each spectrum_query should start with hit_rank=1. Think of the rank as being for the hypothetical precursor ion rather than for the spectrum. Isn't that how it already works with the _rank1, _rank2 separate files?
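The layout described here can be sketched in Python with the standard library. This is only an illustration under stated assumptions: the helper name `build_queries`, the spectrum name, and all masses, peptides, and scores are made up, and the output is a simplified fragment rather than a schema-valid pepXML file. It shows one spectrum_query per hypothetical precursor, with hit_rank restarting at 1 in each.

```python
import xml.etree.ElementTree as ET


def build_queries(spectrum, start_scan, precursors):
    """Return one spectrum_query element per hypothetical precursor.

    precursors: list of (neutral_mass, charge, [(peptide, hyperscore), ...]).
    Hypothetical helper; attribute set is reduced for illustration.
    """
    queries = []
    for neutral_mass, charge, hits in precursors:
        query = ET.Element("spectrum_query", {
            "spectrum": spectrum,  # deliberately repeated across queries
            "start_scan": str(start_scan),
            "end_scan": str(start_scan),
            "precursor_neutral_mass": f"{neutral_mass:.4f}",
            "assumed_charge": str(charge),
        })
        result = ET.SubElement(query, "search_result")
        # Ranks are per precursor, so they restart at 1 in every query.
        ranked = sorted(hits, key=lambda h: h[1], reverse=True)
        for rank, (peptide, hyperscore) in enumerate(ranked, start=1):
            hit = ET.SubElement(result, "search_hit", {
                "hit_rank": str(rank),
                "peptide": peptide,
            })
            ET.SubElement(hit, "search_score", {
                "name": "hyperscore",
                "value": f"{hyperscore:.3f}",
            })
        queries.append(query)
    return queries


if __name__ == "__main__":
    # Illustrative values only.
    for q in build_queries(
        "sample.00100.00100", 100,
        [(1500.75, 2, [("PEPTIDEK", 20.0), ("PEPTLDEK", 25.0)]),
         (2250.10, 3, [("ANOTHERPEPK", 18.0)])],
    ):
        print(ET.tostring(q, encoding="unicode"))
```

Both queries share the same spectrum and start_scan values, and each has its own rank 1 hit.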

fcyu commented 2 weeks ago

> Isn't that how it already works with the _rank1, _rank2 separate files?

In the _rank1, _rank2 files, we get the rank information from the file name. If we put all ranks in the same file and separate them into different spectrum_query elements, is there a way to mark the different ranks? Or do I have to rank them by hyperscore when loading the data?

Thanks,

Fengchao
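As a small illustration of reading the rank from the file name, the sketch below parses a `_rankN` suffix. The exact naming pattern (`<name>_rankN.pepXML`) and the helper name `rank_from_filename` are assumptions for this example, not something confirmed in this thread.

```python
import re
from pathlib import Path

# Assumed naming pattern: "<anything>_rank<N>.pepXML" (case-insensitive).
RANK_PATTERN = re.compile(r"_rank(\d+)\.pepXML$", re.IGNORECASE)


def rank_from_filename(path):
    """Return the rank encoded in the file name, or None if there is none."""
    match = RANK_PATTERN.search(Path(path).name)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    print(rank_from_filename("results/sample_rank2.pepXML"))
    print(rank_from_filename("results/sample.pepXML"))
```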

chambm commented 2 weeks ago

Yes, I suppose if you need to aggregate everything back at the spectrum level, you'll have to regenerate the ranks by whatever score you want to use. For Percolator, I'd expect to use its q-value to do the reranking, so potentially something that was rank 2 will become rank 1.
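A minimal sketch of that spectrum-level reranking, assuming the hits have already been loaded into plain dictionaries. The function name `rerank_by_spectrum`, the field names (`spectrum`, `hyperscore`, `q_value`, `spectrum_rank`), and the values are hypothetical and do not come from any tool's API.

```python
from collections import defaultdict


def rerank_by_spectrum(hits, score_key="hyperscore", higher_is_better=True):
    """Group hits by spectrum and assign a fresh spectrum-level rank.

    hits: iterable of dicts with a "spectrum" key and a numeric score.
    Returns new dicts with an added "spectrum_rank" (1 = best).
    """
    grouped = defaultdict(list)
    for hit in hits:
        grouped[hit["spectrum"]].append(hit)

    reranked = []
    for group in grouped.values():
        # Sort a copy so the caller's data is left untouched.
        ordered = sorted(group, key=lambda h: h[score_key],
                         reverse=higher_is_better)
        for rank, hit in enumerate(ordered, start=1):
            reranked.append({**hit, "spectrum_rank": rank})
    return reranked


if __name__ == "__main__":
    # Illustrative values only.
    hits = [
        {"spectrum": "s1", "peptide": "A", "hyperscore": 20.0, "q_value": 0.05},
        {"spectrum": "s1", "peptide": "B", "hyperscore": 18.0, "q_value": 0.01},
        {"spectrum": "s2", "peptide": "C", "hyperscore": 30.0, "q_value": 0.02},
    ]
    for row in rerank_by_spectrum(hits, "q_value", higher_is_better=False):
        print(row["spectrum"], row["peptide"], row["spectrum_rank"])
```

With these example values, peptide A ranks first by hyperscore but second by q-value, which is the rank swap described above.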

fcyu commented 2 weeks ago

Thanks. I will then need to make the changes and test whether downstream tools such as PeptideProphet and Philosopher support them.

Best,

Fengchao