OpenMS / pyopenms-docs

pyOpenMS readthedocs documentation, additional utilities, addons, scripts, and examples.
https://pyopenms.readthedocs.io
Other
43 stars 53 forks source link

PyOpenMS getPrecursorSpectrum() #224

Closed oliveralka closed 2 years ago

oliveralka commented 2 years ago

Example: Precursor Purity https://github.com/OpenMS/pyopenms-extra/blob/master/docs/source/datastructures_peak.rst

Currently has a suboptimal way to get the precursor spectrum (this can fail e.g. for mzMLs with MSn).

https://github.com/OpenMS/OpenMS/blob/develop/src/openms/include/OpenMS/KERNEL/MSExperiment.h#L521 MSExperiment has a helper function, which can currently not easily be wrapped in pyopenms (at least to my understanding) ConstIterator getPrecursorSpectrum(ConstIterator iterator) const;

What would be the best way forward here?

timosachsenberg commented 2 years ago

I think simply wrapping it on the C++ side with:

int getPrecursorSpectrumIndex(size_t start_index) # return -1 if no precursor spectrum exists would probably be easiest

jpfeuffer commented 2 years ago

why not returning the spectrum? Do you think users will ever use the index for something else than getSpectrum(index)?

jpfeuffer commented 2 years ago

how do you get the index for the spectrum for which you want the precursor? I.e. start_index

timosachsenberg commented 2 years ago

why not returning the spectrum? Do you think users will ever use the index for something else than getSpectrum(index)?

I think the issue is how to deal with spectra that have no precursor spectrum

timosachsenberg commented 2 years ago

ok let's say we do a:

const MSSpectrum& getPrecursorSpectrum(const MSSpectrum& start)
{
  vector<Precursor>& pcs = start.getPrecursors();

  if (pcs.empty()) return MSSpectrum(); // return empty spectrum

  // referenced precursor spectrum native id
  if (pcs[0].metaValueExists("spectrum_ref")) // has reference to parent spectrum native id?
  {
      String pc_ref = precursor.getMetaValue("spectrum_ref");
      for (auto& s : *this)
      {
        if (s.getNativeID() == pc_ref) return s; // found precursor spectrum
      } 
  }

  const int ms_level = start.getMSLevel();
  if (it = std::find(this->begin(), this->end(), start); it != this->end())
  {
    do
    {
      --it;
      if (ms_level - it->getMSLevel() == 1)
      {
        return *it;
      }
    } while (it != spectra_.begin());
  }

  // not found
  return MSSpectrum();    
}
timosachsenberg commented 2 years ago

what would be the best way on python side to check if this doesn't exist? just check for .empty() ?

hroest commented 2 years ago

how do you get the index for the spectrum for which you want the precursor? I.e. start_index

I think that is problematic since when you have a spectrum, you often dont know the index in the experiment. Maybe it would be better to search by retention time or by native id?

int getPrecursorSpectrumByRT(double rt) # return -1 if no precursor spectrum exists
int getPrecursorSpectrumByID(String id) # return -1 if no precursor spectrum exists

so that you can do:

prec = exp.getSpectrum( getPrecursorSpectrumByRT( spec.getRT() ) )
timosachsenberg commented 2 years ago

I agree the: int getPrecursorSpectrumByID(String id) solution is by the way not so different to the const MSSpectrum& getPrecursorSpectrum(const MSSpectrum& start) in terms of run-time. Which one should I implement - opinions?

timosachsenberg commented 2 years ago

I went for now with the simplest solution of providing an index (this case arises frequently if data frames are constructed and one knows the index).

tapaswenipathak commented 2 years ago

Any documentation required here?