OpenMS / OpenMS

The codebase of the OpenMS project
https://www.openms.de

PyOpenms SIRIUS #5045

Closed eeko-kon closed 3 years ago

eeko-kon commented 3 years ago

Hello!

I am trying to add Sirius to my AccurateMassSearchEngine output, to get fragmentation trees. There are a lot of different classes, though, and I don't know which one to pick. I cannot access the manual for the pyOpenMS classes and I'm not experienced in C++. Could you help me? I have an mzTabFile() output from AccurateMassSearchEngine().

SiriusAdapterHit(), SiriusAdapterRun(), SiriusMzTabWriter(), SiriusAdapterIdentification(), SiriusMSFile()

oliveralka commented 3 years ago

Hi,

I think AssayGeneratorMetabo gives a good overview of how to use the SIRIUS classes for processing: https://github.com/OpenMS/OpenMS/blob/develop/src/utils/AssayGeneratorMetabo.cpp

In general, you would need to use the featureXML of the AccurateMassSearchEngine output and mzML input together with SIRIUS.

Then you can work through these steps:
1) Load them: https://github.com/OpenMS/OpenMS/blob/develop/src/utils/AssayGeneratorMetabo.cpp#L268
2) Run the preprocessing (feature mapping): https://github.com/OpenMS/OpenMS/blob/develop/src/utils/AssayGeneratorMetabo.cpp#L268
3) Afterwards, the .ms file is generated for the SIRIUS call: https://github.com/OpenMS/OpenMS/blob/develop/src/utils/AssayGeneratorMetabo.cpp#L393
4) Some checks: https://github.com/OpenMS/OpenMS/blob/develop/src/utils/AssayGeneratorMetabo.cpp#L401
5) Here, SIRIUS is called: https://github.com/OpenMS/OpenMS/blob/develop/src/utils/AssayGeneratorMetabo.cpp#L409
6) Here you extract the list of directories from SIRIUS: https://github.com/OpenMS/OpenMS/blob/develop/src/utils/AssayGeneratorMetabo.cpp#L418
7) These can then be used to extract the fragment annotation for each spectrum: https://github.com/OpenMS/OpenMS/blob/develop/src/utils/AssayGeneratorMetabo.cpp#L425

You can also check the available wrapped functions based on the .pxd files: https://github.com/OpenMS/OpenMS/blob/develop/src/pyOpenMS/pxds/SiriusAdapterAlgorithm.pxd https://github.com/OpenMS/OpenMS/blob/develop/src/pyOpenMS/pxds/SiriusFragmentAnnotation.pxd

Let me know if that helped or if you have further questions.

What kind of output would you like to produce in general? This might also be possible by using the AssayGeneratorMetabo Tool.

Unfortunately, there will be a few changes in the next version of OpenMS, due to changes introduced by the SIRIUS 4.4.29 dependency: https://github.com/OpenMS/OpenMS/pull/4725

eeko-kon commented 3 years ago

Hey Oliver,

Thank you for the quick response. Unfortunately, I am not experienced in programming and I can only work with and understand Python, so it is a bit difficult to read those C++ files. I will give it a try :) I am trying to tailor a workflow using pyOpenMS that basically goes as follows: FeatureFinderMetabo - MetaboliteFeatureDeconvolution - AccurateMassSearchEngine, then Sirius fragmentation tree predictions from the molecular formula matches of the mass search engine, followed by an MS2 database search.

The output of the AccurateMassSearchEngine is a mztab file though, not a featureXML. Right?

oliveralka commented 3 years ago

Hi @eeko-kon,

is there a specific reason you would like to script it?

You could probably do that up to the step of Sirius fragmentation trees using a simple bash script with AccurateMassSearch - MetaboliteAdductDecharger - AccurateMassSearch and AssayGeneratorMetabo. AssayGeneratorMetabo outputs an assay library, which with the right parameters is similar to a spectral library. So you could parse the assay library into the format you want and perform spectral matching.

You can also have a look at the beginning of the DIAMetAlyzer workflow (KNIME), which might give you a better idea: https://www.openms.de/comp/diametalyzer/

If you would like to use the Python bindings, try to understand the snippets. It is mostly about which classes and functions to use. Just try it out, play around, and let me know at which point you get stuck.

> The output of the AccurateMassSearchEngine is a mztab file though, not a featureXML. Right?

Yes, but the annotated features can also be stored in a featureXML file, which is what the AssayGenerator/SiriusAdapter use as input. Similar to here: https://github.com/OpenMS/OpenMS/blob/develop/src/utils/AccurateMassSearch.cpp#L155

    // Excerpt: in, file_ann, ams_param and mztab_output come from the surrounding tool code.
    AccurateMassSearchEngine ams;
    ams.setParameters(ams_param);
    ams.init();

    FileTypes::Type filetype = FileHandler::getType(in); // check the file type of your input file

    if (filetype == FileTypes::FEATUREXML)
    {
      FeatureMap ms_feat_map;
      FeatureXMLFile().load(in, ms_feat_map);

      //-------------------------------------------------------------
      // do the work
      //-------------------------------------------------------------
      ams.run(ms_feat_map, mztab_output);

      //-------------------------------------------------------------
      // writing output
      //-------------------------------------------------------------
      // annotate output with data processing info
      //addDataProcessing_(ms_feat_map, getProcessingInfo_(DataProcessing::IDENTIFICATION_MAPPING));
      if (!file_ann.empty())
      {
        FeatureXMLFile().store(file_ann, ms_feat_map); // store the annotated features in a featureXML file
      }
    }
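
On the Python side, the same steps could be sketched roughly as follows with pyOpenMS. This is only a sketch under assumptions: the function name `annotate_features` and its arguments are made up for illustration, the default AccurateMassSearchEngine parameters are used unless a `Param` object is passed in, and pyOpenMS is imported inside the function so that the file-type check works even where pyOpenMS is not installed.

```python
from pathlib import Path


def annotate_features(in_featurexml, out_featurexml, params=None):
    """Run AccurateMassSearchEngine on a featureXML file and store the
    annotated features in a new featureXML file.

    Hypothetical helper for illustration; not part of pyOpenMS itself.
    """
    # Mirror the file-type check of the C++ excerpt: only featureXML input.
    if Path(in_featurexml).suffix.lower() != ".featurexml":
        raise ValueError("expected a .featureXML input file")

    import pyopenms as oms  # lazy import: only needed once the input is valid

    ams = oms.AccurateMassSearchEngine()
    if params is not None:
        ams.setParameters(params)  # optional pyopenms Param object
    ams.init()

    # Load the features produced by the feature finding step.
    features = oms.FeatureMap()
    oms.FeatureXMLFile().load(str(in_featurexml), features)

    # Do the work: results go into the mzTab object,
    # annotations are attached to the features themselves.
    mztab = oms.MzTab()
    ams.run(features, mztab)

    # Store the annotated features so they can be passed on to SIRIUS.
    oms.FeatureXMLFile().store(str(out_featurexml), features)
    return features, mztab
```

A call would then look like `annotate_features("sample.featureXML", "sample_annotated.featureXML")`, where both file names are placeholders.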

eeko-kon commented 3 years ago

Hey Oliver. I am still in the process of building this workflow using your Python wrappers and I have run into a bit of a weird situation. As a proof of concept for my workflow, I am using standard samples (germicidins, pentamycin, leupeptin). Right now my workflow is FeatureFinderMetabo and AccurateMassSearchEngine, and I also want to implement SIRIUS. Before I do that, I am trying to see whether SIRIUS works on its own when I feed it the featureXML file and mzML file from my workflow. So here are the issues so far:

1) When I am using the small molecules germicidins A and B, the FeatureXML file predicts the formula I want, but when I use leupeptin and pentamycin that are larger molecules, it doesn't. Would this be a problem for the SIRIUS predictions? I suppose so, right?

2) When I feed SIRIUS the featureXML file and mzML file from the germicidins, it misses the right formula and fragmentation trees. How is that possible, if the formula is already predicted in the featureXML that is generated from AccurateMassSearchEngine?

oliveralka commented 3 years ago

> When I am using the small molecules germicidins A and B, the FeatureXML file predicts the formula I want, but when I use leupeptin and pentamycin that are larger molecules, it doesn't. Would this be a problem for the SIRIUS predictions? I suppose so, right?

You are performing this step without the identification based on AccurateMassSearch, right?

Leupeptin and pentamycin are relatively big and complex, so it can very well be that SIRIUS has some issues with them. Did you check the different candidates? In most cases, the correct prediction is somewhere in the first 10 ranks.
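
A quick way to check the candidate ranks, once the SIRIUS candidates are available as a ranked list of sum formulas, could look like the sketch below. The helper names `parse_formula` and `rank_of` and the candidate list are invented for illustration; the parser only handles flat formulas without brackets or charges.

```python
import re
from collections import Counter


def parse_formula(formula):
    """Turn a flat sum formula such as 'C11H16O3' into element counts."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def rank_of(expected, candidates):
    """Return the 1-based rank of the expected formula in a ranked
    candidate list, or None if it does not occur at all."""
    target = parse_formula(expected)
    for rank, candidate in enumerate(candidates, start=1):
        # Compare element counts, so the element order does not matter.
        if parse_formula(candidate) == target:
            return rank
    return None


# Made-up ranked candidate list, best hit first.
candidates = ["C10H12N2O2", "C12H20O2", "H16C11O3", "C9H12N4O"]
rank = rank_of("C11H16O3", candidates)
print("expected formula found at rank:", rank)
print("within top 10:", rank is not None and rank <= 10)
```

Comparing element counts instead of raw strings avoids missing a hit just because two tools write the elements in a different order.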

> When I feed SIRIUS the featureXML file and mzml file from Germicidins, it misses the right formula and fragmentation trees. How is that possible, if the formula is already predicted in the featureXML that is generated from AccurateMassSearchEngine?

How high is your mass error? It should be around or below 10 ppm. If the mass error is high, SIRIUS may not be able to perform the annotation based on the given sum formula, even if you think the formula is correct. What you can do is export the generated .ms file and load it into the SIRIUS GUI application (https://bio.informatik.uni-jena.de/software/sirius/) to check what SIRIUS is doing in this specific case.
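
For reference, the mass error in ppm is the difference between observed and theoretical m/z, divided by the theoretical m/z and multiplied by one million. A minimal sketch, with function names and example values chosen purely for illustration:

```python
def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(observed_mz, theoretical_mz, tol_ppm=10.0):
    """True if the absolute mass error is at or below the tolerance."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm


# Illustrative numbers only: a 0.0025 Da deviation at m/z 500 is about 5 ppm.
print(round(ppm_error(500.0025, 500.0), 3))
print(within_tolerance(500.0025, 500.0))
```

The 10 ppm default follows the guideline mentioned above; the same absolute deviation gives a larger ppm error at lower m/z.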

We will upgrade to SIRIUS 4.6.0 soon, which might also improve the workflow.

Would you be interested in sharing your example data and tests with me? I am quite interested in how well SIRIUS handles your data.

If you have specific questions about SIRIUS in general it might also be worthwhile to talk to the developers from the University of Jena.

eeko-kon commented 3 years ago

I am performing the SIRIUS step from the SIRIUS-GUI app right now, since I haven't implemented it yet in my workflow. I am first using TOPP and feeding the SIRIUS Adapter with two input files: one is the converted mzML file and the other is the featureXML file generated from the AccurateMassSearchEngine.

The data is generated from an HRMS Orbitrap ID-X.

I would be very interested in sharing the files and my workflow with you. Would you perhaps have some time for a short meeting to show you how I am generating the files and the steps I am making so far?

oliveralka commented 3 years ago

Yes, a meeting would not be a problem. Can you shoot me an email?

oliveralka commented 3 years ago

I will close this issue for now. Please either reopen it or shoot me another email if something is unclear.