eeko-kon / pyOpenMS_UmetaFlow

Apache License 2.0

SiriusMSFile #2

Closed oliveralka closed 3 years ago

oliveralka commented 3 years ago

https://github.com/eeko-kon/py4e/blob/master/Workflownew.py#L56

This comes from the SiriusMSFile class, since you would like to store a .ms file (internally, in memory): https://github.com/OpenMS/OpenMS/blob/develop/src/pyOpenMS/pxds/SiriusMSFile.pxd

python:

Cython signature: void store(MSExperiment & spectra, String & msfile,
FeatureMapping_FeatureToMs2Indices & feature_ms2_spectra_map, bool & feature_only, 
int & isotope_pattern_iterations, bool no_mt_info, 
libcpp_vector[SiriusMSFile_CompoundInfo] v_cmpinfo)

C++:

// write msfile and store the compound information in CompoundInfo Object
vector<SiriusMSFile::CompoundInfo> v_cmpinfo;
bool feature_only = (sirius_algo.getFeatureOnly() == "true") ? true : false;
bool no_mt_info = (sirius_algo.getNoMasstraceInfoIsotopePattern() == "true") ? true : false;
int isotope_pattern_iterations = sirius_algo.getIsotopePatternIterations();
SiriusMSFile::store(spectra,
                        sirius_tmp.getTmpMsFile(),
                        feature_mapping,
                        feature_only,
                        isotope_pattern_iterations,
                        no_mt_info,
                        v_cmpinfo);

In general, you can also look up a parameter in the documentation if you are not sure what it does or why it is there: https://abibuilder.informatik.uni-tuebingen.de/archive/openms/Documentation/nightly/html/UTILS_SiriusAdapter.html

// Here you instantiate a vector of CompoundInfo objects, which is used to store additional metadata
// that can no longer be parsed after the SIRIUS call.
vector<SiriusMSFile::CompoundInfo> v_cmpinfo;

// This parameter is called "feature_only".
// It is a boolean value (true/false). If it is true, the feature information
// from in_featureinfo is used to reduce the search space to MS2 spectra associated with a feature.
// This is recommended when working with featureXML input. If you do NOT use it,
// SIRIUS will use every individual MS2 spectrum for estimation (and it will take ages).
bool feature_only = (sirius_algo.getFeatureOnly() == "true") ? true : false;

// This boolean value can lead to discarding the mass trace information of a feature; the isotope_pattern_iterations will be used instead. In your case it should be false, since you would like to use the feature information.
bool no_mt_info = (sirius_algo.getNoMasstraceInfoIsotopePattern() == "true") ? true : false;

// This gets the default value of isotope_pattern_iterations: if a feature does not have any isotope information available, it will look for an isotope pattern at C13 distance with a maximum of 3 iterations. Be careful here: if you use too many iterations, you will probably pick up some noise later on.
int isotope_pattern_iterations = sirius_algo.getIsotopePatternIterations();
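
To make the effect of feature_only a bit more concrete, here is a toy sketch in plain Python. It is not the OpenMS implementation; the function `select_ms2_for_export` and the example data are made up for illustration, and the dictionary only stands in for the feature-to-MS2 mapping.

```python
from typing import Dict, List


def select_ms2_for_export(
    all_ms2: List[int],
    feature_to_ms2: Dict[str, List[int]],
    feature_only: bool,
) -> List[int]:
    """Return the MS2 spectrum indices that would be passed on to SIRIUS.

    Toy model only: with feature_only=True, keep just the MS2 spectra that
    are assigned to a feature; otherwise keep every MS2 spectrum.
    """
    if not feature_only:
        return sorted(all_ms2)
    assigned = {idx for indices in feature_to_ms2.values() for idx in indices}
    return sorted(idx for idx in all_ms2 if idx in assigned)


# Hypothetical data: six MS2 spectra, two features with assigned MS2 spectra.
all_ms2 = [3, 7, 12, 18, 25, 31]
feature_to_ms2 = {"feature_A": [7, 12], "feature_B": [25]}

with_features = select_ms2_for_export(all_ms2, feature_to_ms2, feature_only=True)
without_features = select_ms2_for_export(all_ms2, feature_to_ms2, feature_only=False)
print(with_features)     # [7, 12, 25]
print(without_features)  # [3, 7, 12, 18, 25, 31]
```

With feature_only set to true, only half of the spectra in this example are left, which is why it is so much faster on real data.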

oliveralka commented 3 years ago

Ok, which one have you used for the test above?

oliveralka commented 3 years ago

native_id: scanId=1106230 accession: MS:1001508 Could not extract scan number - no valid native_id_type_accession was provided

The problem here is that it cannot get the scan number, since this case is not handled in our data structure; that is probably why it segfaults at some point: https://github.com/OpenMS/OpenMS/blob/55598cf39c73a2092365fbdb51e63defa9b29c6f/src/openms/source/METADATA/SpectrumLookup.cpp#L255 It is not able to extract the correct number from the nativeID.
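As a rough idea of what is missing, here is a minimal sketch in plain Python of scan number extraction keyed by the nativeID format accession. This is not the code in SpectrumLookup.cpp; `SCAN_PATTERNS` and `extract_scan_number` are hypothetical names, and the two patterns are assumptions based on the PSI-MS controlled vocabulary (MS:1001508 is the accession from the log above, MS:1000768 is assumed to be the Thermo format).

```python
import re
from typing import Optional

# One regex per nativeID format accession. Only two formats are covered here.
SCAN_PATTERNS = {
    "MS:1000768": re.compile(r"scan=(\d+)"),    # Thermo nativeID format (assumed)
    "MS:1001508": re.compile(r"scanId=(\d+)"),  # Agilent MassHunter nativeID format
}


def extract_scan_number(native_id: str, accession: str) -> Optional[int]:
    """Return the scan number, or None if the format is unknown or does not match."""
    pattern = SCAN_PATTERNS.get(accession)
    if pattern is None:
        return None
    match = pattern.search(native_id)
    return int(match.group(1)) if match else None


agilent = extract_scan_number("scanId=1106230", "MS:1001508")
thermo = extract_scan_number("controllerType=0 controllerNumber=1 scan=42", "MS:1000768")
unknown = extract_scan_number("scanId=1106230", "MS:0000000")
print(agilent, thermo, unknown)
```

Returning None for an accession that is not handled lets the caller stop with a clear error instead of continuing with an invalid scan number.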

Can you check what segfault you get with "GermicidinAstandard.mzML", because I think these are actually two different ones!

eeko-kon commented 3 years ago

Alright, so the most recent files (Standards/GermicidinAStandard10e-2.mzML) and the ones I've been using since yesterday are from an Agilent MassHunter QTOF. I re-ran the Germicidin standards to ensure better fragmentation. The old files were from Bruker. Sorry for the confusion. I'll re-run it with the old data (Bruker).

oliveralka commented 3 years ago

Can you send me the new files as well?

Edit: With the current version it will not work with your files, since the handling of native_id: scanId=1106230 accession: MS:1001508 is missing.

eeko-kon commented 3 years ago

Yes, I am adding them now to the drive (both raw and mzML) at the folder Agilent files: https://drive.google.com/drive/folders/1O0JmZa17oqyzObAjphbXxyHmE9LF6Tkf?usp=sharing

I ran the old file (Bruker) and it's the same: Segmentation fault (core dumped)

oliveralka commented 3 years ago

Ok, both files work in KNIME with SIRIUS and you get an output for both files?

eeko-kon commented 3 years ago

Let me try again to make sure.

oliveralka commented 3 years ago

Should we schedule another call for Monday evening? Then we can go over it again and I can tell you what I think is wrong; hopefully I will have managed to debug it by then. When would be best for you? I am available from 14:00 CEST.

eeko-kon commented 3 years ago

Sounds great, let's do that. Then I'll have all the data I need to answer your questions. Thanks a lot! I'll send you an invitation for 15:00 CEST?

oliveralka commented 3 years ago

15:00 CEST sounds good - have a nice weekend and see you on Monday!

eeko-kon commented 3 years ago

You too and thanks a lot!

eeko-kon commented 3 years ago

It's loading now! But still seg fault 11. So at least the first error is gone!

oliveralka commented 3 years ago

OK, I will upload an updated version for the PR in a few minutes. Could you test that?

edit: Should be ready now: https://github.com/eeko-kon/py4e/pull/4

eeko-kon commented 3 years ago

So far, all conversions end up in a seg fault.

The files from Thermo that are only converted to centroid (mzML) cannot be loaded into pyOpenMS (because of the negative intensities).

The files generated from Thermo, converted to centroid and filtered (no negative intensities), work fine in TOPPView, but in KNIME, SIRIUS cannot process them. The seg fault occurs again in pyOpenMS.
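
For reference, this is roughly what I mean by "filtered", written as a plain Python sketch on a list of (m/z, intensity) pairs. It is only an illustration of the idea, not what FileFilter does internally; the function names and the example spectrum are made up.

```python
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def count_negative_intensities(peaks: List[Peak]) -> int:
    """Count the peaks with an intensity below zero."""
    return sum(1 for _, intensity in peaks if intensity < 0.0)


def drop_negative_intensities(peaks: List[Peak]) -> List[Peak]:
    """Keep only the peaks whose intensity is zero or positive."""
    return [(mz, intensity) for mz, intensity in peaks if intensity >= 0.0]


# Hypothetical centroided spectrum with one negative intensity.
spectrum = [(100.05, 1500.0), (101.06, -3.2), (150.10, 0.0), (200.20, 820.5)]

n_negative = count_negative_intensities(spectrum)
cleaned = drop_negative_intensities(spectrum)
print(n_negative)  # 1
print(cleaned)     # [(100.05, 1500.0), (150.1, 0.0), (200.2, 820.5)]
```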

oliveralka commented 3 years ago

I am really not sure what is up with your files. What about the ones where you used FileFilter to remove the negative intensities?

Can you please give me one file which works in KNIME and for which you get a SIRIUS output? Can you also send me the output and the KNIME workflow (.knwf)?

I can use that for testing the python wrapper because I have to be sure that it works on the C++ side.

eeko-kon commented 3 years ago

Of course. I've uploaded all files to the drive link (see the README section of this repository). Unfortunately, the FileFilter-processed ones do not work in SIRIUS via KNIME. I will send you the KNIME workflow and a list of which files work and which do not work via KNIME.

I'm just trying to exhaust every possibility that this is a file issue. Sorry for the confusion.

oliveralka commented 3 years ago

OK, maybe you can make a short summary of which files work and which do not, and how you processed the files. Maybe add the error if there is a specific one, or the position where it fails in KNIME, e.g.:

File - Instrument - Processing - KNIME - Error
blabla.mzML - Agilent - peakpicking ms1,2 (msconvert) - Yes
blabla2.mzML - Thermo - peakpicking ms1 (msconvert); filter negatives (filefilter) - No - SiriusAdapter Error: blabla

oliveralka commented 3 years ago

Just FYI: The new SiriusAdapter working with SIRIUS 4.6.0 was merged a few days ago. So the next pyopenms release might be a little bit different. I can give you a current pyopenms build if you like.

eeko-kon commented 3 years ago

Sounds great. Thank you! Yes, I will make a detailed file with the results and errors and send it to you in a couple of hours.

oliveralka commented 3 years ago

I finally found and fixed the issue concerning the SiriusMSFile::store function. I will send you a ".whl" with the fixed pyopenms version by the end of the day.

The problem was that the BaseFeature* tried to access memory that had been altered by copying the FeatureMap internally in Cython in order to read and write a Python list (preprocessing).

def preprocessingSirius(self,  featureinfo , MSExperiment spectra , list v_fp , KDTreeFeatureMaps fp_map_kd , FeatureMapping_FeatureToMs2Indices feature_mapping ):
    """Cython signature: void preprocessingSirius(const String & featureinfo, MSExperiment & spectra, libcpp_vector[FeatureMap] & v_fp, KDTreeFeatureMaps & fp_map_kd, FeatureMapping_FeatureToMs2Indices & feature_mapping)"""
    assert (isinstance(featureinfo, str) or isinstance(featureinfo, unicode) or isinstance(featureinfo, bytes) or isinstance(featureinfo, String)), 'arg featureinfo wrong type'
    assert isinstance(spectra, MSExperiment), 'arg spectra wrong type'
    assert isinstance(v_fp, list) and all(isinstance(elemt_rec, FeatureMap) for elemt_rec in v_fp), 'arg v_fp wrong type'
    assert isinstance(fp_map_kd, KDTreeFeatureMaps), 'arg fp_map_kd wrong type'
    assert isinstance(feature_mapping, FeatureMapping_FeatureToMs2Indices), 'arg feature_mapping wrong type'

    cdef libcpp_vector[_FeatureMap] * v2 = new libcpp_vector[_FeatureMap]()
    cdef FeatureMap item2
    for item2 in v_fp:
        v2.push_back(deref(item2.inst.get()))

    self.inst.get().preprocessingSirius(deref((convString(featureinfo)).get()), (deref(spectra.inst.get())), deref(v2), (deref(fp_map_kd.inst.get())), (deref(feature_mapping.inst.get())))
    cdef libcpp_vector[_FeatureMap].iterator it_v_fp = v2.begin()
    replace_0 = []
    while it_v_fp != v2.end():
        item2 = FeatureMap.__new__(FeatureMap)
        item2.inst = shared_ptr[_FeatureMap](new _FeatureMap(deref(it_v_fp)))
        replace_0.append(item2)
        inc(it_v_fp)
    v_fp[:] = replace_0
    del v2

This was fixed by adding an additional class to wrap the vector and the KDTreeMaps.
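
As an analogy in plain Python (nothing crashes here, but the lookup goes stale in the same way): a mapping that remembers which object a feature is no longer matches once the list is overwritten with copies. The classes `Feature` and `FeatureMapStore` and the function `roundtrip_with_copy` are hypothetical and only mimic the idea; they are not the actual pyopenms classes.

```python
import copy
from typing import Dict, List


class Feature:
    def __init__(self, mz: float) -> None:
        self.mz = mz


def roundtrip_with_copy(features: List[Feature]) -> None:
    """Mimic the old wrapper: overwrite the caller's list with copies."""
    features[:] = [copy.copy(f) for f in features]


class FeatureMapStore:
    """Hypothetical holder that owns the features and never copies them."""

    def __init__(self, features: List[Feature]) -> None:
        self._features = features

    def get(self, index: int) -> Feature:
        return self._features[index]


# Old behaviour: the mapping is keyed by object identity (like BaseFeature* in C++).
features = [Feature(100.0), Feature(200.0)]
mapping: Dict[Feature, List[int]] = {features[0]: [5, 9]}
roundtrip_with_copy(features)
stale_lookup_fails = features[0] not in mapping  # the copy is a different object

# With a holder object, the same instances are used before and after.
store = FeatureMapStore([Feature(100.0), Feature(200.0)])
mapping_fixed: Dict[Feature, List[int]] = {store.get(0): [5, 9]}
still_valid = store.get(0) in mapping_fixed

print(stale_lookup_fails, still_valid)  # True True
```

In C++ the stale key is a raw pointer into memory that has already been freed, so instead of a failed lookup you get the segfault.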

eeko-kon commented 3 years ago

Thank you so much! I will look into everything later tonight!

