OpenMS / pyopenms-docs

pyOpenMS readthedocs documentation, additional utilities, addons, scripts, and examples.
https://pyopenms.readthedocs.io
Other
42 stars 50 forks source link

Get wrong MassTraceDetection results #437

Open xieyying opened 2 weeks ago

xieyying commented 2 weeks ago

Describe the problem you encountered I simulated a LC-MS data from a compound and analyze it using pyopenms through Untargeted Metabolomics Pre-Processing pipline. But I can not get the expected mass traces even by exhaustive tuning the parameters. I also view our data using TOPPView. and the expected mass trace all can be found. I also check iod_df generated by MzMLFile().load() and I also can find our expected mass.

To Reproduce Steps to reproduce the behavior:

  1. Go to '...'
  2. Click on '....'
  3. Scroll down to '....'
  4. See error

What should be happening I should find the expected mass trace at m/z 148.5713 (the most intense), 149.0726, 149.5749, and 150.0755. But now, I just can find mass traces at m/z 149.5741 and 150.0755 by MassTraceDetection().

Screenshots image

System information:

Additional context The .mzML and .py file is not supported to uploaded. and Here I just pasted our code: class FeatureDetection: """ Extract features from mzML files using pyopenms """ def init(self, file): self.file = file self.exp = oms.MSExperiment() self.mass_traces = [] self.mass_traces_deconvol = [] self.feature_map = oms.FeatureMap() self.chrom_out = []

def load_file(self):
    """Load the mzML file"""
    oms.MzMLFile().load(self.file, self.exp)

def get_dataframes(self):
    """Get the ion dataframes and filter them"""
    self.ion_df = self.exp.get_ion_df()
    self.ion_df = self.ion_df[self.ion_df['inty'] > 0.0]
    # self.ion_df.to_csv(r'C:\Users\xyy\Desktop\test\tem\ion_df.csv', index=False)

def mass_trace_detection(self):
    """Detect the mass traces"""
    mtd = oms.MassTraceDetection()
    mtd_par = mtd.getDefaults()
    # print(mtd_par)
    mtd_par.setValue("mass_error_ppm", 20.0)
    mtd_par.setValue('min_trace_length', 1.0)
    mtd_par.setValue("noise_threshold_int", 0.0)
    mtd_par.setValue("chrom_peak_snr", 0.0)
    mtd_par.setValue('trace_termination_criterion', 'sample_rate')
    mtd_par.setValue('min_sample_rate', 0.5)
    mtd.setParameters(mtd_par)
    mtd.run(self.exp, self.mass_traces, 0)
    print(len(self.mass_traces))
    for i in range(len(self.mass_traces)):
        trace = self.mass_traces[i]
        print(f"Trace {i}:")
        print(f"Centroid MZ: {trace.getCentroidMZ()}")
        print(f"Centroid RT: {trace.getCentroidRT()}")
        print(f"Size: {trace.getSize()}")
        print("--------------------")    

def elution_peak_detection(self):
    """Detect the elution peaks"""
    epd = oms.ElutionPeakDetection()
    epd_par = epd.getDefaults()
    epd_par.setValue("width_filtering", "fixed")
    # print(epd_par)
    epd.setParameters(epd_par)
    epd.detectPeaks(self.mass_traces, self.mass_traces_deconvol)
    print(len(self.mass_traces_deconvol))

def feature_detection(self):
    """Detect the features"""
    ffm = oms.FeatureFindingMetabo()
    ffm_par = ffm.getDefaults()
    ffm_par.setValue("local_rt_range", 10.0)
    ffm_par.setValue("local_mz_range", 6.5)
    ffm_par.setValue("chrom_fwhm", 5.0)
    ffm_par.setValue('enable_RT_filtering', 'true')
    ffm_par.setValue('mz_scoring_by_elements', 'true')
    ffm_par.setValue("elements", "CHNOPSClBrFeBSeIF")
    ffm_par.setValue('isotope_filtering_model','none')
    ffm_par.setValue("remove_single_traces", "true")
    ffm_par.setValue("report_convex_hulls", 'true')
    ffm.setParameters(ffm_par)
    ffm.run(self.mass_traces_deconvol, self.feature_map, self.chrom_out)
    self.feature_map.setUniqueIds()
    self.feature_map.setPrimaryMSRunPath([self.file.encode()])
    print(self.feature_map.get_df())
    print(len(self.feature_map.get_df()))

def run(self):
    """Run the feature detection process"""
    self.load_file()
    self.get_dataframes()
    self.mass_trace_detection()
    self.elution_peak_detection()
    self.feature_detection()
    return self.feature_map
xieyying commented 2 weeks ago

Here I post the link of mass data: https://drive.google.com/drive/folders/1CZBRSU_hdhq7UHXdqJYQSrMUVcSfarGy?usp=drive_link

jpfeuffer commented 2 weeks ago

Hi! Just to clarify, what is the exact output? You are saying that already the first step (i.e. MassTraceDetection) failed, right?

Do you think you could reduce the example/code to the first part that fails?

xieyying commented 2 weeks ago

Sure, Here is the reduced code: import pyopenms as oms

class FeatureDetection: """ Extract features from mzML files using pyopenms """ def init(self, file): self.file = file self.exp = oms.MSExperiment() self.mass_traces = []

def load_file(self):
    """Load the mzML file"""
    oms.MzMLFile().load(self.file, self.exp)

def get_dataframes(self):
    """Get the ion dataframes and filter them"""
    self.ion_df = self.exp.get_ion_df()
    print(self.ion_df)

def mass_trace_detection(self):
    """Detect the mass traces"""
    mtd = oms.MassTraceDetection()
    mtd_par = mtd.getDefaults()
    # print(mtd_par)
    mtd_par.setValue("mass_error_ppm", 20.0)
    mtd_par.setValue('min_trace_length', 1.0)
    mtd_par.setValue("noise_threshold_int", 0.0)
    mtd_par.setValue("chrom_peak_snr", 0.0)
    mtd_par.setValue('trace_termination_criterion', 'sample_rate')
    mtd_par.setValue('min_sample_rate', 0.5)        
    mtd.setParameters(mtd_par)
    mtd.run(self.exp, self.mass_traces, 0)
    print(len(self.mass_traces))
    for i in range(len(self.mass_traces)):
        trace = self.mass_traces[i]
        print(f"Trace {i}:")
        print(f"Centroid MZ: {trace.getCentroidMZ()}")
        print(f"Centroid RT: {trace.getCentroidRT()}")
        print(f"Size: {trace.getSize()}")
        print("--------------------")    

def run(self):
    """Run the feature detection process"""
    self.load_file()
    self.get_dataframes()
    self.mass_trace_detection()

And here is the results. The most intense ion at m/z 148.57 can be get at the step self.ion_df = self.exp.get_ion_df(), but the mass trace on 148.57 can not get at the step of mass_trace_detection. image

axelwalter commented 2 weeks ago

@xieyying I sent a request to access your example data yesterday, could you please accept that? I will take a look.

xieyying commented 2 weeks ago

The data can be directly downloaded without need to request. https://drive.google.com/drive/folders/11Nz3HQvyH0_yk3DuL4DoW81iQuAJ2JQV?usp=drive_link

For this data, I try the following parameter in the step of mass_trace_detection. mtd_par.setValue('trace_termination_criterion', 'sample_rate') mtd_par.setValue('min_sample_rate', 0.01) I can find the expected m/z, but not using other 'min_sample_rate‘(0.1-1) It seems the parameter of 'trace_termination_criterion' affects the result significantly. But, what real meaning of trace_termination_criterion? I can not get its real meaning from the instruction of these parameters. And in another analysis of simulated LC-MS spectrum generated from hundreds of formula, we also encountered this problems. Many features can not be found. And the parameter of 'trace_termination_criterion' seems to affect the result significantly. In this case, the most numbers of features were found using min_sample_rate', 0.1

axelwalter commented 2 weeks ago

@xieyying Which of these files did you use in your example?

jpfeuffer commented 2 weeks ago

trace_termination_criterion:

jpfeuffer commented 2 weeks ago

In your small example, the problem might be that there are not enough scans to the left of the apex. Please try to run the example on a larger area around that feature/trace. (untested conjecture)

axelwalter commented 2 weeks ago

What happens if you keep "trace_termination_criterion" as "outlier" but reduce the number of outliers with "trace_termination_outliers" to zero? The algorithm is looking for peaks left and right which have outlier m/z values. As @jpfeuffer mentioned, the main intensity traces start at the very beginning of your peak map. No chance to find the default 5 outliers to the left.

xieyying commented 2 weeks ago

@xieyying Which of these files did you use in your example? I am so sorry, this is the correct link.

https://drive.google.com/drive/folders/1CZBRSU_hdhq7UHXdqJYQSrMUVcSfarGy?usp=drive_link

xieyying commented 2 weeks ago

What happens if you keep "trace_termination_criterion" as "outlier" but reduce the number of outliers with "trace_termination_outliers" to zero? The algorithm is looking for peaks left and right which have outlier m/z values. As @jpfeuffer mentioned, the main intensity traces start at the very beginning of your peak map. No chance to find the default 5 outliers to the left.

Combination of "outlier" and "zero" still can not provide correct result. Only min_sample_rate as 0.01 was added can generate the expected result.

axelwalter commented 2 weeks ago

The issue is your peak map contains only data related to that simulated metabolite, but no noise at all (not realistic). Once you add some noise left and right it will work as expected.

jpfeuffer commented 2 weeks ago

@axelwalter Ok. But can't this be disabled somehow with the correct snr and intensity settings?

xieyying commented 2 weeks ago

The issue is your peak map contains only data related to that simulated metabolite, but no noise at all (not realistic). Once you add some noise left and right it will work as expected.

Yes,I add some random noise and set a suitable noise threshold. The default values for the parameter of "trace_termination_outliers" worked well and I got my expected feature.

@axelwalter Ok. But can't this be disabled somehow with the correct snr and intensity settings?

Yes, when I set higher "noise_threshold_int", the situation goes back and the expected can not be found.

jpfeuffer commented 2 weeks ago

I was actually referring to the chrom_peak_snr setting. Noise_threshold_int is just a simple minimum intensity cutoff.

There must be a way to find this trace without noise, otherwise I would consider it a bug @axelwalter

axelwalter commented 1 week ago

So I generated my own test feature (Gaussian distribution) without noise and any peaks around it. Default settings work just fine. Not sure why that was not the case for @xieyying data. Perhaps because of the strong tailing.

jpfeuffer commented 1 week ago

Hmm but the MassTraceDetection alone does not care about intensities other than for the weighted mz of the trace and the order in which it seeds. Maybe if it is so tailed that it does not pick up enough peaks to the left side? It must be really few though. I think it tries at least 5 scans in each direction. And if the min_sample_frequency is 0.5 this means 3 peaks need to be found to each side.

But glad that it works in general.