Open xieyying opened 2 weeks ago
Here I post the link of mass data: https://drive.google.com/drive/folders/1CZBRSU_hdhq7UHXdqJYQSrMUVcSfarGy?usp=drive_link
Hi! Just to clarify, what is the exact output? You are saying that already the first step (i.e. MassTraceDetection) failed, right?
Do you think you could reduce the example/code to the first part that fails?
Sure, Here is the reduced code: import pyopenms as oms
class FeatureDetection: """ Extract features from mzML files using pyopenms """ def init(self, file): self.file = file self.exp = oms.MSExperiment() self.mass_traces = []
def load_file(self):
"""Load the mzML file"""
oms.MzMLFile().load(self.file, self.exp)
def get_dataframes(self):
"""Get the ion dataframes and filter them"""
self.ion_df = self.exp.get_ion_df()
print(self.ion_df)
def mass_trace_detection(self):
"""Detect the mass traces"""
mtd = oms.MassTraceDetection()
mtd_par = mtd.getDefaults()
# print(mtd_par)
mtd_par.setValue("mass_error_ppm", 20.0)
mtd_par.setValue('min_trace_length', 1.0)
mtd_par.setValue("noise_threshold_int", 0.0)
mtd_par.setValue("chrom_peak_snr", 0.0)
mtd_par.setValue('trace_termination_criterion', 'sample_rate')
mtd_par.setValue('min_sample_rate', 0.5)
mtd.setParameters(mtd_par)
mtd.run(self.exp, self.mass_traces, 0)
print(len(self.mass_traces))
for i in range(len(self.mass_traces)):
trace = self.mass_traces[i]
print(f"Trace {i}:")
print(f"Centroid MZ: {trace.getCentroidMZ()}")
print(f"Centroid RT: {trace.getCentroidRT()}")
print(f"Size: {trace.getSize()}")
print("--------------------")
def run(self):
"""Run the feature detection process"""
self.load_file()
self.get_dataframes()
self.mass_trace_detection()
And here is the results. The most intense ion at m/z 148.57 can be get at the step self.ion_df = self.exp.get_ion_df(), but the mass trace on 148.57 can not get at the step of mass_trace_detection.
@xieyying I sent a request to access your example data yesterday, could you please accept that? I will take a look.
The data can be directly downloaded without need to request. https://drive.google.com/drive/folders/11Nz3HQvyH0_yk3DuL4DoW81iQuAJ2JQV?usp=drive_link
For this data, I try the following parameter in the step of mass_trace_detection. mtd_par.setValue('trace_termination_criterion', 'sample_rate') mtd_par.setValue('min_sample_rate', 0.01) I can find the expected m/z, but not using other 'min_sample_rate‘(0.1-1) It seems the parameter of 'trace_termination_criterion' affects the result significantly. But, what real meaning of trace_termination_criterion? I can not get its real meaning from the instruction of these parameters. And in another analysis of simulated LC-MS spectrum generated from hundreds of formula, we also encountered this problems. Many features can not be found. And the parameter of 'trace_termination_criterion' seems to affect the result significantly. In this case, the most numbers of features were found using min_sample_rate', 0.1
@xieyying Which of these files did you use in your example?
trace_termination_criterion:
outlier: Uses 'trace_termination_outliers' and 'min_sample_rate'. Stops tracing from the apex when n='trace_termination_outliers' consecutive scans do not have a peak at an m/z that is in the range of the current weighted mean m/z. Then also filters traces where num_scans_with_peak_in_range / num_scans < min_sample_rate.
sample_rate: Uses 'min_sample_rate' only. Stops tracing and discards when num_scans_with_peak_in_range / num_scans < min_sample_rate.
In your small example, the problem might be that there are not enough scans to the left of the apex. Please try to run the example on a larger area around that feature/trace. (untested conjecture)
What happens if you keep "trace_termination_criterion" as "outlier" but reduce the number of outliers with "trace_termination_outliers" to zero? The algorithm is looking for peaks left and right which have outlier m/z values. As @jpfeuffer mentioned, the main intensity traces start at the very beginning of your peak map. No chance to find the default 5 outliers to the left.
@xieyying Which of these files did you use in your example? I am so sorry, this is the correct link.
https://drive.google.com/drive/folders/1CZBRSU_hdhq7UHXdqJYQSrMUVcSfarGy?usp=drive_link
What happens if you keep "trace_termination_criterion" as "outlier" but reduce the number of outliers with "trace_termination_outliers" to zero? The algorithm is looking for peaks left and right which have outlier m/z values. As @jpfeuffer mentioned, the main intensity traces start at the very beginning of your peak map. No chance to find the default 5 outliers to the left.
Combination of "outlier" and "zero" still can not provide correct result. Only min_sample_rate as 0.01 was added can generate the expected result.
The issue is your peak map contains only data related to that simulated metabolite, but no noise at all (not realistic). Once you add some noise left and right it will work as expected.
@axelwalter Ok. But can't this be disabled somehow with the correct snr and intensity settings?
The issue is your peak map contains only data related to that simulated metabolite, but no noise at all (not realistic). Once you add some noise left and right it will work as expected.
Yes,I add some random noise and set a suitable noise threshold. The default values for the parameter of "trace_termination_outliers" worked well and I got my expected feature.
@axelwalter Ok. But can't this be disabled somehow with the correct snr and intensity settings?
Yes, when I set higher "noise_threshold_int", the situation goes back and the expected can not be found.
I was actually referring to the chrom_peak_snr setting. Noise_threshold_int is just a simple minimum intensity cutoff.
There must be a way to find this trace without noise, otherwise I would consider it a bug @axelwalter
So I generated my own test feature (Gaussian distribution) without noise and any peaks around it. Default settings work just fine. Not sure why that was not the case for @xieyying data. Perhaps because of the strong tailing.
Hmm but the MassTraceDetection alone does not care about intensities other than for the weighted mz of the trace and the order in which it seeds. Maybe if it is so tailed that it does not pick up enough peaks to the left side? It must be really few though. I think it tries at least 5 scans in each direction. And if the min_sample_frequency is 0.5 this means 3 peaks need to be found to each side.
But glad that it works in general.
Describe the problem you encountered I simulated a LC-MS data from a compound and analyze it using pyopenms through Untargeted Metabolomics Pre-Processing pipline. But I can not get the expected mass traces even by exhaustive tuning the parameters. I also view our data using TOPPView. and the expected mass trace all can be found. I also check iod_df generated by MzMLFile().load() and I also can find our expected mass.
To Reproduce Steps to reproduce the behavior:
What should be happening I should find the expected mass trace at m/z 148.5713 (the most intense), 149.0726, 149.5749, and 150.0755. But now, I just can find mass traces at m/z 149.5741 and 150.0755 by MassTraceDetection().
Screenshots![image](https://github.com/OpenMS/pyopenms-docs/assets/128657846/d810c3c3-0533-4e1c-b822-9d4b475c1227)
System information:
conda list
.conda list
.Additional context The .mzML and .py file is not supported to uploaded. and Here I just pasted our code: class FeatureDetection: """ Extract features from mzML files using pyopenms """ def init(self, file): self.file = file self.exp = oms.MSExperiment() self.mass_traces = [] self.mass_traces_deconvol = [] self.feature_map = oms.FeatureMap() self.chrom_out = []