Added two new features: calc_statistics and check_peaks_overlap.
calc_statistics
A function that calculates statistics on m/z values and intensities. By default, only peak-count statistics across all spectra are calculated (based on the analysis of the imzML file).
The function also accepts two optional parameters:
n_spectrum - the number of randomly selected spectra to analyze. The default value is 100.
full - analyze all spectra. For large datasets this can take a long time. The default value is False.
If either of these parameters is set, the function returns additional data. Some of the values are per spectrum, others per dataset:
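To illustrate how the two parameters could interact, here is a minimal sketch of the spectrum selection step. The helper name `select_spectrum_indices`, its `seed` argument, and the rule that `full` overrides `n_spectrum` are assumptions made for illustration; they are not taken from the implementation in this PR.

```python
import random


def select_spectrum_indices(n_total, n_spectrum=100, full=False, seed=None):
    """Return sorted indices of the spectra to analyze (hypothetical helper)."""
    # Assumption: full=True, or a sample size covering the whole dataset,
    # means every spectrum is analyzed.
    if full or n_spectrum >= n_total:
        return list(range(n_total))
    # Otherwise draw n_spectrum distinct indices at random, without replacement.
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_total), n_spectrum))
```

Sampling without replacement keeps the cost proportional to `n_spectrum` rather than to the dataset size, which is the reason for offering a sampled mode next to `full`.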
{
'mz_min': 50.0002, // minimum `m/z` value among all spectra
'mz_max': 1199.976, // maximum `m/z` value among all spectra
'mzs_min': [50.00475, 50.003, ... 50.0045], // minimum `m/z` values for each spectrum selected for analysis
'mzs_max': [1199.922, 1199.928, ... 1199.91], // maximum `m/z` values for each spectrum selected for analysis
'mzs_digitized': [(53, 10895), (52, 10861), ... (1987, 1)], // pairs of `m/z` value (integer Da) and the number of peaks in the range m/z±0.5 Da, summed over all analyzed spectra
'ints_min': [4.9351587, 3.075072, ... 3.7348557], // minimum intensity values for each spectrum selected for analysis
'ints_50p': [19.123741, 11.147136, ... 13.071995], // 50th percentile intensity value for each spectrum selected for analysis
'ints_95p': [54.903645, 36.51648, ... 39.682842], // 95th percentile intensity value for each spectrum selected for analysis
'ints_max': [7110.9473, 8336.904, ... 8003.7954], // maximum intensity values for each spectrum selected for analysis
'ints_total': [1073460, 700179.3, ... 779535.3], // total intensity value for each spectrum selected for analysis
'nonzero_intensity_lengths': [40390, 39074, ... 39162], // number of peaks that have non-zero intensity for each spectrum selected for analysis
'nonzero_peaks_percentage': 89.34, // percentage of peaks that have a non-zero value among all analyzed spectra
}
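As a rough illustration of how several of the values above could be derived, the following standard-library sketch computes them from in-memory spectra. The function names, the linear-interpolation percentile, the choice to compute percentiles over all intensities (including zeros), and rounding to the nearest integer Da for binning are all assumptions; the actual implementation in this PR may differ.

```python
from collections import Counter


def percentile(sorted_values, q):
    """Linear-interpolation percentile of an already sorted, non-empty list."""
    pos = (len(sorted_values) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)


def summarize_spectra(spectra):
    """spectra: iterable of (mzs, ints) pairs of equal-length, non-empty lists."""
    stats = {"mzs_min": [], "mzs_max": [], "ints_50p": [], "ints_95p": [],
             "ints_total": [], "nonzero_intensity_lengths": []}
    bins = Counter()
    n_peaks = 0
    for mzs, ints in spectra:
        ordered = sorted(ints)
        stats["mzs_min"].append(min(mzs))
        stats["mzs_max"].append(max(mzs))
        stats["ints_50p"].append(percentile(ordered, 50))
        stats["ints_95p"].append(percentile(ordered, 95))
        stats["ints_total"].append(sum(ints))
        stats["nonzero_intensity_lengths"].append(sum(1 for i in ints if i != 0))
        # Assign each peak to the nearest integer Da (m/z is positive).
        bins.update(int(mz + 0.5) for mz in mzs)
        n_peaks += len(ints)
    # (integer Da, peak count) pairs, most populated bins first.
    stats["mzs_digitized"] = bins.most_common()
    nonzero_total = sum(stats["nonzero_intensity_lengths"])
    stats["nonzero_peaks_percentage"] = round(100.0 * nonzero_total / n_peaks, 2)
    return stats
```

The per-spectrum lists stay aligned by position, so index `i` in every list refers to the same analyzed spectrum.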
check_peaks_overlap
This function implements an approach for detecting non-centroided datasets: it compares the distance from each peak to its neighboring peak with a shift of that peak by N ppm. The algorithm is described in the "Exclusion of non-centroided datasets" section of the article "METASPACE-ML: Metabolite annotation for imaging mass spectrometry using machine learning".
The function returns the percentage of peaks that overlap.
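One possible reading of this check is sketched below; it is an interpretation of the description above, not the code from this PR or from the article. The assumptions are that a peak counts as overlapping when shifting it up by `ppm` reaches or passes the next peak in the same spectrum, that m/z values are sorted in ascending order, and that the percentage is taken over all peaks. The function name `peaks_overlap_percentage` is hypothetical.

```python
def peaks_overlap_percentage(spectra, ppm=3.0):
    """spectra: iterable of m/z lists, each sorted in ascending order."""
    overlapping = 0
    total = 0
    for mzs in spectra:
        # Compare every peak with its right-hand neighbor.
        for current, following in zip(mzs, mzs[1:]):
            shifted = current * (1.0 + ppm * 1e-6)  # peak shifted up by `ppm`
            if shifted >= following:
                overlapping += 1
        total += len(mzs)
    # Guard against empty input to avoid division by zero.
    return 100.0 * overlapping / total if total else 0.0
```

In profile-mode data neighboring points are densely spaced, so a large share of peaks would fall within the ppm window of their neighbor, whereas centroided data should yield a value close to zero.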
Steps
from pyimzml.ImzMLParser import ImzMLParser

file_path = '/home/ubuntu/dataset_01.imzML'
parser = ImzMLParser(file_path)
dataset_statistics = parser.calc_statistics() # base statistics
dataset_statistics = parser.calc_statistics(n_spectrum=500) # statistics based on 500 randomly selected spectra
dataset_statistics = parser.calc_statistics(full=True) # all spectra are used to calculate statistics
ppm = 3.0 # or another value, depending on the dataset
overlap = parser.check_peaks_overlap(ppm=ppm, n_spectrum=500) # percentage of overlapping peaks