frallain / pymsfilereader

Thermo MSFileReader Python bindings
MIT License
67 stars 25 forks source link

GetSummedMassSpectrum function doesn't show the same isotopic distribution as in Xcalibur #3

Open Golgh opened 6 years ago

Golgh commented 6 years ago

Thanks for offering this. The spectra I obtain after applying the GetSummedMassSpectrum function don't show the isotopic distribution, which I can see using Xcalibur. I have set the centroidResult as False. Is there anything else I can do?

PaulNobrega commented 5 years ago

I checked the variable types being passed to the C-code and everything seems to work as it is intended to. I guess the summing function is not doing what I or @Golgh had expected it to. Seems like this is user error, not an issue with the python bindings.

Below is a working code example to accomplish the intended goal (if there's an easier way, please update the thread).

Code :

import MSFileReader as qe
import numpy as np
import matplotlib.pyplot as plt

### Custom Functions ###

def get_XIC(filename, RTstart, RTend, MZstart, MZend):
    rawfile = qe.ThermoRawfile(filename)
    xic_x, xic_y = rawfile.GetChroData(startTime=RTstart,
                                       endTime=RTend,
                                       massRange1="{}-{}".format(MZstart, MZend),
                                       scanFilter="Full ms ")[0]
    rawfile.Close()
    return np.array(list(zip(xic_x, xic_y)))

def get_MS1(filename, RTstart, RTend, MZstart, MZend):
    rawfile = qe.ThermoRawfile(filename)
    scanStart = rawfile.ScanNumFromRT(RTstart)
    scanEnd = rawfile.ScanNumFromRT(RTend)

    ms_stack = np.array([])
    for i in range(scanStart, scanEnd):
        if 'ms2' not in rawfile.GetScanEventForScanNum(i).lower() and rawfile.IsProfileScanForScanNum(i):
            ml_x, ml_y = rawfile.GetMassListFromScanNum(i)[0]
            masslist = np.array(list(zip(ml_x, ml_y)))
            masslist = masslist[(MZend > masslist[:, 0]) & (masslist[:, 0] > MZstart)]
            if len(masslist) > 1:
                if len(ms_stack) == 0:
                    ms_stack = masslist
                else:
                    ms_stack = np.concatenate((ms_stack, masslist), axis=0)
    rawfile.Close()
    return ms_stack[np.lexsort((ms_stack[:, 1], ms_stack[:, 0]))]

def sum_MS1(ms1_array, sig_dec=2):
    summed_list = []
    for i in range(len(ms1_array)):
        ms1_array[i][0] = round(ms1_array[i][0], sig_dec)
        if i > 0:
            if ms1_array[i][0] == summed_list[-1][0]:
                summed_list[-1][1] += ms1_array[i][1]
            else:
                summed_list.append([ms1_array[i][0], ms1_array[i][1]])
        else:
            summed_list.append([ms1_array[i][0], ms1_array[i][1]])
    return np.array(summed_list)

### End Custom Functions ###

### Start Script ###

# define variables
filename = r"M:\Data\qe_data\example.raw"
RTstart = 45
RTend = 47
MZstart = 796
MZend = 798.5

# call custom functions
xic = get_XIC(filename, RTstart, RTend, MZstart, MZend)
ms1 = get_MS1(filename, RTstart, RTend, MZstart, MZend)
sum_ms = sum_MS1(ms1, 2)

# plot output
plt.plot(*zip(*xic))
plt.show()

plt.plot(*zip(*ms1))
plt.show()

plt.plot(*zip(*sum_ms))
plt.show()

### End Script ###

Output :

XIC

xic

MS1

ms1

Sum_MS1

sum_ms

frallain commented 5 years ago

@PaulNobrega Thanks for your contribution, hopefully Thermo will correct this bug in the future.