compomics / moFF

A modest Feature Finder (moFF) to extract MS1 intensities from Thermo raw file
Apache License 2.0
33 stars 11 forks source link

Detail on what pyMZML_xic_out does #27

Closed antortjim closed 6 years ago

antortjim commented 6 years ago

Hi there

I am carefully looking at how moFF works and I have stumbled upon a piece of code I cannot wrap my head around.

timeDependentIntensities = []
minpos = bisect.bisect_left(rt_list, minRT)
maxpos = bisect.bisect_left(rt_list, maxRT)

for specpos in range(minpos,maxpos):
        specid = runid_list[specpos]
        spectrum = run[specid]
        if spectrum['scan start time'] > maxRT:
            break
        if spectrum['scan start time'] > minRT and spectrum['scan start time'] < maxRT:
            #print 'in ', specid
            lower_index = bisect.bisect(spectrum.peaks, (float(MZValue - ppmPrecision * MZValue), None))
            upper_index = bisect.bisect(spectrum.peaks, (float(MZValue + ppmPrecision * MZValue), None))
            maxI = 0.0
            for sp in spectrum.peaks[lower_index: upper_index]:
                if sp[1] > maxI:
                    maxI = sp[1]
            if maxI > 0:
                timeDependentIntensities.append([spectrum['scan start time'], maxI])

This for loop iterates over all MS1 spectra (as extracted by scan_mzml() in rt_list) that have a retention time within minRT and maxRT*. bisect.bisect reports the position in the spectrum.peaks list where a peak with a given MZValue would fit while keeping the list sorted (increasing MZ values). Then, I understand that the code goes through all the peaks within lower_index and upper_index and fetches the highest intensity available in that range.

However, I am not sure about what bisect.bisect is doing nor what spectrum.peaks is. Is spectrum.peaks a list of MS1 or MS2 peaks? If they are MS1 peaks, I thought a spectrum could only have 1 MS1 peak (that of the precursor ion). I don't understand how could there be several MS1 peaks for the same spectrum.

I hope you could please shed some light onto this.

Thanks for your time and your input!

Best

Antonio

Maux82 commented 6 years ago

Hi Antonio, Sorry for my late reply. It has been a while since I did this code for the mzML file. If I remebar corretcly , spectrum.peaks contains all ms1 peaks for all the scan line so you have to iterate through them to get all the MZ_values in you specific RT times.

Just a side note, this code is based on the older version of pymzml and it does not work on the new version of pymzML that works only on python 3.0. Actually I am little too busy , but in my plains there is also the moFF porting to python 3 and use the new version of pymzML. If you want and you have time it will be great if you could help me with that. Since I am working on new version of moFF and I am planning to go out with new manuscript soon I will reward your help adding you as co-author.

Cheers Andrea

antortjim commented 6 years ago

Hi Andrea

Thanks, no problem! Understood!!

I would love to do that!!! Right now I am doing my Master Thesis due in August 6th. After that I can help for sure. Would that be ok? Thanks for the offer :smile:

Cheers

Antonio