PlantProteomes / SpectrumReader


other ideas for identifying peaks #2

Open edeutsch opened 2 years ago

edeutsch commented 2 years ago

Some peaks may be the first isotope of an already identified peak. Such a peak would be 1.003355 greater than a previous peak (e.g. 111.0743 is the first isotope of 110.0713 from H a
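A minimal sketch of that isotope check, assuming singly charged ions by default and a ppm tolerance. The function name `find_isotope_parents` and its parameters are hypothetical and are not taken from the SpectrumReader code:

```python
# Mass difference between a peak and its first isotope, as quoted above.
ISOTOPE_DELTA = 1.003355


def find_isotope_parents(peak_mz, identified_mzs, tolerance_ppm=10.0, charge=1):
    """Return the already identified m/z values that this peak could be
    the first isotope of (hypothetical helper, not from the repository)."""
    # Where the monoisotopic parent would sit if this peak is its first isotope.
    expected_parent = peak_mz - ISOTOPE_DELTA / charge
    # Convert the ppm tolerance into an absolute m/z window at that position.
    tolerance_mz = expected_parent * tolerance_ppm / 1e6
    return [mz for mz in identified_mzs
            if abs(mz - expected_parent) <= tolerance_mz]


# The example pair from the comment above, plus an unrelated made-up peak.
print(find_isotope_parents(111.0743, [110.0713, 120.0808]))
```

Returning a list rather than a single value leaves room for the case where more than one identified peak falls inside the window.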

edeutsch commented 2 years ago

Here also is a list of possible deltas to the "a" series:

        self.aa_immonium_losses = {
            'G': [],
            'A': [],
            'S': [],
            'P': [],
            'V': [ '-CH2-NH3', '-NH3', '+CO-NH3-CH2' ],
            'T': [ '+CO-NH3'],
            'C': [],
            'L': [ '-C3H6', '-CH2' ],
            'I': [ '-C3H6', '-CH2' ],
            'N': [ '-NH3' ],
            'D': [ '-H2O' ],
            'Q': [ '-CO-NH3', '-NH3', '+CO'],
            'K': [ '+CO-NH3', '-NH3', '+CO', '-C2H4-NH3', '+CO+H2ON2', '-NH3', '-C4H7N', '+CO+CO-C2H3N3', '+CO+H2O'],
            'E': [],
            'M': [ '-C2H2-NH3'],
            'H': [ '-CH2N', '+CO-NH2', '+CO-NH3', '+CO-NH', '+CO+H2O' ],
            'F': [ '-CH3N'],
            'R': [ '-C3H6N2', '-CH5N3', '-CH6N2', '-C2H4N2', '-CH2N2', '-CH3N', '-NH3', '-C4H7N', '+H2O+H2O-N3H7', '+CO+H2O' ],
            'Y': [ '-CO-NH3', '-CH3N' ],
            'W': [ '+CO', '-C4H6N2', '-C2H4N', '-CH3N', '-CHN', '+CO-NH3', '-NH3'],
        }

Maybe you can just take this code and use it to test these possibilities.
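To test these possibilities, each loss string first has to be turned into a mass shift. Here is a rough sketch of one way to do that, assuming the strings are signed groups of simple formulas like the ones in the dictionary above. The helper name `loss_delta` is hypothetical, and the element masses are standard monoisotopic values that I am supplying, not values from this thread:

```python
import re

# Standard monoisotopic masses for the elements used in the loss strings
# (assumption: only C, H, N and O occur).
MONOISOTOPIC = {
    'C': 12.0,
    'H': 1.00782503207,
    'N': 14.0030740048,
    'O': 15.99491461956,
}


def loss_delta(loss):
    """Convert a string such as '+CO-NH3-CH2' into a signed mass shift."""
    delta = 0.0
    # Each group is a sign followed by a formula, e.g. '+CO' or '-NH3'.
    for sign, formula in re.findall(r'([+-])([A-Za-z0-9]+)', loss):
        mass = 0.0
        # Each formula is a run of element symbols with optional counts.
        for element, count in re.findall(r'([A-Z][a-z]?)(\d*)', formula):
            mass += MONOISOTOPIC[element] * (int(count) if count else 1)
        delta += mass if sign == '+' else -mass
    return delta


for loss in ['-NH3', '+CO', '+CO-NH3-CH2']:
    print(f"{loss:>12}  {loss_delta(loss):+.5f}")
```

Adding `loss_delta(loss)` to the m/z of the base ion would give the candidate m/z to compare against observed peaks, assuming a charge of 1.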

edeutsch commented 2 years ago

I don't see any ids in your list with two possible explanations, but I wonder if you have taken this into account. Especially as we add more to the list, there may be two different explanations with exactly the same mass, and we should list both. It's also possible that there are two different explanations with masses so close that they both fall within the match tolerance.
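One way to handle this is to collect every candidate inside the tolerance instead of stopping at the first hit. A sketch, assuming the candidates are held in a dict of label to theoretical m/z; the function name `all_explanations`, the labels, and the masses below are made up for illustration:

```python
def all_explanations(observed_mz, candidates, tolerance_ppm=5.0):
    """Return every (label, theoretical_mz, delta_ppm) within tolerance,
    closest match first (hypothetical helper)."""
    matches = []
    for label, theoretical_mz in candidates.items():
        delta_ppm = (observed_mz - theoretical_mz) / theoretical_mz * 1e6
        if abs(delta_ppm) <= tolerance_ppm:
            matches.append((label, theoretical_mz, delta_ppm))
    # Sort by absolute error so the best explanation is listed first,
    # but keep all of them so ties and near-ties are visible.
    matches.sort(key=lambda match: abs(match[2]))
    return matches


# Made-up candidates: X and Y are both within 5 ppm of the observed peak.
candidates = {'X': 100.0000, 'Y': 100.0003, 'Z': 100.0100}
for label, mz, delta_ppm in all_explanations(100.0001, candidates):
    print(f"{label}: {mz:.4f} ({delta_ppm:+.1f} ppm)")
```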

edeutsch commented 1 year ago

The formula for calculating a delta in ppm (parts per million) is just delta / mz * 1e6

So at 100 m/z, a delta of 0.0006 m/z is exactly 6 ppm.

But at 200 m/z, a delta of 6 ppm is 0.0012 m/z
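As a quick sketch, the conversion in both directions looks like this (the function names are just illustrative):

```python
def delta_to_ppm(delta_mz, mz):
    """Express an absolute m/z difference as parts per million of mz."""
    return delta_mz / mz * 1e6


def ppm_to_delta(ppm, mz):
    """Express a ppm value as an absolute m/z difference at mz."""
    return ppm * mz / 1e6


# The two examples above: 0.0006 at 100 m/z, and 6 ppm at 200 m/z.
print(f"{delta_to_ppm(0.0006, 100.0):.1f} ppm")
print(f"{ppm_to_delta(6.0, 200.0):.4f} m/z")
```

The same ppm tolerance therefore corresponds to a wider m/z window at higher m/z.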

nathanhzh commented 1 year ago

I just checked in my code and the data files! An interesting thing I noticed: when I changed the lower boundary to 40, it identified a new peak (the first time it showed up and was identified), b-V at 100.0757. Previously, the lowest peak was at 101.0709. There are probably more peaks, and more identified peaks, that showed up after changing the boundary. I left the lower boundary at 30 afterward; there were a lot of peaks that weren't clearly defined, although even at 30 there are some peaks that aren't clearly defined.

edeutsch commented 1 year ago

super, thanks, looking good!

Yes, the lower you set your threshold, the more peaks will show up. Fine to leave at 30.

There's one puzzling thing about the plots, though. For IR-NH3:

[image]

the centroid seems to be quite some way off from the predicted value visually. But looking at the printed numbers, it is only 0.00004 different! From the plot it looks 5 times worse than it is: it looks like 0.0002 off (each step is 0.0001). If the delta is 0.00004, the difference between the line and the model centroid should only be half a step.

And here's another that is also 0.00004 different, yet it visually looks very close!

[image]

There seems to be something wrong with the display of IR-NH3. Would you check?

edeutsch commented 1 year ago

Maybe it has to do somehow with how the peak was originally triggered. I notice that there's also a problem with b-H:

[image]

First, it appears twice, presumably because it gets triggered twice. But the fit results are the same, and the dashed line is always at 0. AH, maybe the dashed line is in m/z units (0.0008) while the plots are now in ppm units. That's probably it.

So three things:
1) Fix the units in which the dashed line is plotted
2) Maybe recenter after a first fit and fit again for a final time
3) Don't let a shoulder trigger make a peak appear twice, as is happening for b-H

edeutsch commented 1 year ago

It occurs to me that there are two more types of ion that we should add to the list of things to look for and annotate: water and ammonia losses. For every a, b, and y ion, and also the immonium ions, subtract an H2O or an NH3 or both. Some of the immoniums already have those losses via aa_immonium_losses{}, but there may be some additional ones not already encoded there.

Hopefully that will identify a substantial additional number of peaks.
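A rough sketch of how those extra candidates could be generated, assuming the existing ions are available as a dict of label to m/z. The function name `with_neutral_losses` and the example ion are hypothetical, and the H2O and NH3 masses are standard monoisotopic values rather than numbers from this thread:

```python
# Standard monoisotopic masses of the two neutral losses (assumed values).
H2O = 18.0105646837
NH3 = 17.0265491010


def with_neutral_losses(ions, charge=1):
    """Return the input ions plus their -H2O, -NH3 and -H2O-NH3 variants."""
    expanded = dict(ions)
    for label, mz in ions.items():
        # A neutral loss shifts m/z by the loss mass divided by the charge.
        expanded[f"{label}-H2O"] = mz - H2O / charge
        expanded[f"{label}-NH3"] = mz - NH3 / charge
        expanded[f"{label}-H2O-NH3"] = mz - (H2O + NH3) / charge
    return expanded


# Made-up ion, only to show the labels and shifts that come out.
for label, mz in with_neutral_losses({'b2': 200.0000}).items():
    print(f"{label:<12} {mz:.4f}")
```

Some of these variants will coincide with losses already listed in `aa_immonium_losses`, so the results would need to be checked for duplicate explanations.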