chasemc / IDBacApp

A MALDI Mass Spectrometry Bioinformatics Platform
https://chasemc.github.io/IDBac

Samples removed despite peaks present in protein run #132

Status: Closed (grimcynthia closed this issue 5 years ago)

grimcynthia commented 5 years ago

Hi Chase,

IDBac removed the sample below, despite there being peaks? I ran the AutoFlex method a couple of times due to some MALDI issues (the laser power kept resetting, and I needed to zoom out on the spectra), but I also had to do this for other samples in the dataset and they seemed to process OK. I can send files if needed :) The set of spectra below has the three replicates for one sample from both the first run and the second.

EDIT: By the way, this is also included in the set I emailed to you, if you want to take a closer look!

[Image: NMR1f-4-SNF-A1-j]

Thanks, Cynthia

chasemc commented 5 years ago

Which files in that image are the ones causing issues?

It's likely that you have set the peak percent presence or SNR too high.

grimcynthia commented 5 years ago

Ok, fiddling with the SNR and percent presence just a pinch was enough (facepalm). A percent presence of 80% with an SNR of 3.5, or 60% with an SNR of 4, both give enough flexibility, so I'll figure out which settings are best suited to my dataset. My concern was that the first run (or "flat") spectra were being included and "weighing down" the properly acquired spectra somehow.

For future reference, is this what's happening here? A1/1, A2/1, A3/1 (the "flat" spectra from the first run) have no peaks detected, so they're not counted in the total number of spectra when calculating percent presence. Then A2/2 has a lower S/N ratio (e.g. 3.75), so that spectrum is also not selected for analysis when SNR is set to 4, but it does count toward the total for percent presence? Therefore, setting presence to 60% (or 2/3 of the included spectra) or S/N to 3.5 (so that A2/2 is now included) means that the sample on A1-3 is included in the dendrogram? This might also be an oversimplification of the number of spectra included, considering you can save a sum of spectra...

I've drawn a picture. Alternatively, is the denominator 3 because IDBac requires triplicates?

[Image: explainingParams]

chasemc commented 5 years ago

> My concern was that the first run (or "flat") spectra were being included and "weighing down" the properly acquired spectra somehow.

Yes, they are. Maybe in the future IDBac will be able to exclude individual spectra (I have set up the SQLite files to allow this in the future), but for now you should only put "good" data into IDBac.

I wasn't able to follow your example, but you can conceptualize "percent presence" as, e.g., for peak 2001.1 m/z: (number of replicates with the peak) / (total number of replicates).
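
To make that fraction concrete, here is a minimal Python sketch. It is not IDBac's actual code (IDBac is an R package); the function name `percent_presence`, the `tolerance` window used to match peaks, and all peak values are assumptions made up for illustration. It only shows how an SNR cutoff and the number of replicates in the denominator interact.

```python
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, signal-to-noise ratio); hypothetical layout


def percent_presence(replicates: List[List[Peak]], target_mz: float,
                     min_snr: float, tolerance: float = 0.5) -> float:
    """Percentage of replicates with a peak near target_mz at or above min_snr.

    Every replicate passed in counts toward the denominator, including
    replicates with no peaks at all (an assumption of this sketch).
    """
    if not replicates:
        return 0.0
    hits = sum(
        any(abs(mz - target_mz) <= tolerance and snr >= min_snr
            for mz, snr in peaks)
        for peaks in replicates
    )
    return 100.0 * hits / len(replicates)


# Made-up data: three properly acquired replicates, one with a weak peak.
good = [
    [(2001.1, 6.0)],
    [(2001.1, 3.75)],
    [(2001.1, 5.2)],
]
# Three "flat" replicates with no detected peaks.
flat = [[], [], []]

print(percent_presence(good, 2001.1, min_snr=4.0))
print(percent_presence(good, 2001.1, min_snr=3.5))
print(percent_presence(good + flat, 2001.1, min_snr=3.5))
```

With these made-up numbers, the peak is present in 2 of 3 replicates (about 66.7%) at an SNR cutoff of 4, in 3 of 3 (100%) at a cutoff of 3.5, and in only 3 of 6 (50%) once the flat spectra are added. Under the sketch's assumptions, that last case is how empty spectra can "weigh down" good ones.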