Closed sdrogers closed 4 years ago
> For either of these, it looks straightforward: we need a new peak_sampler object. Or the ability in a peak sampler object to return spectra like the above. Note: I think the peak sampler object is just used to initialise Chemicals? Is that correct? In which case, I'm not sure why it has (un-implemented) noise methods? Noise is an artefact of the data sampling process.
The peak sampler (misleading name?) is actually what the paper calls the 'database'. It stores the trained KDEs, all the scan data extracted from one or several mzML files, and the scan duration information, i.e. the (1,1), (1,2), (2,1), and (2,2) information. We've already added a method to the peak sampler that returns MS2 spectra. This is used by the ChemicalCreator object, which initialises chemicals. ChemicalCreator has two modes: generate spectra following the CRP, or generate spectra by sampling the spectra.
In the current skeleton code, that unimplemented noise method is meant to be called by the Mass Spec when generating scans from chemicals. I forget why it's there. Maybe the noise should be added as part of the ChemicalCreator instead? @vinnydavies
So we need new methods to get MS2 spectra from .mgf or .mzML files.
Dr Simon Rogers Senior lecturer, School of Computing Science, University of Glasgow
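To make the ".mgf" half of that concrete, here is a minimal sketch of reading MS2 spectra from MGF text. It is not the project's code: `MS2Spectrum` and `parse_mgf` are hypothetical names, and it assumes the common MGF layout (`BEGIN IONS` / `END IONS` blocks, `KEY=value` headers, whitespace-separated m/z and intensity pairs).

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


@dataclass
class MS2Spectrum:
    """Hypothetical container for all fragments of one real MS2 scan."""
    precursor_mz: Optional[float]
    peaks: List[Peak]
    metadata: Dict[str, str] = field(default_factory=dict)


def parse_mgf(lines: Iterable[str]) -> List[MS2Spectrum]:
    """Parse MGF-formatted lines into a list of MS2Spectrum objects."""
    spectra: List[MS2Spectrum] = []
    in_ions = False
    meta: Dict[str, str] = {}
    peaks: List[Peak] = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line == "BEGIN IONS":
            in_ions, meta, peaks = True, {}, []
        elif line == "END IONS":
            if in_ions:
                # PEPMASS may be "mz" or "mz intensity"; keep only the m/z.
                pepmass = meta.get("PEPMASS", "").split()
                precursor = float(pepmass[0]) if pepmass else None
                spectra.append(MS2Spectrum(precursor, peaks, meta))
            in_ions = False
        elif in_ions:
            if "=" in line and not line[0].isdigit():
                # Header line such as TITLE=... or PEPMASS=...
                key, value = line.split("=", 1)
                meta[key.strip().upper()] = value.strip()
            else:
                # Peak line: m/z, intensity, optionally further columns.
                parts = line.split()
                if len(parts) >= 2:
                    peaks.append((float(parts[0]), float(parts[1])))
    return spectra
```

Something like `parse_mgf(open("library.mgf"))` (the file name is a placeholder) would give one object per scan, so all fragments of a scan stay together.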
By "generate ms2 spectra by sampling a random spectra from the data", do you mean sampling N individual fragments? The new method would need to get all the frags that come from one real ms2 scan.
The mass spec already has some noise generation for the intensity; we can add more functionality there.
> The new method would need to get all the frags that come from one real ms2 scan
Yeah this here already gets all the frags that come from one real ms2 scan.
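For anyone following the distinction being discussed, a rough sketch of the two behaviours is below. This is illustrative only and not the peak sampler's actual method: the function names are made up, and scans are assumed to be plain lists of (m/z, intensity) tuples.

```python
import random
from typing import List, Sequence, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def sample_individual_fragments(
    scans: Sequence[Sequence[Peak]], n: int, rng: random.Random
) -> List[Peak]:
    """Pool fragments from every scan and draw n of them independently.

    The result mixes fragments from unrelated scans, so it does not
    correspond to any single real MS2 spectrum.
    """
    pool = [peak for scan in scans for peak in scan]
    return [rng.choice(pool) for _ in range(n)]


def sample_whole_scan(
    scans: Sequence[Sequence[Peak]], rng: random.Random
) -> List[Peak]:
    """Pick one real scan at random and return all of its fragments."""
    return list(rng.choice(scans))
```

The second function is the behaviour asked for here: every fragment returned comes from the same real scan.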
@joewandy what about the final part of my comment? When we have a real molecule and real spectrum?
Continue in issue #102
At the moment the code can generate MS2 spectra from either:
To be able to run realistic experiments (e.g. DIA vs DDA) we need to be able to assign a Chemical a real MS2 spectrum. This could come from either
For either of these, it looks straightforward: we need a new peak_sampler object. Or the ability in a peak sampler object to return spectra like the above.
Note: I think the peak sampler object is just used to initialise Chemicals? Is that correct? In which case, I'm not sure why it has (un-implemented) noise methods? Noise is an artefact of the data sampling process.
There is an additional use case (for DIA vs DDA experiments):
- We have an mzML containing a known mixture, i.e. we know the molecules that are in there (there will also be noise).
- We have an mgf holding the spectra of these known chemicals.
- When we seed the simulator with this data, we want to be able to assign the correct MS2 spectrum to the correct Chemical (and perhaps noisy ones to the other chromatograms).
Maybe this is best done using the known chemicals? We have a DB of known chemicals, and we have a DB of their known spectra. In that setting, can we add "noisy" chemicals, i.e. random extra UnknownChemical objects, to test the acquisition / analysis more?
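One possible way to do the "assign the correct MS2 spectrum to the correct Chemical" step is to match each known chemical to a library spectrum by precursor m/z within a ppm tolerance. The sketch below assumes exactly that; matching by name or formula may well be better. `assign_spectra`, `ppm_error`, and the 10 ppm default are hypothetical and not part of the existing code.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def ppm_error(observed: float, reference: float) -> float:
    """Absolute mass error between two m/z values, in parts per million."""
    return abs(observed - reference) / reference * 1e6


def assign_spectra(
    chemical_mzs: Dict[str, float],
    library: Sequence[Tuple[float, List[Peak]]],
    tol_ppm: float = 10.0,
) -> Dict[str, Optional[List[Peak]]]:
    """Give each chemical the library spectrum with the closest precursor m/z.

    chemical_mzs maps a chemical name to its expected precursor m/z.
    library holds (precursor m/z, peaks) pairs, e.g. read from an mgf.
    Chemicals with no spectrum within tol_ppm are mapped to None.
    """
    assigned: Dict[str, Optional[List[Peak]]] = {}
    for name, mz in chemical_mzs.items():
        best: Optional[List[Peak]] = None
        best_err = tol_ppm
        for precursor_mz, peaks in library:
            err = ppm_error(precursor_mz, mz)
            if err <= best_err:
                best, best_err = peaks, err
        assigned[name] = best
    return assigned
```

Entries left as `None` are the ones that would need a sampled or "noisy" spectrum instead of a real one.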