glasgowcompbio / vimms

A programmable and modular LC/MS simulator in Python
MIT License

Restructure MS noise and data generation #2

Closed sdrogers closed 4 years ago

sdrogers commented 4 years ago

In the use case of creating a top-N file from an existing full-scan mzML, we don't strictly need a pickle of MS2 and timing info, yet at the moment we have to provide one. Can we make the system cope when no pickle is provided? Timings would then have to be supplied by the user, and MS2 could be generated using other methods, or left empty if, say, the user is only interested in coverage.
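A minimal sketch of the fallback being asked for, not the actual vimms code. The function name `resolve_scan_inputs`, its parameters, and the pickle layout (a dict with `scan_times` and `ms2_source` keys) are all hypothetical and only illustrate the idea of an optional pickle.

```python
from typing import Callable, Dict, List, Optional, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def resolve_scan_inputs(
    pickled: Optional[dict] = None,
    scan_times: Optional[Dict[int, float]] = None,
    ms2_source: Optional[Callable[[float], List[Peak]]] = None,
):
    """Return (scan_times, ms2_source), falling back when no pickle is given.

    Hypothetical helper: the dict keys below are an assumed layout.
    """
    if pickled is not None:
        return pickled["scan_times"], pickled["ms2_source"]
    if scan_times is None:
        # Without a pickle, timings must come from the caller.
        raise ValueError("scan_times must be provided when no pickle is given")
    if ms2_source is None:
        # Coverage-only use case: every MS2 scan is left empty.
        ms2_source = lambda precursor_mz: []
    return scan_times, ms2_source
```

With this shape, a user who only cares about coverage passes scan timings per MS level and nothing else, and gets empty MS2 scans back.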

sdrogers commented 4 years ago

Part 1: Noise

  1. Create new noise generation class that is separate from the stuff in DataGenerator.py
  2. Modify MS class to just use this
  3. This decouples the noise aspect from the generation of the chemical list
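The three steps above could look roughly like the sketch below. The class names (`NoNoise`, `GaussianNoise`, `ToyMassSpec`) and the `apply` method are assumptions made for illustration; they are not the names used in vimms.

```python
import random


class NoNoise:
    """Leaves intensities unchanged."""

    def apply(self, intensity: float) -> float:
        return intensity


class GaussianNoise:
    """Adds zero-mean Gaussian noise, clipped so intensity stays non-negative."""

    def __init__(self, sigma: float, seed=None):
        self.sigma = sigma
        self.rng = random.Random(seed)

    def apply(self, intensity: float) -> float:
        return max(0.0, intensity + self.rng.gauss(0.0, self.sigma))


class ToyMassSpec:
    """Stand-in for the MS class; it only assumes `noise` has an `apply` method."""

    def __init__(self, noise=None):
        self.noise = noise if noise is not None else NoNoise()

    def observed_intensity(self, true_intensity: float) -> float:
        # Single call, no branching on the kind of noise in use.
        return self.noise.apply(true_intensity)
```

The mass spec no longer needs to know how noise is produced, so the chemical list can be generated without any reference to it.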

Part 2: Chemical Generation

Let's say there are 500 chems from HMDB. We want to generate them for simulation, and assign each chemical a unique spectrum from MGF files. How do we do that? Right now it doesn't seem straightforward at all.

  1. Sanitise chemical generation
    • Where to get the mass, rt, chromatograms, ms1, ms2 for the chemical?
    • A set of chemicals can be drawn from a well of the autosampler.
      • Need to capture this notion.
      • Could use a generator to generate multiple wells?
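A rough sketch of the two ideas in this list: pairing each chemical with a distinct spectrum, and using a generator to produce wells. Everything here is hypothetical (the `Chemical` dataclass, both function names, and the 0 to 1200 second RT range), and the spectra are assumed to have already been parsed from the MGF files into peak lists.

```python
import random
from dataclasses import dataclass
from typing import Iterator, List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


@dataclass
class Chemical:
    name: str
    mass: float
    rt: float
    ms2: List[Peak]


def assign_unique_spectra(formulas, spectra, rng):
    """Pair each (name, mass) with a distinct spectrum and a random RT."""
    if len(spectra) < len(formulas):
        raise ValueError("need at least one spectrum per chemical")
    # Sampling without replacement guarantees no spectrum is reused.
    chosen = rng.sample(spectra, len(formulas))
    return [
        Chemical(name, mass, rng.uniform(0.0, 1200.0), ms2)  # RT range is assumed
        for (name, mass), ms2 in zip(formulas, chosen)
    ]


def wells(chemicals, n_wells, per_well, rng) -> Iterator[List[Chemical]]:
    """Yield one list of chemicals per autosampler well."""
    for _ in range(n_wells):
        yield rng.sample(chemicals, per_well)
```

Passing an explicit `random.Random` instance keeps the assignment reproducible, which matters when comparing controllers on the same simulated sample.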

sdrogers commented 4 years ago

@joewandy As part of the refactoring, this if statement should disappear: https://github.com/sdrogers/vimms/blob/3167584fc13d6338f6c8d7f7608c4c62f70a9540/vimms/MassSpec.py#L520-L523. It should be replaced by a single method call, leaving the noise choice to whatever noise class (your new class) the user provides.

sdrogers commented 4 years ago

@joewandy Can we close this now?

joewandy commented 4 years ago

No, have not done the part to improve chemical generation yet.

joewandy commented 4 years ago

Done by Simon in issue #98.