glasgowcompbio / vimms

A programmable and modular LC/MS simulator in Python
MIT License
19 stars 6 forks source link

FixedScanController bug #225

Closed joewandy closed 3 years ago

joewandy commented 3 years ago

Problem is observed when writing out the resulting mzML in the DsDA fixed scan controller notebook. We get referential integrity warnings.

  1. Some prescheduled scans seem to be dropped. This is because in the list of outgoing tasks, MS1 scan params that have the same params are considered to be the same, and thus are filtered by mistake.
  2. Even after fixing (1) there's still referential integrity warnings. To check further

To reproduce, check out the DsDA test notebook (inside DsDAexample_test) in OneDrive, and also https://github.com/glasgowcompbio/vimms/blob/master/vimms/scripts/debug_mzML_writer.py

joewandy commented 3 years ago

Wanted to push to my own branch, but accidentally pushed to master ....

The commit for this issue is https://github.com/glasgowcompbio/vimms/commit/e9dd99e669c3aa3c70ff739d1d90fa5a962a30d8. It contain the fixes for the two things below:

1. Prescheduled MS1 scans were dropped by the mass spec.

This bug happens because in the FixedScanControllers when we schedule a bunch of MS1 scan parameters at once, the parameters are all the same, so the scan params object are all seen to be the same and the duplicates are removed in the mass spec queue.

To fix this, I removed this __eq__ comparison inside ScanParameter. It was used to check that two objects are the same if their params dict are the same.

   def __eq__(self, other):
        if not isinstance(other, ScanParameters):
            return NotImplemented
        return self.params == other.params

The above code was previously used to determine if an incoming scan is to be processed by comparing its ScanParameter to the one sent out. This is no longer necessary since we're now explicitly tracking which scan to process inside the controller.

Have run unit tests to make sure that removing this eq doesn't change the resulting mzML files, so I think this should be okay.

2. Referential integrity when writing mzML from FixedScanController using DsDA schedule

This happens because when creating MS2 scans params from the DsDA schedule, the precursor scan id (MS1) is not set correctly in dsda_get_scan_params. In fact, it seems that the precursor scan id was set to the current MS2 scan id, which is not correct.

    for i in range(schedule.shape[0]):
        if types[i] != 'msms':
            scan_list.append(get_default_scan_params())
        else:
            if np.isnan(masses[i]):
                mz = 100
            else:
                mz = masses[i]
            scan_list.append(get_dda_scan_param(mz, 0.0, INITIAL_SCAN_ID + i, isolation_width, mz_tol, rt_tol))

The code is also a bit messy since we need to set the precursor scan id, but we can't determine what are the scan ids of MS1 and MS2 scans being sent. This has now been improved:

    for i in range(schedule.shape[0]):
        if types[i] != 'msms':
            precursor_scan_id = scan_id
            scan_params = get_default_scan_params(scan_id=precursor_scan_id)
        else:
            assert precursor_scan_id is not None
            mz = 100 if np.isnan(masses[i]) else masses[i]
            scan_params = get_dda_scan_param(mz, 0.0, precursor_scan_id, isolation_width, mz_tol, rt_tol,
                                             scan_id=scan_id)
        scan_list.append(scan_params)
        scan_id += 1

We can now pass a scan_id parameter to the get_default_scan_params and get_dda_scan_param methods. This determines what should be the scan ids of MS1 scans, and MS2 scans and their precursors. If this parameter is left to None, then it will be determined by the mass spec as before: starts from INITIAL_SCAN_ID, and each time incrementing by 1.

This means if you manually set the scan id of a scan to, say 100010, but don't set the scan id of the next one, then it will be automatically set to 100011 in the mass spec. But I think for consistency, if you manually set the ids for some scans in your controller, you should probably set for all too.