glasgowcompbio / vimms

A programmable and modular LC/MS simulator in Python
MIT License
19 stars 6 forks source link

Top-N and ROI controllers inside the 'shifted_controllers' seem broken #4

Closed joewandy closed 4 years ago

joewandy commented 4 years ago

When testing the shifted_controllers branch on the simulator, I don't see any MS2 scans being produced by the Top-N and ROI controllers.

On the master branch, the code works fine .. so something is broken somehow.

joewandy commented 4 years ago

Seems that MS1 scan wasn't being processed in the simulator (hence we got no MS2 scans). This was fixed in https://github.com/sdrogers/vimms/commit/9d96961d2f592ed86822b981eda87f65b18dc136, where we set the starting scan number to 100000 to be the same as IAPI.

joewandy commented 4 years ago

The SmartROI controller couldn't run because last_ms1_scan has been renamed to scan_to_process. Fixed in commit #https://github.com/sdrogers/vimms/commit/0302803d72ec0572de5255d9c1bfb66993bf6eac.

Also it doesn't produce any ms2 scans (because no ms1 scan was being processed). This is because these 2 lines that track the next scan id to process are not correct: https://github.com/sdrogers/vimms/blob/shifted_controllers/vimms/Controller/roi.py#L242-L243, but not sure how to fix them (just add 100000?). @vinnydavies could you help? Thanks.

sdrogers commented 4 years ago

Those two lines look a bit fraught with danger -- I guess it's working out what the number is of the ms1 scan. Would it be neater to update this at the point the MS1 scan is made? i.e:

sdrogers commented 4 years ago

Just to clarify the above. We give each scan a scan number. We should do that when we make them. And then the attribute next_scan_to_process should be set when that particular scan is made

Also: do the current controllers work in this way (i.e. know which scan should be the one to process?), or do they just process the next MS1 that comes along? If the latter, then what we're doing above is quite a big change to the code, no? I.e. it's not just making the shifting work, it's fundamentally changing the controller? Shout me down if that is not the case!

sdrogers commented 4 years ago

One other q for @vinnydavies What's happening on lines 47-48 in TopN.py? https://github.com/sdrogers/vimms/blob/0302803d72ec0572de5255d9c1bfb66993bf6eac/vimms/Controller/topN.py#L47-L48

joewandy commented 4 years ago

Just to clarify the above. We give each scan a scan number. We should do that when we make them. And then the attribute next_scan_to_process should be set when that particular scan is made

Currently the controller doesn't not know the scan number because that's assigned inside the mass spec. But when generating scans, the controller creates ScanParameter objects (see e.g.). When a scan is returned to the controller for processing, the scan_parameter that was used created it is also included (inside the scan.scan_params attribute). To do what's suggested above, we can store next_scan_to_process to be the param for the next MS1 to process, and compare the returned one to it.

Or we can just change the controllers so it assigns the id too. I guess either way is fine.

Also: do the current controllers work in this way (i.e. know which scan should be the one to process?), or do they just process the next MS1 that comes along? If the latter, then what we're doing above is quite a big change to the code, no? I.e. it's not just making the shifting work, it's fundamentally changing the controller? Shout me down if that is not the case!

It used to be that way -- just process the next MS1 that comes along. We tried to modify it to know which one to be processed, but by working out the id, e.g. https://github.com/sdrogers/vimms/blob/shifted_controllers/vimms/Controller/topN.py#L41-L46. What simon suggested is better i think.

joewandy commented 4 years ago

One other q for @vinnydavies What's happening on lines 47-48 in TopN.py? https://github.com/sdrogers/vimms/blob/0302803d72ec0572de5255d9c1bfb66993bf6eac/vimms/Controller/topN.py#L47-L48

if we receive an empty scan (possible), then set the next scan id to process to be the next one?

vinnydavies commented 4 years ago

For the question that is directed at me. There is slightly different processing when we see no peaks at all in a scan, this is basically a temporary fix to stop the id counter getting out of sync

vinnydavies commented 4 years ago

I suggest we maybe need to spent a week re-engineering all the controllers to remove the massive amounts of duplicate code that is now in there. In the process we could also clean up the shifted controller codes

sdrogers commented 4 years ago

Sorry - more confused now. So, the controller assigns a number inside the parameter object? And this is inside the scan object that comes back? In simulation and reality? @joewandy if you’re suggesting we implement something along the lines of what I suggest could you put details in a comment below please? Ie what we need to change who we can all check?

@vinnydavies sounds like a good longer term idea but I’m not keen to delay this work further (someone else will do it). Please put details on a possible restructuring into a new issue for discussion...

sdrogers commented 4 years ago

And the line 47-48 still unclear I’m afraid. How does receiving a blank scan change the scan id of the next one to process?

vinnydavies commented 4 years ago

its because of this line if scan.scan_id == self.next_processed_scan_id and scan.num_peaks > 0: meaning that a scan with no peaks gets processed differently to a scan with peaks (something we probably dont want to happen). We then get the next scan in the list a slightly different way. This may be why the shifted controllers arent working on the machine. I'll fix the other issues with the shifted controllers and then look at this

joewandy commented 4 years ago

Sorry - more confused now. So, the controller assigns a number inside the parameter object? And this is inside the scan object that comes back? In simulation and reality?

@joewandy if you’re suggesting we implement something along the lines of what I suggest could you put details in a comment below please? Ie what we need to change who we can all check?

vinny is halfway doing this i think. Let's wait for the commit first, then we can check.

vinnydavies commented 4 years ago

I have fixed the shifted controllers so they are now all working again. I will continue with more fixes, all of which will have to be fixed and tested for each controller due to large amount of duplicated code within _process_scan

In long term we need to reorganise the controller code otherwise we are going to keep having to fix and test all controllers individually

Note: The id counter is used to determine which scans to process. Previously just the last scan in the list was processed. This was changed to allow the purity method to have ms1 scans between each of the ms2 scans. This change will also allow us to send scans individually to the mass spec as we process them (previously if we had done this there would have been issues if the queue became empty)

sdrogers commented 4 years ago

Thanks @vinnydavies Please start a new issue for reorganisation. For the “note” part: let us know if you are going to work on this tomorrow. If not, I can.

vinnydavies commented 4 years ago

Due to the duplication, its probably easier if I do it. I'll do it tomorrow

sdrogers commented 4 years ago

Ok, great. I've added a commit to that branch that includes a test. You can run with: python -m unittest tests.integration.test_controllers.TestTopNShiftedController It tests a shift of 3. Note that it runs, but cannot see the scans (should see fewer MS2 between the first two MS1 than between subsequent)...but at the moment, we cannot see any MS2 scans in any test output (see issue #17)

sdrogers commented 4 years ago

I also did some playing around with the scan numbers. I made an attribute called scan_no in the TopNController class and increased it every time a scan (MS1 or MS2 was made). I then printed this number alongside the self.next_processed_scan_id thinking that they should be the same. But...they diverge from one another. And the divergence is called by the last if in these lines: https://github.com/sdrogers/vimms/blob/1b9cc75a22d928bdb28c997a7ba100db8a3a3180/vimms/Controller/topN.py#L43-L50 So, when we receive a scan... if it is the one we're waiting for and it has some peaks, we set it as the scan to process (and set pending_tasks_size - what is that?). If it fails these tests (i.e. it either isn't the one we're waiting for, or it is but has no peaks in it), we don't process it. The last if clause doesn't make sense to me -- should it simply be deleted? I.e. we've sent a scan to the machine and we get it back with no peaks in it, so we wait for the next one? What will the next one be? An MS2?

I think the checking for peaks is probably problematic. I guess the issue is that the process_scans method expects some peaks. But it should be able to handle none. I.e. if there aren't any, it just lobs a new MS1 scan in?

This issue probably only shows up in simulation. Hard to imagine a situation with the real MS where we would get empty scans.

So, suggested fixes:

  1. Add a scan counter to the class, that is increased every time a scan is created and is used for next_processed_scan_id. This could probably be initialised in the top level Controller class and then incremented in the sub-classes.
  2. Modify the 'if' block to just check whether or not scan.scan_id == self.next_processed_scan_id chopping out the check on the number of peaks. Remove the final if completely.
  3. Modify the _process_scan method(s) to handle empty scans.
vinnydavies commented 4 years ago

Fixed purity controller. Just need to merge the two branches after testing

joewandy commented 4 years ago

Everything seems to work now.