glasgowcompbio / vimms

A programmable and modular LC/MS simulator in Python
MIT License
19 stars 6 forks source link

Make WeightedDEW controller faster #204

Closed joewandy closed 2 years ago

joewandy commented 3 years ago

A bit of a low priorty I guess, but just to note it here so we don't forget.

Seems that these lines are unnecessarily slow, especially the repeated sorting in _is_excluded: https://github.com/sdrogers/vimms/blob/master/vimms/Controller/topN.py#L327-L342.

WeightedDEW

joewandy commented 3 years ago

Suggestion from @mcbrider5002 to test

        exclusion_list = (x for x in itertools.chain(self.exclusion_list, self.temp_exclusion_list) if x.from_mz <= mz <= x.to_mz and x.from_rt <= rt <= x.to_rt)
        try: x = max(exclusion_list, key=lambda x: x.from_rt)
        except ValueError: return False, 1
        logger.debug('Excluded precursor ion mz {:.4f} rt {:.2f} because of {}'.format(mz, rt, x))
        if rt <= x.frag_at + self.exclusion_t_0:
            return True, 0.0
        else:
            weight = (rt - (self.exclusion_t_0 + x.frag_at)) / (self.rt_tol - self.exclusion_t_0)
            assert weight <= 1, weight
            # self.remove_exclusion_items.append(x)
            return True, weight
mcbrider5002 commented 3 years ago

My guess at the purpose of that sort is to merge the exclusion_list and temp_exclusion_list, but maintain sorted order (which is a bit overkill to find the item satisfying condition with highest from_rt) - if both these lists are sorted in terms of from_rt to begin with, you should be able to exploit that by searching them separately and cutting off search once a max from_rt with condition is found - then just take the max of the two values you get from searching each list. (Above code snippet doesn't make the assumption they're sorted, and so should linearly search both lists using an iterator rather than creating new list in memory + sorting.)

_In the case that this assumption holds and they are fromrt-sorted, I think you can use something like:

maxm = (0, None)

for elem in reversed(self.temp_exclusion_list):
  if(x.from_rt <= rt <= x.to_rt and x.from_mz <= mz <= x.to_mz): #can bisect from_rt to only search values less than rt in one half
    maxm = (x.from_rt, x)
    break

for elem in reversed(self.exclusion_list):
  if(x.from_rt <= maxm): break #can bisect instead
  if(x.from_rt <= rt <= x.to_rt and x.from_mz <= mz <= x.to_mz): #can bisect from_rt to only search values less than rt in one half
    maxm = (x.from_rt, x)
    break

_, x = maxm
if(x is None): return False, 1
else: 
  logger.debug('Excluded precursor ion mz {:.4f} rt {:.2f} because of {}'.format(mz, rt, x))
  if rt <= x.frag_at + self.exclusion_t_0: return True, 0.0
  else:
    weight = (rt - (self.exclusion_t_0 + x.frag_at)) / (self.rt_tol - self.exclusion_t_0)
    assert weight <= 1, weight
    # self.remove_exclusion_items.append(x)
    return True, weight
joewandy commented 3 years ago

TODO: the WeightedDEW exclusion window code can be optimised so it works faster, and once done, it can be generalised so it can be used by other controllers too.

Also look into speeding up the codes with example GNPS data if possible

joewandy commented 2 years ago

Code has been generalised but it's still slow ...

joewandy commented 2 years ago

No longer need to do this I think