glasgowcompbio / vimms

A programmable and modular LC/MS simulator in Python
MIT License

Add drift method to Simulator #177

Closed vinnydavies closed 3 years ago

vinnydavies commented 4 years ago

Add drift method to allow us to simulate drift. Initial thoughts

sdrogers commented 4 years ago

Easiest option is that we insert something in this method... https://github.com/sdrogers/vimms/blob/a13520ca79870751843ac72dbdc7df5147f43ae8/vimms/MassSpec.py#L600 to map the query_rt to adjusted_query_rt or something...

sdrogers commented 4 years ago

Should an rt mapping be deterministic in the sense that f(x) should always return the same value for a given x? Some probably should. But I could also imagine a NoisyColumn that generates noise?

sdrogers commented 4 years ago

The reason for the above question is that I think we'll end up calling it more often than once per RT

vinnydavies commented 4 years ago

Yes, once generated, f(x) should have no noise. This is important both for checking multiple chemicals at the same RT (Simon's comment above) and if we wanted to test multiple methods on the same f(x) for a comparison. You could, however, generate multiple f(x) using the same noisy process
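
A rough sketch of what I mean, assuming a hypothetical `sample_drift_function` helper (not something that exists in vimms) and an arbitrary piecewise-linear drift: all the randomness is used up when f is generated, so f itself is noise-free, and a different seed gives a different f from the same process.

```python
import random
from bisect import bisect_right

def sample_drift_function(max_rt, n_knots=5, scale=2.0, seed=None):
    """Draw one drift function f(rt) -> adjusted rt (hypothetical helper)."""
    rng = random.Random(seed)
    # Evenly spaced knots, each with a random offset in seconds.
    knots = [i * max_rt / (n_knots - 1) for i in range(n_knots)]
    offsets = [rng.gauss(0.0, scale) for _ in knots]

    def f(rt):
        # Clamp to the knot range, then interpolate the offset linearly.
        rt_c = min(max(rt, knots[0]), knots[-1])
        i = min(bisect_right(knots, rt_c), n_knots - 1)
        lo, hi = knots[i - 1], knots[i]
        w = (rt_c - lo) / (hi - lo)
        return rt + (1 - w) * offsets[i - 1] + w * offsets[i]

    return f

f = sample_drift_function(max_rt=1440.0, seed=1)
g = sample_drift_function(max_rt=1440.0, seed=2)
print(f(600.0) == f(600.0))  # same f, same rt: identical result
print(f(600.0) == g(600.0))  # different draws from the same process
```

Calling `f` any number of times per RT is then safe, and two methods can be compared on exactly the same `f`.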

sdrogers commented 4 years ago

Thinking more on this. Be useful to have input from @RonanDaly ... If we add drift as proposed above, peak shapes will get warped. Is that what we want? The alternative is that we apply the warping to the peak RTs in the chemical list before we start the environment...
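
To illustrate the difference, here is a toy sketch. The Gaussian peak, the 10% stretch in `adjust`, and all the names are assumptions for illustration, not anything in vimms. Mapping the query RT changes the peak width, whereas moving the chemical's RT leaves the shape alone.

```python
import math

def gaussian_peak(rt, apex, sigma=5.0):
    """Unit-height Gaussian peak, standing in for a chemical's chromatogram."""
    return math.exp(-0.5 * ((rt - apex) / sigma) ** 2)

def adjust(query_rt):
    # Assumed mapping for illustration only: a 10% stretch of the RT axis.
    return 1.1 * query_rt

def half_max_width(intensity, lo, hi, step=0.01):
    """Width of the region with intensity >= 0.5, found by a brute-force scan."""
    n = int(round((hi - lo) / step))
    above = [lo + k * step for k in range(n + 1) if intensity(lo + k * step) >= 0.5]
    return above[-1] - above[0]

apex = 600.0
# Option 1: map the query RT at scan time; the peak is looked up at adjust(rt).
warped = lambda rt: gaussian_peak(adjust(rt), apex)
# Option 2: move the chemical's RT up front; the peak shape is left untouched.
shifted = lambda rt: gaussian_peak(rt, apex / 1.1)

print(round(half_max_width(warped, 500.0, 600.0), 1))   # narrower than the original
print(round(half_max_width(shifted, 500.0, 600.0), 1))  # same width as the original
```

Both options put the apex at the same observed RT here; only the first one changes the width.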

vinnydavies commented 4 years ago

Might have to be a little careful doing that as long tails at the front of the chromatogram could make estimating the drift function near impossible. Suggest if we do what you suggest above, that we drift the top of the peaks maybe?

sdrogers commented 4 years ago

Doing which? Moving the chemical RT? Not sure what difference it makes drifting the top or the start? They have a fixed length, so moving the start by 10s is the same as moving the top by 10s? But I suspect I misunderstood your comment.

vinnydavies commented 4 years ago

For fixed-length simulated peaks, yes, it doesn't make a difference. If we used real data, converted it into ROIs/chemicals and then added drift, it would make a difference, but I guess we wouldn't necessarily do this. For example, a chromatogram starting at rt=100 with the peak only appearing at rt=600 could get 1 second of drift (due to start rt=100). Similarly, a chromatogram starting at rt=500 with the peak appearing at rt=600 could get 5 seconds of drift (due to start rt=500). The peaks would appear at the same time, but because of when the noisy chromatogram started, they would get completely different drift
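
The numbers above as a quick sketch, assuming drift grows linearly at 1% of the RT (an assumption chosen only to match the 1 second and 5 second figures; `drift_amount` is a made-up name):

```python
def drift_amount(rt, rate=0.01):
    # Assumed for illustration: drift grows linearly at 1% of the RT.
    return rate * rt

peak_rt = 600.0
for start_rt in (100.0, 500.0):
    by_start = drift_amount(start_rt)  # drift taken at the chromatogram start
    by_apex = drift_amount(peak_rt)    # drift taken at the top of the peak
    print(f"start={start_rt:.0f}: by start={by_start:.0f}s, by apex={by_apex:.0f}s")
```

Drifting by the start gives 1 s and 5 s for two peaks that appear at the same time, while drifting by the top of the peak gives both the same 6 s.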

vinnydavies commented 4 years ago

Talking to Stefan now, it seems like this is the correct option: 'The alternative is that we apply the warping to the peak RTs in the chemical list before we start the environment...'. He says that the main drift is a physical shift of the chemicals, rather than a warping of the chemicals. He also said that it's quite important that two chemicals at the same RT won't necessarily drift by the same amount. Suggest that when generating the drift for each chemical we add noise (i.e. a random sample from the prior at the RT) so that two chemicals with the same RT don't have the same RT after drift

sdrogers commented 4 years ago

That’s useful - so a column in our world is something that provides a new rt for each chemical that has some drift term (ie new rt is function of old rt) and a random term
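
Something like this, as a minimal sketch. `NoisyLinearColumn`, `apply_column`, the linear drift and the Gaussian noise are all placeholder assumptions, not existing vimms classes or a decided model.

```python
import random

class NoisyLinearColumn:
    """Hypothetical column: new_rt = old_rt + drift(old_rt) + noise."""

    def __init__(self, drift_rate=0.01, noise_sd=0.5, seed=None):
        self.drift_rate = drift_rate
        self.noise_sd = noise_sd
        self.rng = random.Random(seed)

    def drift(self, rt):
        # Drift term: a function of the old RT only.
        return self.drift_rate * rt

    def new_rt(self, rt):
        # Random term: drawn independently on every call, i.e. per chemical.
        return rt + self.drift(rt) + self.rng.gauss(0.0, self.noise_sd)

def apply_column(chemical_rts, column):
    """Drift every chemical RT once, before the environment is started."""
    return [column.new_rt(rt) for rt in chemical_rts]

rts = [600.0, 600.0, 900.0]
drifted = apply_column(rts, NoisyLinearColumn(seed=0))
print(drifted[0] != drifted[1])  # two chemicals at the same RT end up apart
```

Because the new RTs are computed once up front, the peaks move without being warped, and fixing the seed makes a run repeatable.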