Closed · Zatyrus closed this issue 2 years ago
I did some profiling on my own and you're absolutely right. I thought we could gain some performance by using NumPy's optimised matrix operations instead of looping manually, but this naturally makes the memory requirements explode (obvious in hindsight), which is probably also what causes the crashes for bigger tracks.
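A back-of-the-envelope calculation (mine, not from the thread) illustrates why: a fully vectorised all-lags approach materialises N × N intermediate arrays.

```python
# Size of one full N x N float64 intermediate (e.g. all pairwise differences
# along one axis) for the 20000-step track:
N = 20_000
mib = N * N * 8 / 2**20
print(f"{mib:.0f} MiB")  # prints "3052 MiB" per intermediate array
# A few such intermediates (dx, dy, their squares, the sum) living at the
# same time is consistent with the ~9 GiB peak memory reported below.
```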
As the proposed method behaves exactly the same as the old one but is much faster, I don't see any reason against changing it. If you want, you can make the changes and create a pull request yourself.
This is the section that would need changing: https://github.com/Eggeling-Lab-Microscope-Software/TRAIT2D/blob/27e61ba398dab0351c5259fb16273fbdf7d3bcda/trait2d/analysis/__init__.py#L995-L1015
On another note, this might be a good opportunity to write some unit tests so we can automatically check whether new methods like this work correctly. I don't have much time right now, but maybe I'll get around to it when I work on the other issues.
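One such test could generate a Brownian track with a known diffusion coefficient and check the computed MSD against the analytic expectation 4·D·Δt·lag. A sketch, with `calculate_msd_fast` as a hypothetical stand-in for whichever implementation ends up in trait2d (not an existing API):

```python
import numpy as np

def calculate_msd_fast(data):
    # Stand-in for the proposed implementation: one lag per loop iteration.
    N = len(data)
    msd = np.zeros(N - 3)
    for lag in range(1, N - 2):
        disp = data[1 + lag:N] - data[1:N - lag]
        msd[lag - 1] = np.mean(np.sum(disp ** 2, axis=1))
    return msd

def test_msd_matches_brownian_theory():
    rng = np.random.default_rng(42)
    D, dt, n_steps = 0.5, 0.01, 5000
    # 2D Brownian motion: per-axis step std is sqrt(2 * D * dt).
    steps = rng.normal(scale=np.sqrt(2 * D * dt), size=(n_steps, 2))
    track = np.cumsum(steps, axis=0)
    msd = calculate_msd_fast(track)
    # Analytic MSD for 2D diffusion is 4 * D * dt * lag; at lag 1 the
    # estimate averages ~5000 displacements, so 10% tolerance is generous.
    assert abs(msd[0] - 4 * D * dt) / (4 * D * dt) < 0.1

test_msd_matches_brownian_theory()
```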
Below is the output of the attached notebook I used for profiling:
```python
%load_ext memory_profiler
from trait2d.analysis import Track

track = Track.from_file("Brownian_Diff_Test_20000.csv", unit_length='micrometres')

%timeit track.calculate_msd()
%memit track.calculate_msd()
```
```
8.72 s ± 196 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
peak memory: 9300.71 MiB, increment: 9186.85 MiB
```
```python
import numpy as np

def My_MSD(data):
    N = len(data)
    col_Array = np.zeros(N - 3)
    for i in range(1, N - 2):
        # Mean squared displacement over all time origins at lag i
        col_Array[i - 1] = np.mean(np.sum(np.abs((data[1 + i:N, :] - data[1:N - i, :]) ** 2), axis=1))
    return col_Array

import pandas as pd

data = np.array(pd.read_csv("Brownian_Diff_Test_20000.csv")[["x", "y"]])

%timeit My_MSD(data)
%memit new = My_MSD(data)
```
```
995 ms ± 24 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
peak memory: 115.93 MiB, increment: 0.04 MiB
```
```python
np.sum(np.abs(track.get_msd() - new))
```
```
6.740269893435732e-11
```
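For an automated equivalence check, comparing against a brute-force double loop with `np.allclose` is a bit more robust than summing absolute differences. A standalone sketch (the sliced implementation is restated here as `msd_fast` so the check runs on its own; it follows the same convention as `My_MSD`, skipping the first point):

```python
import numpy as np

def msd_fast(data):
    # Sliced per-lag computation, one lag per iteration.
    N = len(data)
    out = np.zeros(N - 3)
    for i in range(1, N - 2):
        out[i - 1] = np.mean(np.sum((data[1 + i:N, :] - data[1:N - i, :]) ** 2, axis=1))
    return out

def msd_naive(data):
    # Explicit double loop over lags and time origins, as a reference.
    N = len(data)
    out = np.zeros(N - 3)
    for lag in range(1, N - 2):
        sq = [np.sum((data[1 + lag + k] - data[1 + k]) ** 2)
              for k in range(N - 1 - lag)]
        out[lag - 1] = np.mean(sq)
    return out

rng = np.random.default_rng(0)
walk = np.cumsum(rng.normal(size=(60, 2)), axis=0)  # small synthetic track
assert np.allclose(msd_fast(walk), msd_naive(walk))
```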
Will close this when the related PR is merged
This has now been implemented.
While coding my own project I came up with my own version of an MSD calculation that runs significantly faster and tends to be more stable than the one currently implemented.
Used from TRAIT2D:

```python
track.calculate_msd()
```
Proposed MSD Calculation
Analyzing the attached 20000-step Brownian diffusion CSV file, the legacy MSD calculation takes about 8.4 seconds and tends to crash for larger tracks. The proposed method takes about 1.1 seconds and runs stably even for tracks as large as 100000 steps (43 seconds).
Please also see the attached VS_plot.png, created for the Brownian_Diff_20000.csv dataset, for verification of the results.
Brownian_Diff_Test_20000.csv Brownian_Diff_Test_100000.csv