aalto-ics-kepaco / msms_rt_ssvm

Implementation of the LC-MS²Struct model published in the manuscript "Joint structural annotation of small molecules using liquid chromatography retention order and tandem mass spectrometry data" by Bach et al.
MIT License

Do profiling of the fixed-ms2 score SSVM #10

Open bachi55 opened 3 years ago

bachi55 commented 3 years ago

... where do we spend most of the run-time?
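One way to answer this is Python's built-in `cProfile`; a minimal sketch of collecting and sorting the hot spots (the `slow_part` function is just a stand-in for the SSVM scoring code):

```python
import cProfile
import io
import pstats

def slow_part(n):
    # Stand-in for the work being profiled
    return sum(i * i for i in range(n))

profiler = cProfile.Profile()
profiler.enable()
slow_part(100000)
profiler.disable()

# Print the ten entries with the largest cumulative run-time
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats("cumulative")
stats.print_stats(10)
report = stream.getvalue()
print(report)
```

Sorting by `"tottime"` instead of `"cumulative"` gives the "own time" view shown in the table below.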

bachi55 commented 3 years ago

So far we have achieved a speed-up by a factor of 4 with very simple modifications:

There is still room for improvement:

The most run-time-consuming parts:

| Function | Call count | Time (ms) | Own time (ms) | Comment |
| --- | --- | --- | --- | --- |
| sqlite3.Connection (execute) | 261257 | 26353 | 26353 (29.6%) | Data loading from the SQLite DB |
| scipy.spatial._distance_wrap.cdist_cityblock | 87952 | 8358 | 8358 (9.4%) | Molecule kernel calculation |
| _get_molecule_feature_matrix | 85789 | 14610 | 8167 (9.2%) | Data loading from the SQLite DB |
| numpy.ufunc (reduce) | 363454 | 2834 | 2834 (3.2%) | |
| generalized_tanimoto_kernel | 87952 | 30925 | 2597 (2.9%) | Molecule kernel calculation |
| str (split) | 16533569 | 2259 | 2259 (2.5%) | |
| _get_lambda_delta | 84784 | 34755 | 2024 | |
| listcomp | 89728 | 1815 | 1815 | |
| sqlite3.Cursor (fetchall) | 171882 | 1741 | 1741 | |
| check_array | 176366 | 10176 | 1652 | |
| numpy.array | 923289 | 1531 | 1530 | |
| _assert_all_finite | 176366 | 5104 | 1483 | |
| inner_f | 440877 | 25802 | 1312 | |
| cdist | 87952 | 11057 | 897 | |
| get_molecule_features_by_molecule_id | 84928 | 41548 | 888 | |
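For reference, the cityblock `cdist` calls in the profile come from the Manhattan-distance trick for the generalized Tanimoto (MinMax) kernel: for non-negative vectors, `sum(min) = (|a|₁ + |b|₁ − |a − b|₁)/2` and `sum(max) = (|a|₁ + |b|₁ + |a − b|₁)/2`, so the whole kernel matrix falls out of a single distance matrix. A minimal sketch (the repository's actual `generalized_tanimoto_kernel` may differ in details):

```python
import numpy as np
from scipy.spatial.distance import cdist

def generalized_tanimoto_kernel(A, B):
    """Generalized Tanimoto (MinMax) kernel for non-negative feature vectors.

    k(a, b) = sum_i min(a_i, b_i) / sum_i max(a_i, b_i), computed via a
    single Manhattan-distance matrix instead of explicit min/max sums.
    """
    S = A.sum(axis=1)[:, None] + B.sum(axis=1)[None, :]  # |a|_1 + |b|_1
    D = cdist(A, B, metric="cityblock")                  # |a - b|_1
    return (S - D) / (S + D)
```

This keeps memory at O(n_A · n_B) and delegates the inner loop to scipy's compiled cityblock routine.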
bachi55 commented 3 years ago

We achieved a further improvement by doing some optimizations on the kernel computation side:

The biggest speed-up so far, though, was achieved by computing the preference values for all candidates associated with one sequence (i.e. of all its nodes) in one go.
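A hypothetical sketch of that batching idea (names, shapes, and the linear-kernel placeholder are illustrative, not the repository's API): stack all nodes' candidate features, run one kernel evaluation, then split the result back per node.

```python
import numpy as np

rng = np.random.default_rng(0)
X_active = rng.random((50, 307))                              # features of the active set
cand_per_node = [rng.random((n, 307)) for n in (20, 35, 10)]  # per-node candidate features

# One kernel evaluation instead of len(cand_per_node) separate calls
X_all = np.vstack(cand_per_node)
K_all = X_all @ X_active.T            # linear kernel as a placeholder

# Split the result back into per-node blocks
sizes = [c.shape[0] for c in cand_per_node]
K_per_node = np.split(K_all, np.cumsum(sizes)[:-1], axis=0)
```

One large GEMM-like call amortizes per-call overhead much better than many small ones, which is presumably where the speed-up comes from.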

bachi55 commented 3 years ago
| Function | n_A | n_B | d | time (4 cores) | time (1 core) |
| --- | --- | --- | --- | --- | --- |
| alg_1__numba | 100 | 5000 | 307 | 0.057894 | 0.262477 |
| cdist__FAST | 100 | 5000 | 307 | 0.122171 | 0.122416 |
| ufunc | 100 | 5000 | 307 | 0.137654 | 0.323284 |
| alg_1__numba | 1000 | 1000 | 307 | 0.313120 | 0.539815 |
| cdist__FAST | 1000 | 1000 | 307 | 0.203876 | 0.193676 |
| ufunc | 1000 | 1000 | 307 | 0.357854 | 0.593559 |
| alg_1__numba | 100 | 5000 | 7500 | 4.285039 | 9.505218 |
| cdist__FAST | 100 | 5000 | 7500 | 3.065871 | 3.094869 |
| ufunc | 100 | 5000 | 7500 | 2.244482 | 6.121194 |
| alg_1__numba | 1000 | 1000 | 7500 | 13.003921 | |
| cdist__FAST | 1000 | 1000 | 7500 | 5.654260 | |
| ufunc | 1000 | 1000 | 7500 | 4.224893 | |

We used 4 cores for the parallel computations ('alg_1__numba' and 'ufunc'). Depending on the input dimensions, a different algorithm performs best.

For example, when the cost per kernel matrix element increases (d = 7500), the ufunc implementation performs best.
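A sketch of what a broadcasting/ufunc-style MinMax kernel could look like (the repository's 'ufunc' variant may differ in details). Note the n_A × n_B × d intermediate array, which explains why this approach pays off mainly when the per-element cost d is large and the result still fits in RAM:

```python
import numpy as np

def tanimoto_ufunc(A, B):
    """MinMax kernel via broadcasting and ufunc reductions.

    Materializes an (n_A, n_B, d) intermediate, so memory grows with all
    three dimensions; trades memory for simple, vectorized inner loops.
    """
    mins = np.minimum(A[:, None, :], B[None, :, :]).sum(axis=-1)
    maxs = np.maximum(A[:, None, :], B[None, :, :]).sum(axis=-1)
    return mins / maxs
```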

bachi55 commented 3 years ago

When the output matrix has a rectangular shape, i.e. n_A < n_B, then 'alg_1__numba' performs best. This case arises when the preference values are calculated: n_A is the number of active candidates and n_B is the number of candidates for which the preference values should be computed.

| Function | n_A | n_B | d | time |
| --- | --- | --- | --- | --- |
| alg_1__numba | 100 | 100 | 307 | 0.001172 |
| cdist__FAST | 100 | 100 | 307 | 0.002104 |
| ufunc | 100 | 100 | 307 | 0.008020 |
| alg_1__numba | 100 | 200 | 307 | 0.002162 |
| cdist__FAST | 100 | 200 | 307 | 0.003946 |
| ufunc | 100 | 200 | 307 | 0.009165 |
| alg_1__numba | 100 | 500 | 307 | 0.005299 |
| cdist__FAST | 100 | 500 | 307 | 0.009801 |
| ufunc | 100 | 500 | 307 | 0.018691 |
| alg_1__numba | 100 | 1000 | 307 | 0.010685 |
| cdist__FAST | 100 | 1000 | 307 | 0.019640 |
| ufunc | 100 | 1000 | 307 | 0.030049 |
| alg_1__numba | 100 | 5000 | 307 | 0.052604 |
| cdist__FAST | 100 | 5000 | 307 | 0.117722 |
| ufunc | 100 | 5000 | 307 | 0.125857 |
| alg_1__numba | 100 | 10000 | 307 | 0.103035 |
| cdist__FAST | 100 | 10000 | 307 | 0.234538 |
| ufunc | 100 | 10000 | 307 | 0.245509 |
| alg_1__numba | 100 | 100000 | 307 | 1.234449 |
| cdist__FAST | 100 | 100000 | 307 | 2.443562 |
| ufunc | 100 | 100000 | 307 | 2.576871 |
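For comparison, an explicit-loop kernel of the kind numba JIT-compiles well. This plain-Python/NumPy sketch mirrors what 'alg_1__numba' plausibly does; under numba the function would carry `@numba.njit(parallel=True)` and the outer loop would use `numba.prange`, which is why it shines on rectangular outputs where the rows parallelize cleanly:

```python
import numpy as np

def tanimoto_loop(A, B):
    """Explicit-loop MinMax kernel (numba-friendly structure, shown
    without the @njit decorator so it runs as plain Python)."""
    n_A, n_B, d = A.shape[0], B.shape[0], A.shape[1]
    K = np.empty((n_A, n_B))
    for i in range(n_A):        # numba.prange(n_A) when JIT-compiled
        for j in range(n_B):
            s_min = 0.0
            s_max = 0.0
            for k in range(d):
                a, b = A[i, k], B[j, k]
                if a < b:
                    s_min += a
                    s_max += b
                else:
                    s_min += b
                    s_max += a
            K[i, j] = s_min / s_max
    return K
```

Unlike the broadcasting variant, this needs no (n_A, n_B, d) intermediate, so its memory footprint stays at the size of the output matrix.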
bachi55 commented 3 years ago

| N | n_cand | L_max | Numpy (s) | Numba (s) |
| --- | --- | --- | --- | --- |
| 24 | 50 | 15 | 22.856 | 28.291 |
| 48 | 50 | 15 | 97.751 | 97.639 |
| 96 | 50 | 15 | 321.269 | 292.787 |
| 24 | 50 | 25 | 27.407 | 33.160 |
| 24 | 100 | 15 | 37.855 | 42.473 |