Something to take care of!

m-a-r-i-u-s commented 6 months ago

A summary of some points that came into my mind as I reviewed the code

[x] The variables z1_a and z_a in the function assign_core_lipids can change from run to run. They depend on the (random) permutation of the order parameters between the lipids. I think it would be a good idea to calculate them directly in the function permut_getis_ord_stat based on the array g_star_i.
[x] Vectorize the functions order_parameter and calc_order_parameter. The reason I implemented it like that was to (really) ensure that the angle between the right beads are calculated. However, the usage of the two for-loop would kill performance. I think we can trust MDAnalysis that is doesn't randomly re-orders atoms in a selection group, but we cannot trust users. Therefore, it would be nice to have a kind of "safety net" in base.py that checks the distance between two subsequent beads/atoms in a selection. If the distance is larger than a pre-defined cutoff (e.g. the physical distance between two C-atoms, which is ~0.15nm) the program should raise at least a warning.
[x] Only the average of the S_CC values over a single chain should be stored, if we store single S_CC values the results array will be pretty huge. However, we should have a unit-test that checks the correctness of this function.
[x] Size of the weight matrices scale quadratically with the number of lipids. For large membranes (lateral length > 50nm) jupyter notebooks keep crashing due to "too big" numpy arrays. I thought about the usage of sparse matrices (e.g., from scipy module) since the weight matrices have a high sparsity (a lot of lipids are not connected with each other). I tested already a little bit the coo_array format from scipy and that worked quite well because it seems to preserve a number of basic numpy.array functionalities.

yusuferentunc commented 5 months ago

[x] Vectorize the functions order_parameter and calc_order_parameter. The reason I implemented it like that was to (really) ensure that the angle between the right beads are calculated. However, the usage of the two for-loop would kill performance. I think we can trust MDAnalysis that is doesn't randomly re-orders atoms in a selection group, but we cannot trust users. Therefore, it would be nice to have a kind of "safety net" in base.py that checks the distance between two subsequent beads/atoms in a selection. If the distance is larger than a pre-defined cutoff (e.g. the physical distance between two C-atoms, which is ~0.15nm) the program should raise at least a warning.

Vectorisation is done in #10 Safety net will be implement in further works

yusuferentunc commented 5 months ago

[x] Only the average of the S_CC values over a single chain should be stored, if we store single S_CC values the results array will be pretty huge. However, we should have a unit-test that checks the correctness of this function.

I believe mean is done at https://github.com/m-a-r-i-u-s/domhmm/blob/0e7c0dca0f1dcca9e7ffef3be6e33c4fb63b6316/domhmm/analysis/domhmm.py#L124

m-a-r-i-u-s commented 5 months ago

Yep! Fine.

yusuferentunc commented 5 months ago

[x] Size of the weight matrices scale quadratically with the number of lipids. For large membranes (lateral length > 50nm) jupyter notebooks keep crashing due to "too big" numpy arrays. I thought about the usage of sparse matrices (e.g., from scipy module) since the weight matrices have a high sparsity (a lot of lipids are not connected with each other). I tested already a little bit the coo_array format from scipy and that worked quite well because it seems to preserve a number of basic numpy.array functionalities.

Sparce matrix implementation for weight matrices done at branch weight_matrix_optimize. Scipy Sparse's csr_array is selected for compressing method since it works just like a numpy array comparing to coo_array.

Pull request will be open whenever GitHub actions' monthly quota renewed.

BioMemPhys-FAU / domhmm

Something to take care of! #14