aristoteleo / dynamo-release

Inclusive model of expression dynamics with conventional or metabolic labeling based scRNA-seq / multiomics, vector field reconstruction and differential geometry analyses
https://dynamo-release.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

how can i get a lower vector-field energy? #356

Closed chansigit closed 2 years ago

chansigit commented 2 years ago

After reconstructing a vector field from the velocities with

dyn.vf.VectorField(adata, basis='umap', M=1000, pot_curl_div=True, min_vel_corr=0.8, restart_num=500, cores=20)

I observed that the energy does not decrease well.

image

Are there any parameters I can tune to improve this energy behavior?

Xiaojieqiu commented 2 years ago

We use sparseVFC for vector field learning. Please check its documentation for available parameters: https://dynamo-release.readthedocs.io/en/latest/_autosummary/dynamo.vf.SparseVFC.html#dynamo.vf.SparseVFC

The Methods section of our Cell paper also provides some guidance on the behavior of the parameters.

In my experience, even when the energy profile doesn't look ideal, the vector field is often still good.

chansigit commented 2 years ago

Thank you, I found the "Effects of parameters in vector field reconstruction" part in your paper.

I tried tuning M and lambda_, set MaxIter to 5k~10k, and found that:

1) the velocity streamlines on the cells did not vary much, and neither did the cell potential landscape;
2) the streamlines outside of the cells varied when I set different M and lambda_ values. Higher lambda_ values made the streamlines stiff and straight, as expected (under-fitting), and lower M values made the streamlines sparser, as expected;
3) the streamlines outside of the cells are not very stable: the arrow directions and the whirlpool locations changed a lot;
4) the parameter tuning rarely improved the energy descent curve, and the energy keeps oscillating up and down.


image
image
image
image

Let me ask some additional questions:

1) The convergence of the vector-field energy seems poor in my case. Can I tune optimization hyper-parameters such as the step length or learning rate to attenuate the oscillation?
2) Some points on the energy curve reach the lowest value seen so far and then quickly go back to higher levels. Can I restore the state from the lowest-energy step after the optimization finishes?
3) I have no idea what cell types the empty areas with streamlines represent, and the streamlines outside of the cells are unstable. How should I interpret these parts of the vector field?

Thank you again for your kind help

chansigit commented 2 years ago

Another minor documentation issue:

The code (https://github.com/aristoteleo/dynamo-release/blob/de389d2cff3fca80ce243201df173e190f323941/dynamo/vectorfield/scVectorField.py#L336) sets the default lambda value to 3, but the docstring says otherwise: lambda ('float' (default: 0.3)) (https://dynamo-release.readthedocs.io/en/latest/_autosummary/dynamo.vf.SparseVFC.html#dynamo.vf.SparseVFC)
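When a docstring and the code disagree like this, the value in the function signature is the one that takes effect. The sketch below shows one way to read it programmatically. The function sparse_vfc_stub is a hypothetical toy stand-in, not dynamo's code; the same inspect.signature call should work on the real function, assuming dynamo is installed.

```python
import inspect

def sparse_vfc_stub(X, Y, M=100, lambda_=3, MaxIter=500):
    """Toy stand-in for a documented function (hypothetical, not dynamo's code).

    lambda_ : float (default: 0.3)
        Regularization weight.
    """
    return None

# The signature, not the docstring, determines the value used at call time.
params = inspect.signature(sparse_vfc_stub).parameters
code_default = params["lambda_"].default

# Check whether the docstring still advertises a different default.
doc_says_0_3 = "default: 0.3" in inspect.getdoc(sparse_vfc_stub)

print(code_default)   # 3
print(doc_says_0_3)   # True
```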

Xiaojieqiu commented 2 years ago

Thanks for reporting your observations! They are consistent with what I know. Indeed, our approach is generally robust to the underlying parameters. You can also see main and supplementary Figure 4 for some in-depth analyses we did to show the robustness and accuracy of our approach.

> the convergence of the vector-field energy seems poor in my case, can I tune some optimization hyper-parameters like step length or learning rate to attenuate the oscillation?

We are using a kernel method for learning, so we don't really have step length or learning rate parameters. The link I shared above lists all the parameters available in dynamo. We are developing a new method that will include such parameters. Please check back in a few weeks.
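To illustrate why a kernel method has no step length: with a fixed set of control points, the coefficients come from solving a regularized linear system in one shot rather than from gradient steps. The sketch below is a simplified kernel ridge fit in NumPy, not dynamo's SparseVFC (it omits the outlier weighting and the iterative re-estimation). The function names, the kernel width beta, the regularization values, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def gaussian_kernel(A, B, beta=1.0):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-beta * d2)

def fit_vector_field(X, V, ctrl, lam=1.0, beta=1.0):
    """Solve (U^T U + lam * K) C = U^T V for the coefficients C directly."""
    U = gaussian_kernel(X, ctrl, beta)      # N x M: samples vs. control points
    K = gaussian_kernel(ctrl, ctrl, beta)   # M x M: control points vs. themselves
    return np.linalg.solve(U.T @ U + lam * K, U.T @ V)

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(200, 2))       # toy "cell" coordinates
V = np.column_stack([-X[:, 1], X[:, 0]])    # toy velocities: a rotation field
ctrl = X[rng.choice(len(X), size=30, replace=False)]  # M = 30 control points

def residual(lam):
    """Norm of the difference between fitted and observed velocities."""
    C = fit_vector_field(X, V, ctrl, lam=lam)
    return np.linalg.norm(gaussian_kernel(X, ctrl) @ C - V)

res_weak, res_strong = residual(1e-2), residual(1e2)
print(res_weak < res_strong)  # heavier regularization fits the data less closely
```

In this toy setting the only knobs are the number of control points, the kernel width, and the regularization weight, which loosely mirrors the roles of M and lambda_ discussed in this thread.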

> some points on the energy curve once reaches the lowest historical local minima and soon went back to higher levels, can I resume the energy to the local minima step after finishing the optimization?

We didn't do a theoretical analysis of the behavior of the energy function, so I am honestly not sure whether the oscillation of the energy is a sign of bad fitting. You may find the original paper helpful in this regard: https://www.sciencedirect.com/science/article/pii/S0031320313002410
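One generic way to approach the question about returning to the lowest-energy step is to record the energy at every iteration and keep a snapshot of the state from the iteration with the lowest value. The sketch below is plain Python bookkeeping only. The names keep_best_iterate and toy_step and the toy energies are hypothetical, and whether dynamo exposes per-iteration state in a form that allows this is an assumption to check against its source.

```python
import copy

def keep_best_iterate(step_fn, state, max_iter):
    """Run step_fn repeatedly and return the state with the lowest energy seen."""
    best_state, best_energy, best_iter = None, float("inf"), -1
    history = []
    for i in range(max_iter):
        state, energy = step_fn(state)
        history.append(energy)
        if energy < best_energy:
            # Copy so later in-place updates cannot overwrite the snapshot.
            best_state, best_energy, best_iter = copy.deepcopy(state), energy, i
    return best_state, best_energy, best_iter, history

# Toy stand-in for an oscillating optimizer: the energy dips, then rebounds.
toy_energies = [10.0, 4.0, 6.0, 2.5, 5.0, 3.0]

def toy_step(state):
    i = state["iteration"]
    return {"iteration": i + 1}, toy_energies[i]

best_state, best_energy, best_iter, history = keep_best_iterate(
    toy_step, {"iteration": 0}, max_iter=len(toy_energies)
)
print(best_iter, best_energy)  # 3 2.5
```

Note that the lowest recorded energy is not necessarily the best fit, since, as stated above, it is unclear whether the oscillation indicates bad fitting at all.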

> since I have no idea what cell types the streamline-possessing empty area represent, plus the streamline outside of the cells are unstable, how should i interpret these part of the vector field?

In general, I recommend focusing on areas with cells or near cells. The power of the vector field lies not just in visualizing streamlines but in the downstream differential geometry analyses that this continuous function enables. I would recommend playing with the acceleration, curvature, and Jacobian calculations. You may also try the least action path and in silico perturbation predictions. Most of the time, all of these analyses deal with areas populated with cells.

Streamlines in areas without cells can be treated as the tool's "imagination" for those empty areas. Since these are entirely empty regions without any constraints, there may be no unique or perfect solution there, and the results come from the underlying continuity and kernel assumptions of sparseVFC.
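One practical way to act on this advice is to mask grid points that have no cells nearby before interpreting or plotting streamlines. The sketch below uses NumPy only. The function name mask_far_from_cells and the distance threshold are hypothetical choices for illustration, not part of dynamo.

```python
import numpy as np

def mask_far_from_cells(grid, cells, max_dist):
    """Return a boolean array: True where a grid point has a cell within max_dist."""
    # Pairwise distances between every grid point and every cell (G x N).
    d = np.linalg.norm(grid[:, None, :] - cells[None, :, :], axis=-1)
    return d.min(axis=1) <= max_dist

cells = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])  # toy embedding coordinates
grid = np.array([[0.1, 0.1], [0.9, 0.2], [5.0, 5.0]])   # last point lies in empty space

supported = mask_far_from_cells(grid, cells, max_dist=0.5)
print(supported)  # [ True  True False]
```

The threshold would need to be chosen relative to the scale of the embedding, for example a small multiple of the typical nearest-neighbor distance between cells.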

Thanks for reporting the documentation issue. I am busy these days, so please feel free to make a pull request to fix this and any others.

chansigit commented 2 years ago

got your ideas. really appreciate your guidance.

hyjforesight commented 2 years ago

The "Regularized vector field learning with sparse approximation for mismatch removal" paper is totally beyond what I can understand. Is there an easier way to understand the parameters of dynamo.vf.SparseVFC and what the best energy pattern should look like? Thanks!

Xiaojieqiu commented 2 years ago

You can read the documentation of dyn.vf.SparseVFC and also the Methods section "Effects of parameters in vector field reconstruction" of our dynamo paper, which discusses the parameters in the package.

Xiaojieqiu commented 2 years ago

Regarding the energy profile, it should generally decrease dramatically from the first iteration. In my experience, the default parameters often work well. If you want a more refined vector field, try increasing the number of basis functions, M, to 500 or 100, as well as increasing the maximum number of learning iterations, MaxIter.