Open trebbiano opened 1 year ago
Hi @trebbiano ,
1) Thanks for pointing out the directionality of the model. The unified-time setting does have this potential drawback: the inferred velocity can come out reversed. This might also occur in other models that generate cell-specific time points in a similar way (e.g. scTour). For now we do not have a good way to determine this beforehand. However, we provide an alternative: if a reversed streamline field is observed, run the model twice with opposite parameter initializations. Setting config.NUM_REP = 2 might do the trick.
2) The error above might be related to an unfinished feature I tried to implement, which should be temporarily removed. Kindly try version==0.2.5.1 to see if it works.
Bests, Mingze
Hi Mingze,
Thanks for the reply. Is the current GitHub version equivalent to 0.2.5.1? I didn't find this version mentioned anywhere in the repo.
Best, Jerry
To follow up on my previous message, I reinstalled UniTVelo from the current version and enabled GPU support in a fresh conda environment. These steps eliminated the errors I was seeing earlier. The resulting vector fields look identical to the default in the case of NUM_REP=2, i.e. with inverted directions of the vectors, and similar to scVelo with the parameter FIT_OPTION='1'. The latent time looks strange, however, even for the scVelo-like output. I wonder what the implications are. Does the inverted result with unified time mode suggest that the assumptions of that model do not apply to my dataset?
Also, do any of the 2 models take into account that there may be multiple points of origin when calculating latent time?
Hi @trebbiano ,
For the previous post, the current GitHub version is 0.2.5.2, which contains a patch for a parameter name error, updated by altairwei in utils.py. The core module is the same as 0.2.5. Glad to hear a new environment and package eliminated the error.
The resulting vector fields look identical to the default in the case of NUM_REP=2
In this case, I guess the total loss of the two trials was compared, and it seems the first rep had the lower loss. Mathematically, this is possible for regression tasks on noisy data, especially when the independent variable x (here, the cell-specific time points t in our RNA velocity task) is also a trainable parameter.
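The point about a trainable time variable can be made concrete with a toy example. The sketch below assumes a bell-shaped expression profile over a time axis in [0, 1]; this is an illustrative assumption, not necessarily the function UniTVelo fits, and the names `profile`, `mse`, and `tau` are hypothetical. It shows that flipping every cell's time point and mirroring the peak position yields the same loss, so the fit alone cannot distinguish the two directions.

```python
import math

def profile(t, h=1.0, a=8.0, tau=0.3):
    """Toy bell-shaped expression level at time t (illustrative assumption)."""
    return h * math.exp(-a * (t - tau) ** 2)

def mse(times, observed, tau):
    """Mean squared error between observed values and the toy profile."""
    errors = [(profile(t, tau=tau) - y) ** 2 for t, y in zip(times, observed)]
    return sum(errors) / len(errors)

# Noisy observations for ten cells ordered along a "true" time axis.
true_times = [i / 9 for i in range(10)]
noise = [0.03, -0.02, 0.01, 0.04, -0.03, 0.02, -0.01, 0.00, 0.02, -0.04]
observed = [profile(t) + e for t, e in zip(true_times, noise)]

# Forward solution: original time points, peak at tau = 0.3.
loss_forward = mse(true_times, observed, tau=0.3)

# Reversed solution: every time point flipped, peak mirrored to 1 - tau.
reversed_times = [1.0 - t for t in true_times]
loss_reversed = mse(reversed_times, observed, tau=0.7)

# The two losses agree up to floating-point rounding.
print(loss_forward, loss_reversed)
```

Because both solutions fit equally well in this idealized setting, which one a given repetition converges to depends on initialization, and on real, noisy data either one may end up with the marginally lower total loss.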
Another parameter you could try is config.IROOT in the configuration file.
With IROOT = None, the task is completely unsupervised. It can also be changed to gcount to initialize the model: the differentiation time is somewhat correlated with the statistics of gene counts, which is a similar but simplified idea to CytoTRACE.
If you already have a rough picture of your data, for instance the general differentiation direction, you could set config.IROOT = 'Name of cell cluster in adata.obs.columns' to specify the origin of the trajectory for validation purposes, so it becomes a semi-supervised task.
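The three IROOT modes described above can be sketched as follows. This is a toy illustration only and not UniTVelo's implementation: `resolve_root` and `count_detected_genes` are hypothetical helpers, and the gcount branch assumes the CytoTRACE-style heuristic that cells expressing more genes are less differentiated.

```python
def count_detected_genes(counts):
    """Number of genes with non-zero counts in each cell (rows are cells)."""
    return [sum(1 for c in row if c > 0) for row in counts]

def resolve_root(iroot, clusters, counts):
    """Return indices of candidate root cells for an IROOT-style setting.

    Hypothetical helper for illustration; not part of any library.
    """
    if iroot is None:
        # Fully unsupervised: no prior on where the trajectory starts.
        return []
    if iroot == "gcount":
        # Assumption: cells expressing the most genes are least differentiated.
        detected = count_detected_genes(counts)
        best = max(detected)
        return [i for i, n in enumerate(detected) if n == best]
    # Otherwise treat the value as a cluster label: semi-supervised.
    roots = [i for i, label in enumerate(clusters) if label == iroot]
    if not roots:
        raise ValueError(f"cluster {iroot!r} not found")
    return roots

# Toy data: four cells, four genes.
clusters = ["stem", "stem", "progenitor", "mature"]
counts = [
    [3, 1, 2, 5],
    [2, 0, 1, 4],
    [0, 0, 6, 1],
    [0, 0, 0, 9],
]

print(resolve_root(None, clusters, counts))
print(resolve_root("gcount", clusters, counts))
print(resolve_root("stem", clusters, counts))
```

The cluster-label branch is the one that lets prior biological knowledge fix the direction, which is why it is useful for validating a suspicious streamline field.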
The latent time looks strange, however, even for the scVelo-like output.
Does this mean that the latent time plots from both methods look wrong? It is hard to interpret without figures. Empirically, though, if your dataset contains a clear cell cycle signal (the S and G2M phases specifically), then scVelo is a very good choice to try; if not, I would be less confident in scVelo's performance.
Does the inverted result with unified time mode suggest that the assumptions of that model do not apply to my dataset?
It's possible. One example is the immune response of a subset of T cells when they are stimulated. Simply put, naive T cells -> effector T cells -> memory T cells. We've found that the current model assumptions fail to capture this biological process, because the expression profiles of quite a few genes in memory T cells lie in between those of the other two cell types, which affects the optimization process. But I cannot comment further since I do not know the type of dataset and tissue environment you are working with.
Also, do any of the 2 models take into account that there may be multiple points of origin when calculating latent time?
In short, no, or at least we haven't tested this type of dataset before (except the DentateGyrus one, which is extremely sparse in the low-dimensional embedding). We have taken multiple end points into account, i.e. multi-branching trajectories such as the human bone marrow differentiation process. Speaking of which, datasets with multiple origin sites could be an interesting direction to explore.
Bests
In my own projects, setting config.IROOT is a very good way to reverse the direction of streamlines.
Thank you for the detailed reply! This makes more sense. It seems the unified time mode tends to oversimplify the actual structure of the data, even in cases where the vector directions are not inverted. I have obtained better results with the independent mode. For the multi-origin development, I would be happy to share a dataset if that helps. It contains an actively cycling cell population, which serves as one origin, and a blood-derived population, which serves as the second origin. I have always been confused about why scVelo accurately captures vector fields consistent with two origins but then generates a latent time plot consistent with a single-origin model. So far I have only been able to capture the multi-origin feature using cellrank. Here is the scVelo output for this dataset:
For comparison, independent mode unitvelo output:
Cluster 6 in this dataset comprises actively cycling cells.
I am interested in the relationship between clusters 6, 4, 0 and 3, and whether the scvelo vector field captures this relationship better than unitvelo or vice versa.
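To make the single-origin versus multi-origin distinction concrete, here is a toy sketch. It is not how scVelo, UniTVelo, or CellRank compute latent time; `graph_pseudotime` is a hypothetical helper that simply measures hop distance from the nearest root on a small hand-made cell graph. With one root, the second origin is pushed to the end of the time axis; with two roots, both origins start at zero.

```python
from collections import deque

def graph_pseudotime(neighbors, roots):
    """Hop distance from the nearest root, scaled to [0, 1] (toy pseudotime)."""
    dist = {r: 0 for r in roots}
    queue = deque(roots)
    while queue:
        node = queue.popleft()
        for nxt in neighbors[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    longest = max(dist.values()) or 1  # avoid division by zero
    return {node: d / longest for node, d in dist.items()}

# Toy graph: origin 0 -> 1 -> 2 -> 3, and a second origin 4 -> 5 -> 2.
neighbors = {
    0: [1],
    1: [0, 2],
    2: [1, 3, 5],
    3: [2],
    4: [5],
    5: [4, 2],
}

single_origin = graph_pseudotime(neighbors, roots=[0])
two_origins = graph_pseudotime(neighbors, roots=[0, 4])

# Cell 4 is a true origin, but a single-root model places it last.
print(single_origin[4], two_origins[4])
```

In this toy setting, a latent time that looks single-origin despite a two-origin vector field is what one would expect whenever the time estimate is anchored to only one root.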
Hi, I'm running UniTVelo and obtained some strange results with the default settings. Specifically, the velocity direction was inverted compared to scvelo and biologically feasible results. However, when I modify the default settings, the following error appears. Any ideas?
Parameter choices are as follows:
System and version information:
The full error message: