StatBiomed / UniTVelo

UniTVelo, Temporally Unified RNA Velocity for single cell trajectory inference
https://unitvelo.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License
25 stars 9 forks source link

TypeError: 'NoneType' object is not iterable #27

Open trebbiano opened 1 year ago

trebbiano commented 1 year ago

Hi, I'm running UniTVelo and obtained some strange results with the default settings. Specifically, the velocity direction was inverted compared to scvelo and biologically feasible results. However, when I modify the default settings, the following error appears. Any ideas?

File ~/miniconda3/lib/python3.9/site-packages/unitvelo/optimize_utils.py:187, in Model_Utils.gene_prior_perc(self, dis)
    184 perc, perc_idx = [], []
    185 aggregrate_weight = np.ones((dis.shape[1], ), dtype=np.float32)
--> 187 for prior in self.config.GENE_PRIOR:
    188     expr = np.array(self.adata[:, prior[0]].layers['Ms'])
    189     perc.append(np.max(expr) * 0.75) # modify 0.75 for parameter tuning

TypeError: 'NoneType' object is not iterable

Parameter choices are as follows:

velo_config = utv.config.Configuration()
velo_config.R2_ADJUST = True
velo_config.IROOT = None
velo_config.FIT_OPTION = '2' ## Default is 1
velo_config.AGENES_R2 = 1

System and version information:

unitvelo==0.2.5
DISTRIB_DESCRIPTION="Ubuntu 20.04.5 LTS"
Linux version 5.10.16.3-microsoft-standard-WSL2
Python 3.9.15

The full error message:

------> Manully Specified Parameters <------
FIT_OPTION: 2
DENSITY:    Raw
REORDER_CELL:   Hard
AGGREGATE_T:    False
------> Model Configuration Settings <------
N_TOP_GENES:    2000
LEARNING_RATE:  0.01
R2_ADJUST:  True
GENE_PRIOR: None
VGENES: basic
IROOT:  None
--------------------------------------------

Current working dir is /mnt/c/workdir.
Results will be stored in res folder
Extracted 2000 highly variable genes.
Computing moments for 2000 genes with n_neighbors: 30 and n_pcs: 30

# of velocity genes 561 (Criterion: positive regression coefficient between un/spliced counts)
# of velocity genes 535 (Criterion: std of un/spliced reads should be moderate, w/o extreme values)
# of velocity genes 535 (Criterion: genes have reads in more than 5% of total cells)
No GPU device has been detected. Switch to CPU mode.

Loss (Total): 593.525, (Spliced): 290.466, (Unspliced): 303.059: 100%|██████████████████████████████████████████████████████████████████████████████████████████▉| 11999/12000 [1:25:32<00:00,  2.25it/s]

Loss (Total): 593.525, (Spliced): 290.466, (Unspliced): 303.059: 100%|██████████████████████████████████████████████████████████████████████████████████████████▉| 11999/12000 [1:26:54<00:00,  2.30it/s]

Total loss 593.525, vgene loss 593.525

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In [8], line 1
----> 1 adata = utv.run_model(adata, 
      2                       label, config_file=velo_config)

File ~/miniconda3/lib/python3.9/site-packages/unitvelo/main.py:64, in run_model(adata, label, config_file, normalize)
     61 model = Velocity(adata, config=config)
     62 model.get_velo_genes()
---> 64 adata = model.fit_velo_genes(basis=basis, rep=rep)
     65 pre_time_gm = adata.obs['latent_time'].values
     67 if config.GENERAL != 'Linear':

File ~/miniconda3/lib/python3.9/site-packages/unitvelo/velocity.py:268, in Velocity.fit_velo_genes(self, basis, rep)
    264 assert self.config.GENERAL in ['Deterministic', 'Curve', 'Linear'], \
    265     'Please specify the correct self.GENERAL in configuration file.'
    267 if self.config.GENERAL == 'Curve':
--> 268     residual, adata = self.fit_curve(self.adata, idx, Ms_scale, Mu_scale, rep=rep)
    270 if self.config.GENERAL == 'Deterministic':
    271     residual = self.fit_deterministic(idx, self.Ms, self.Mu, Ms_scale, Mu_scale)

File ~/miniconda3/lib/python3.9/site-packages/unitvelo/velocity.py:246, in Velocity.fit_curve(self, adata, idx, Ms_scale, Mu_scale, rep)
    243     print (f'Using GPU card: {self.config.GPU}')
    245 with tf.device(device):
--> 246     residual, adata = lagrange(
    247         adata, idx=idx,
    248         Ms=Ms_scale, Mu=Mu_scale, 
    249         rep=rep, config=self.config
    250     )
    252 return residual, adata

File ~/miniconda3/lib/python3.9/site-packages/unitvelo/model.py:416, in lagrange(adata, idx, Ms, Mu, var_names, rep, config)
    404 var_names = make_unique_list(var_names, allow_array=True)
    406 model = Recover_Paras(
    407     adata,
    408     Ms,
   (...)
    413     config=config
    414 )
--> 416 latent_time_gm, s_derivative, adata = model.fit_likelihood()
    418 if 'latent_time' in adata.obs.columns:
    419     del adata.obs['latent_time']

File ~/miniconda3/lib/python3.9/site-packages/unitvelo/model.py:340, in Recover_Paras.fit_likelihood(self)
    337 self.adata.var['velocity_genes'] = self.total_genes if not self.flag else self.idx
    338 self.adata.layers['fit_t'][:, ~self.adata.var['velocity_genes'].values] = np.nan
--> 340 return self.get_interim_t(t_cell, self.adata.var['velocity_genes'].values), s_derivative.numpy(), self.adata

File ~/miniconda3/lib/python3.9/site-packages/unitvelo/optimize_utils.py:351, in Model_Utils.get_interim_t(self, t_cell, idx)
    347         temp = np.reshape(t_cell[:, i], (-1, 1))
    348         t_interim[:, i] = \
    349             np.squeeze(col_minmax(temp, self.adata.var.index[i]))
--> 351 t_interim = self.max_density(t_interim)
    352 t_interim = tf.reshape(t_interim, (-1, 1))
    353 t_interim = tf.broadcast_to(t_interim, self.adata.shape)

File ~/miniconda3/lib/python3.9/site-packages/unitvelo/optimize_utils.py:180, in Model_Utils.max_density(self, dis)
    177     return tf.cast(mean(dis_approx, axis=1), tf.float32)
    179 if self.config.DENSITY == 'Raw':
--> 180     weight = self.gene_prior_perc(dis)
    181     return tf.cast(sum(dis * weight, axis=1), tf.float32)

File ~/miniconda3/lib/python3.9/site-packages/unitvelo/optimize_utils.py:187, in Model_Utils.gene_prior_perc(self, dis)
    184 perc, perc_idx = [], []
    185 aggregrate_weight = np.ones((dis.shape[1], ), dtype=np.float32)
--> 187 for prior in self.config.GENE_PRIOR:
    188     expr = np.array(self.adata[:, prior[0]].layers['Ms'])
    189     perc.append(np.max(expr) * 0.75) # modify 0.75 for parameter tuning

TypeError: 'NoneType' object is not iterable
michaelgmz commented 1 year ago

Hi @trebbiano ,

1) Thx for mentioning the directionality of the model. Indeed sometimes the unified-time setting has this potential drawback that the given velocity is reversed. This might also occur in other models with a similar design in how cell-specific time points are generated (e.g. scTour). For now, we might not have an optimal way to determine this beforehand, however, we've provided an alternative, to run the model twice with the opposite initialization of parameters, if a reversed streamline field is observed. config.NUM_REP =2 might do the trick.

2) The error above might be related to an unfinished feature I tried to implement, which should be temporarily removed. Kindly try version==0.2.5.1 to see if it works.

Bests, Mingze

trebbiano commented 1 year ago

Hi Mingze,

Thanks for the reply. Is the current Github version equivalent to 0.2.5.1? I didn't find this version actually mentioned anywhere in the repo.

Best, Jerry

trebbiano commented 1 year ago

To follow up on my previous message, I reinstalled the UniTVelo from the current version and enabled GPU support in a fresh conda environment. These steps eliminated the errors I was seeing earlier. The resulting vector fields look identical to the default in the case of NUM_REP=2, i.e. with inverted directions of the vectors, and similar to scVelo with the parameter FIT_OPTION='1'. The latent time looks strange, however, even for the scVelo-like output. I wonder what the implications are. Does the inverted result with unified time mode suggest that the assumptions of that model do not apply to my dataset?

Also, do any of the 2 models take into account that there may be multiple points of origin when calculating latent time?

michaelgmz commented 1 year ago

Hi @trebbiano ,

For the previous post, the current GitHub version is 0.2.5.2, which contains a patch for a parameter name error, updated by altairwei in utils.py. The core module is the same as 0.2.5. Glad to hear a new environment and package eliminated the error.

The resulting vector fields look identical to the default in the case of NUM_REP=2 In this case, I guess the total loss of two trails was compared, and seems the first rep had a lower loss. Mathematically, this is possible for regression tasks with much noise contained, especially when the independent variable x (cell-specific time points t, here in our RNA velocity task) is also a trainable parameter.

Another parameter that you could give a shot, is config.IROOT in the configuration file. You could consider IROOT = None as a completely unsupervised task, whilst this can also be changed with gcount to initialize the model. The differentiation time is somehow correlated to the statistics of gene counts, which shares a similar but simplified idea with CytoTRACE. If you already have a brief landscape of your data, the general differentiation direction, for instance, you could set config.IROOT = 'Name of cell cluster in adata.obs.columns' to specify the origin of trajectory for validation purposes, so it becomes a semi-supervised task.

The latent time looks strange, however, even for the scVelo-like output. Does this mean the two latent time plots generated by both methods are not desired? It might be hard to interpret here without figures. But empirically, if your dataset contains a cell cycle phase clearly (the S G2M phase specifically), then scVelo is a very good choice to try, if not, I guess the performance of scVelo is doubted.

Does the inverted result with unified time mode suggest that the assumptions of that model do not apply to my dataset? It's possible, for instance, the subset of T cells' immune response when they are stimulated. Simply put, Naive T cells -> Effector T cells -> Memory T cells, we've found that current model assumptions failed to capture this biological process as the expression profiles for quite a few genes of memory t cells are in between of the other two types of cell, which affect the optimization process. But I cannot comment more since I do not know the type of datasets and tissue environments you are working on.

Also, do any of the 2 models take into account that there may be multiple points of origin when calculating latent time? In short, no, or we haven't tested on this type of dataset before (except the DentateGyrus one which is extremely sparse in lower embeddings). We've taken multiple ending points into account, the multi-branching trajectory, e.g. human bone marrow differentiation process. Speaking of which, datasets with multiple origin sites could be an interesting hint to explore.

Bests

altairwei commented 1 year ago

In my project practice, setting config.IROOT is a very good way to reverse the direction of streamlines.

trebbiano commented 1 year ago

Thank you for the detailed reply! This makes more sense. It seems the unified time mode tends to oversimplify the actual structure of the data, even in cases where the vector directions are not inverted. I have obtained better results with the independent mode. For the multi-origin development, I would be happy to share a dataset if that helps. It contains an actively cycling cell population which serves as one origin, and a blood-derived population which serves as the second origin. I was always confused why scVelo accurately captures the vector fields as consistent with two origins, but then generates a latent time plot consistent with a single-origin model. So far I have only been able to capture the multi-origin feature using cellrank. Here is the scVelo output for this dataset: image For comparison, independent mode unitvelo output: image Cluster 6 in this dataset comprises actively cycling cells. I am interested in the relationship between clusters 6,4,0 and 3, and whether the scvelo vector field captures this relationship better than unitvelo or vice versa.