aristoteleo / dynamo-release

Inclusive model of expression dynamics with conventional or metabolic labeling based scRNA-seq / multiomics, vector field reconstruction and differential geometry analyses
https://dynamo-release.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

dyn.pd.perturbation: TypeError: unsupported operand type(s) for +: 'float' and 'NoneType' #481

Closed Mingsenli closed 1 year ago

Mingsenli commented 1 year ago

Dear Dr. Qiu:

Dynamo is so excellent and helps me a lot. Thank you very much for providing such a powerful tool.

When I run dyn.pd.perturbation, I get an error, and I cannot tell what causes it. I sincerely hope you can help me solve this problem. Thank you very much!

My code is :

dyn.pp.recipe_monocle(adata)
dyn.tl.dynamics(adata, cores=48)
dyn.tl.reduceDimension(adata)

dyn.tl.cell_velocities(adata, basis='pca')
dyn.vf.VectorField(adata, basis='pca', M=1000)
dyn.tl.cell_velocities(adata, basis='umap')
dyn.vf.VectorField(adata, basis='umap', M=1000)

dyn.vf.rank_velocity_genes(adata, groups='leiden_anno', vkey="velocity_S")
rank_speed = adata.uns['rank_velocity_S']
rank_abs_speed = adata.uns['rank_abs_velocity_S']
dyn.vf.acceleration(adata, basis='pca')

dyn.vf.rank_acceleration_genes(adata, groups='leiden_anno', akey="acceleration", prefix_store="rank")
rank_acceleration = adata.uns['rank_acceleration']
rank_abs_acceleration = adata.uns['rank_abs_acceleration']

dyn.vf.curvature(adata, basis='pca')
dyn.vf.rank_curvature_genes(adata, groups='leiden_anno')

dyn.pp.top_pca_genes(adata, n_top_genes=500)
top_pca_genes = adata.var.index[adata.var.top_pca_genes]

top_pca_genes = ['RORA', 'PITX1', 'KRT3', 'KRT12', "MAL"] + list(top_pca_genes)

dyn.vf.jacobian(adata, regulators=top_pca_genes, effectors=top_pca_genes)

dyn.ext.ddhodge(adata, basis='pca')
dyn.ext.ddhodge(adata, basis='umap')

basis = "pca"
dyn.vf.speed(adata, basis=basis)
dyn.vf.divergence(adata, basis=basis)
dyn.vf.acceleration(adata, basis=basis)
dyn.vf.curvature(adata, basis=basis)

basis = "umap"
dyn.vf.speed(adata, basis=basis)
dyn.vf.divergence(adata, basis=basis)
dyn.vf.acceleration(adata, basis=basis)
dyn.vf.curvature(adata, basis=basis)

gene = "RORA"
dyn.pd.perturbation(adata, gene, [-100], emb_basis="umap")

|-----> In silico perturbation of single-cells and prediction of cell fate after perturbation...
|-----> Retrive X_pca, PCs, pca_mean...
|-----> Calculate perturbation effect matrix via \delta Y = J \dot \delta X....

TypeError                                 Traceback (most recent call last)
in
      1 gene = "RORA"
----> 2 dyn.pd.perturbation(adata, gene, [-100], emb_basis="umap")
      3 #dyn.pl.streamline_plot(adata, color=["leiden_anno", gene], basis="umap_perturbation")

~/.local/lib/python3.8/site-packages/dynamo/prediction/perturbation.py in perturbation(adata, genes, expression, perturb_mode, cells, zero_perturb_genes_vel, pca_key, PCs_key, pca_mean_key, basis, emb_basis, jac_key, X_pca, delta_Y, projection_method, pertubation_method, J_jv_delta_t, delta_t, add_delta_Y_key, add_transition_key, add_velocity_key, add_embedding_key)
    244
    245     # project pca gene expression back to original gene expression:
--> 246     X = pca_to_expr(X_pca, PCs, means)
    247
    248     # get gene position

~/.local/lib/python3.8/site-packages/dynamo/prediction/utils.py in pca_to_expr(X, PCs, mean, func)
    489     # reverse project from PCA back to raw expression space
    490     if PCs.shape[1] == X.shape[1]:
--> 491         exprs = X @ PCs.T + mean
    492         if func is not None:
    493             exprs = func(exprs)

TypeError: unsupported operand type(s) for +: 'float' and 'NoneType'

Xiaojieqiu commented 1 year ago

Hi @Mingsenli, it looks like one of X, PCs or mean in this line of code, X @ PCs.T + mean, is None. So you can check the following things:

  1. Is "RORA" a gene used for PCA dimension reduction? Check whether adata[:, 'RORA'].var['use_for_pca'] is True. The vector field is built only from the genes used to create the PCA embedding. If this gene was not selected as a feature gene for PCA dimension reduction, please append it, and any other genes of interest, when running dyn.pp.recipe_monocle with the argument genes_to_append.
  2. Check whether your adata has PCs, that is, whether adata.uns['PCs'] exists.
  3. Check whether your adata has pca_mean, that is, whether adata.uns['pca_mean'] exists.

Hope this helps and please let me know how everything goes!
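Checks 2 and 3 can pass on key presence alone while the stored value is still None, which is the case that makes X @ PCs.T + mean fail. A minimal sketch of a check that separates the two cases follows. It assumes adata.uns behaves like a dict; find_unset_entries is a hypothetical helper written for illustration, not part of dynamo, and a plain dict stands in for adata.uns here.

```python
def find_unset_entries(uns, keys=("PCs", "pca_mean")):
    """Return {key: reason} for every key that is missing or stored as None.

    `uns` is assumed to be dict-like (as adata.uns is); this helper is
    hypothetical and not part of dynamo.
    """
    problems = {}
    for key in keys:
        if key not in uns:
            problems[key] = "missing"
        elif uns[key] is None:
            # The key shows up in the AnnData summary, but holds no data.
            problems[key] = "present but None"
    return problems


# Plain dict standing in for adata.uns (illustrative values only).
example_uns = {"PCs": [[0.1, 0.2], [0.3, 0.4]], "pca_mean": None}
print(find_unset_entries(example_uns))
```

With a real object, the call would be find_unset_entries(adata.uns); an empty result means both entries hold a value.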

Mingsenli commented 1 year ago

Dear Dr. Qiu:

Thank you for answering my question.

  1. My adata[:, 'RORA'].var['use_for_pca'] is True.
  2. My adata also has PCs and pca_mean.

So this error was not caused by these reasons. There must be another cause. Could you please help me address this problem?

Looking forward to hearing from you.

My adata is:
AnnData object with n_obs × n_vars = 18794 × 1999
    obs: 'orig.ident', 'nCount_RNA', 'nFeature_RNA', 'percent.mt', 'percent.HB', 'S.Score', 'G2M.Score', 'Phase', 'group', 'nCount_SCT', 'nFeature_SCT', 'integrated_snn_res.2', 'seurat_clusters', 'cell_type', 'barcode', 'UMAP_1', 'UMAP_2', 'batch', 'initial_size_unspliced', 'initial_size_spliced', 'initial_size', 'n_counts', 'nGenes', 'nCounts', 'pMito', 'pass_basic_filter', 'unspliced_Size_Factor', 'initial_unspliced_cell_size', 'spliced_Size_Factor', 'initial_spliced_cell_size', 'Size_Factor', 'initial_cell_size', 'ntr', 'cell_cycle_phase', 'control_point_pca', 'inlier_prob_pca', 'obs_vf_angle_pca', 'acceleration_pca', 'curvature_pca', 'control_point_umap', 'inlier_prob_umap', 'obs_vf_angle_umap', 'pca_ddhodge_sampled', 'pca_ddhodge_div', 'pca_potential', 'pca_ddhodge_potential', 'umap_ddhodge_sampled', 'umap_ddhodge_div', 'umap_potential', 'umap_ddhodge_potential', 'speed_pca', 'divergence_pca', 'speed_umap', 'divergence_umap', 'acceleration_umap', 'curvature_umap'
    var: 'Accession', 'Chromosome', 'End', 'Start', 'Strand', 'nCells', 'nCounts', 'pass_basic_filter', 'frac', 'use_for_pca', 'ntr', 'beta', 'gamma', 'half_life', 'alpha_b', 'alpha_r2', 'gamma_b', 'gamma_r2', 'gamma_logLL', 'delta_b', 'delta_r2', 'bs', 'bf', 'uu0', 'ul0', 'su0', 'sl0', 'U0', 'S0', 'total0', 'use_for_dynamics', 'use_for_transition'
    uns: 'cell_type_colors', 'log1p', 'pp', 'PCs', 'explained_varianceratio', 'pca_mean', 'pca_fit', 'feature_selection', 'cell_phase_genes', 'dynamics', 'neighbors', 'grid_velocity_umap', 'grid_velocity_pca', 'VecFld_pca', 'rank_velocity_S', 'rank_abs_velocity_S', 'rank_acceleration', 'rank_abs_acceleration', 'rank_curvature', 'rank_abs_curvature', 'VecFld_umap', 'kinetics_heatmap', 'cell_type_graph'
    obsm: 'X_pca', 'X_umap', 'X', 'cell_cycle_scores', 'velocity_umap', 'velocity_pca', 'velocity_pca_SparseVFC', 'X_pca_SparseVFC', 'acceleration_pca', 'curvature_pca', 'velocity_umap_SparseVFC', 'X_umap_SparseVFC', 'acceleration_umap', 'curvature_umap'
    layers: 'ambiguous', 'matrix', 'spliced', 'unspliced', 'X_spliced', 'X_unspliced', 'M_u', 'M_uu', 'M_s', 'M_us', 'M_ss', 'velocity_S', 'acceleration', 'curvature'
    obsp: 'moments_con', 'distances', 'connectivities', 'pearson_transition_matrix'

Xiaojieqiu commented 1 year ago

This is strange. What about other genes? Can you predict perturbation effects for any gene?

Mingsenli commented 1 year ago

No gene can be predicted. I tried other genes, but the same error occurred.

Xiaojieqiu commented 1 year ago

Would you mind sending me some example data so that I can reproduce your error?

I think your code is fine. I am able to use our pancreas dataset to run perturbation prediction successfully:

adata = dyn.sample_data.pancreatic_endocrinogenesis()
dyn.pp.recipe_monocle(adata)
dyn.tl.dynamics(adata, cores=48)
dyn.tl.reduceDimension(adata)

dyn.tl.cell_velocities(adata,basis='pca')
dyn.vf.VectorField(adata,basis='pca', M=1000)
dyn.tl.cell_velocities(adata,basis='umap')
dyn.vf.VectorField(adata,basis='umap', M=1000)

adata.var.use_for_pca
dyn.pd.perturbation(adata, 'Erdr1', [-100], emb_basis="umap")

Are you able to confirm this also?

Mingsenli commented 1 year ago

I tried the adata from dyn.sample_data.pancreatic_endocrinogenesis() as you did, and the same error occurred. So the problem lies in my Dynamo installation. What is wrong with it, and how can I solve this?

Thank you

|-----> In silico perturbation of single-cells and prediction of cell fate after perturbation...
|-----> Retrive X_pca, PCs, pca_mean...
|-----> Calculate perturbation effect matrix via \delta Y = J \dot \delta X....

TypeError                                 Traceback (most recent call last)
in
----> 1 dyn.pd.perturbation(adata, 'Erdr1', [-100], emb_basis="umap")

~/.local/lib/python3.8/site-packages/dynamo/prediction/perturbation.py in perturbation(adata, genes, expression, perturb_mode, cells, zero_perturb_genes_vel, pca_key, PCs_key, pca_mean_key, basis, emb_basis, jac_key, X_pca, delta_Y, projection_method, pertubation_method, J_jv_delta_t, delta_t, add_delta_Y_key, add_transition_key, add_velocity_key, add_embedding_key)
    244
    245     # project pca gene expression back to original gene expression:
--> 246     X = pca_to_expr(X_pca, PCs, means)
    247
    248     # get gene position

~/.local/lib/python3.8/site-packages/dynamo/prediction/utils.py in pca_to_expr(X, PCs, mean, func)
    489     # reverse project from PCA back to raw expression space
    490     if PCs.shape[1] == X.shape[1]:
--> 491         exprs = X @ PCs.T + mean
    492         if func is not None:
    493             exprs = func(exprs)

TypeError: unsupported operand type(s) for +: 'float' and 'NoneType'

Mingsenli commented 1 year ago

Another problem is that my adata cannot be saved after running dynamo. Saving and perturbation prediction both fail; all other commands succeed.

My code is:

adata = sc.read_h5ad('dysplasia_adult_count_merge_dynamo_input.h5ad')
dyn.pp.recipe_monocle(adata)
dyn.tl.dynamics(adata, cores=48)
dyn.tl.reduceDimension(adata)
adata.write_h5ad("./dysplasia_adult_dynamo_fisrst_processed.h5ad")

TypeError                                 Traceback (most recent call last)
~/.local/lib/python3.8/site-packages/anndata/_io/utils.py in func_wrapper(elem, key, val, *args, **kwargs)
    213         try:
--> 214             return func(elem, key, val, *args, **kwargs)
    215         except Exception as e:

~/.local/lib/python3.8/site-packages/anndata/_io/specs/registry.py in write_elem(f, k, elem, modifiers, *args, **kwargs)
    170         ):
--> 171             _REGISTRY.get_writer(dest_type, (t, elem.dtype.kind), modifiers)(
    172                 f, k, elem, *args, **kwargs

~/.local/lib/python3.8/site-packages/anndata/_io/specs/registry.py in wrapper(g, k, *args, **kwargs)
     23         def wrapper(g, k, *args, **kwargs):
---> 24             result = func(g, k, *args, **kwargs)
     25             g[k].attrs.setdefault("encoding-type", spec.encoding_type)

~/.local/lib/python3.8/site-packages/anndata/_io/specs/methods.py in write_vlen_string_array(f, k, elem, dataset_kwargs)
    345     str_dtype = h5py.special_dtype(vlen=str)
--> 346     f.create_dataset(k, data=elem.astype(str_dtype), dtype=str_dtype, **dataset_kwargs)
    347

~/.local/lib/python3.8/site-packages/h5py/_hl/group.py in create_dataset(self, name, shape, dtype, data, **kwds)
    182
--> 183             dsid = dataset.make_new_dset(group, shape, dtype, data, name, **kwds)
    184             dset = dataset.Dataset(dsid)

~/.local/lib/python3.8/site-packages/h5py/_hl/dataset.py in make_new_dset(parent, shape, dtype, data, name, chunks, compression, shuffle, fletcher32, maxshape, compression_opts, fillvalue, scaleoffset, track_times, external, track_order, dcpl, dapl, efile_prefix, virtual_prefix, allow_unknown_filter, rdcc_nslots, rdcc_nbytes, rdcc_w0)
    167     if (data is not None) and (not isinstance(data, Empty)):
--> 168         dset_id.write(h5s.ALL, h5s.ALL, data)
    169

h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

h5py/h5d.pyx in h5py.h5d.DatasetID.write()

h5py/_proxy.pyx in h5py._proxy.dset_rw()

h5py/_conv.pyx in h5py._conv.str2vlen()

h5py/_conv.pyx in h5py._conv.generic_converter()

h5py/_conv.pyx in h5py._conv.conv_str2vlen()

TypeError: Can't implicitly convert non-string objects to strings

The above exception was the direct cause of the following exception:

TypeError                                 Traceback (most recent call last)
in
----> 1 adata.write_h5ad("./dysplasia_adult_dynamo_processed_final_2.h5ad")

~/.local/lib/python3.8/site-packages/anndata/_core/anndata.py in write_h5ad(self, filename, compression, compression_opts, force_dense, as_dense)
   1916             filename = self.filename
   1917
-> 1918         _write_h5ad(
   1919             Path(filename),
   1920             self,

~/.local/lib/python3.8/site-packages/anndata/_io/h5ad.py in write_h5ad(filepath, adata, force_dense, as_dense, dataset_kwargs, **kwargs)
     97         write_elem(f, "raw", adata.raw, dataset_kwargs=dataset_kwargs)
     98         write_elem(f, "obs", adata.obs, dataset_kwargs=dataset_kwargs)
---> 99         write_elem(f, "var", adata.var, dataset_kwargs=dataset_kwargs)
    100         write_elem(f, "obsm", dict(adata.obsm), dataset_kwargs=dataset_kwargs)
    101         write_elem(f, "varm", dict(adata.varm), dataset_kwargs=dataset_kwargs)

~/.local/lib/python3.8/site-packages/anndata/_io/utils.py in func_wrapper(elem, key, val, *args, **kwargs)
    212     def func_wrapper(elem, key, val, *args, **kwargs):
    213         try:
--> 214             return func(elem, key, val, *args, **kwargs)
    215         except Exception as e:
    216             if "Above error raised while writing key" in format(e):

~/.local/lib/python3.8/site-packages/anndata/_io/specs/registry.py in write_elem(f, k, elem, modifiers, *args, **kwargs)
    173         )
    174     else:
--> 175         _REGISTRY.get_writer(dest_type, t, modifiers)(f, k, elem, *args, **kwargs)
    176
    177

~/.local/lib/python3.8/site-packages/anndata/_io/specs/registry.py in wrapper(g, k, *args, **kwargs)
     22         @wraps(func)
     23         def wrapper(g, k, *args, **kwargs):
---> 24             result = func(g, k, *args, **kwargs)
     25             g[k].attrs.setdefault("encoding-type", spec.encoding_type)
     26             g[k].attrs.setdefault("encoding-version", spec.encoding_version)

~/.local/lib/python3.8/site-packages/anndata/_io/specs/methods.py in write_dataframe(f, key, df, dataset_kwargs)
    512     for colname, series in df.items():
    513         # TODO: this should write the "true" representation of the series (i.e. the underlying array or ndarray depending)
--> 514         write_elem(group, colname, series._values, dataset_kwargs=dataset_kwargs)
    515
    516

~/.local/lib/python3.8/site-packages/anndata/_io/utils.py in func_wrapper(elem, key, val, *args, **kwargs)
    218             else:
    219                 parent = _get_parent(elem)
--> 220                 raise type(e)(
    221                     f"{e}\n\n"
    222                     f"Above error raised while writing key {key!r} of {type(elem)} "

TypeError: Can't implicitly convert non-string objects to strings

Above error raised while writing key 'alpha_b' of to /

Xiaojieqiu commented 1 year ago

Please delete all your installed dynamo versions and install the latest dynamo version from GitHub.

Saving is a known problem; please search the previous issues. We will fix this completely in a couple of weeks.

Mingsenli commented 1 year ago

I created a new Python 3.7 conda environment and installed the latest dynamo version from GitHub (1.2.0). However, dyn.pd.perturbation still fails to run and reports the same error.

Xiaojieqiu commented 1 year ago

Did you make sure you are able to run on our sample dataset, using my code above? If our example code and data work but you still cannot run on your own data, feel free to send me some example data so that I can debug it for you.

Thanks!

Mingsenli commented 1 year ago

My Dynamo (under both Python 3.8 and 3.7) fails to run on the sample dataset downloaded by dyn.sample_data.pancreatic_endocrinogenesis(), using your code above.

So my Dynamo installation has a problem. Dynamo works once I install pandas 1.3.5 or 1.5.3. My numpy is 1.20.3. All commands except dyn.pd.perturbation run successfully. Could the failure of dyn.pd.perturbation be caused by the version of pandas, numpy, or other packages? The in silico perturbation prediction is very important to my project, so I really hope to solve this problem.

The default pandas version is 2.0.1, but dynamo cannot be imported under pandas 2.0.1. The error is:

ImportError                               Traceback (most recent call last)
File ~/.local/lib/python3.8/site-packages/statsmodels/compat/pandas.py:61
     60 try:
---> 61     from pandas import NumericIndex
     63     has_numeric_index = True

ImportError: cannot import name 'NumericIndex' from 'pandas' (/data/User/limingsen/.conda/envs/dynamo2/lib/python3.8/site-packages/pandas/__init__.py)

ImportError: cannot import name 'Int64Index' from 'pandas' (/data/User/limingsen/.conda/envs/dynamo2/lib/python3.8/site-packages/pandas/__init__.py)

Xiaojieqiu commented 1 year ago

@Ukyeon @Sichao25 I need to wrap up the papers. Can you please help @Mingsenli with this? You may tell him how to install the correct pandas version. Dynamo has a function to report all package versions, so you can report the versions of all relevant packages on your computer and share them with @Mingsenli.
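For comparing environments independently of dynamo, the installed versions can also be read with the standard library. This is a minimal sketch using importlib.metadata (available from Python 3.8); report_versions is a hypothetical helper, not dynamo's own version-reporting function, and the package list is only an assumption about which dependencies matter here.

```python
from importlib import metadata


def report_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Distribution is not installed in the current environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Assumed list of packages relevant to this thread.
    relevant = ["dynamo-release", "pandas", "numpy", "statsmodels", "anndata", "h5py"]
    for name, version in report_versions(relevant).items():
        print(f"{name}: {version if version is not None else 'not installed'}")
```

Running this in both a working and a failing environment and comparing the two listings narrows down which dependency differs.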

Thanks guys!

Sichao25 commented 1 year ago

Hi @Mingsenli, thank you for reporting this issue. Are you using the released version of Dynamo (from pip install) or the GitHub version? If you are using the GitHub version, this may relate to one known bug which is waiting for the merge. Switching to the released version may help you solve the problem.

Also, can you print the adata.uns["pca_mean"] to check if the value is None or not?

Xiaojieqiu commented 1 year ago

@Sichao25 this bug should have been fixed in our latest merge, right?

Sichao25 commented 1 year ago

Yes, it has been fixed now. Now the pca_mean should be saved correctly.

Mingsenli commented 1 year ago

@Sichao25 @Xiaojieqiu I tried the released version of Dynamo (1.2.0), installed both with pip and from GitHub, using two methods:

  1. pip install dynamo-release
  2. git clone https://github.com/aristoteleo/dynamo-release.git followed by pip install dynamo-release/ --user

However, the value of adata.uns["pca_mean"] is None.

Xiaojieqiu commented 1 year ago

Hi @Mingsenli, that was what I was asking from the beginning. The error says that one of PCs or pca_mean is None, but you said you had the data... As we mentioned above, the bug has been fixed now. See this commit: https://github.com/aristoteleo/dynamo-release/commit/2b9f302a2bfec119a0f883f072c2e7a43f08a734 Pull the latest version from GitHub and reinstall it.

Sichao25 commented 1 year ago

For your information, I use statsmodels 0.13.5, pandas 1.5.3, and numpy 1.23.5. I am able to get the result on the pancreatic_endocrinogenesis dataset with the latest GitHub version. I also tried the released version on Google Colab, and it works there too. If you have trouble with your local dependency setup, Colab can be an alternative for testing the code first.

Mingsenli commented 1 year ago

@Xiaojieqiu @Sichao25 I am so sorry for misunderstanding "pca_mean". I saw that adata.uns had a "pca_mean" key but did not print it, so I assumed that it had a value.

Now dynamo-release-master, pulled from GitHub (https://codeload.github.com/aristoteleo/dynamo-release/zip/refs/heads/master), works well.

The in silico perturbation predictions for my genes of interest in my human single-cell data are perfect, and the predictions were validated by in vitro experiments. In addition, the results for RNA velocity, RNA acceleration, RNA Jacobian and ddhodge_potential look great on my data. Thank you very much for this excellent tool. Thank you for your patience and timely resolution of the problem!

The dynamo-release installed by "pip install dynamo-release", or by "git clone https://github.com/aristoteleo/dynamo-release.git" followed by "pip install dynamo-release/ --user", did not save pca_mean. The dynamo-release downloaded from https://github.com/aristoteleo/dynamo-release/releases (v1.2.0, Source code (zip)) also fails to save pca_mean.

I also found that only pandas 1.5.3 and numpy 1.20.3 work on my computer. I hope these experiences can help other people.

Sichao25 commented 1 year ago

Thank you for reporting those issues! Early versions may set pca_mean to None in some cases (e.g. large datasets). The next dynamo version will be released soon. At that time, dynamo installed from any source should work well. Until then, feel free to use the GitHub version for your research and project.
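To illustrate why a pca_mean of None only surfaces deep inside the reverse projection X @ PCs.T + mean, here is a minimal sketch of the same arithmetic with an explicit guard. It uses nested Python lists instead of NumPy arrays so that it runs without dependencies; reverse_project is a hypothetical stand-in written for this thread, not dynamo's pca_to_expr.

```python
def reverse_project(X, PCs, mean):
    """Compute X @ PCs.T + mean for list-of-lists inputs.

    X:    n_cells x n_components (PCA coordinates)
    PCs:  n_genes x n_components (loadings)
    mean: n_genes (per-gene mean), must not be None
    Hypothetical illustration only; not dynamo's implementation.
    """
    if mean is None:
        # Fail early with a readable message instead of a TypeError in '+'.
        raise ValueError("mean is None; the stored PCA mean was never saved")
    n_components = len(PCs[0])
    if any(len(row) != n_components for row in X):
        raise ValueError("X and PCs disagree on the number of components")
    return [
        [sum(x * p for x, p in zip(row, loadings)) + m
         for loadings, m in zip(PCs, mean)]
        for row in X
    ]


# One cell, two components, three genes (illustrative numbers).
print(reverse_project([[1.0, 2.0]],
                      [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
                      [0.5, 0.5, 0.5]))
```

Passing mean=None raises the ValueError immediately, which points at the missing stored value rather than at the addition.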

Xiaojieqiu commented 1 year ago

I am so glad to hear that dynamo has been instrumental in giving you deep predictive insights that can be well validated! Good luck with your paper submission, and please feel free to recommend our tools to others. We are happy to address future questions as well.
