aristoteleo / dynamo-release

Inclusive model of expression dynamics with conventional or metabolic labeling based scRNA-seq / multiomics, vector field reconstruction and differential geometry analyses
https://dynamo-release.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

Some questions about the theory. #277

Closed xiaozhongshen closed 2 years ago

xiaozhongshen commented 2 years ago

Hello! I have read through the notebooks on the website and still have some questions about the theory, mainly about the concepts. The notes I want to quote are: "We can see that:

from cell speed and acceleration, progenitors generally have low speed, as they resemble a metastable cell state. However, the transition of pigment progenitors and proliferating progenitors speeds up after they commit to a particular lineage, for example, the iridophore/melanophore/Schwann cell lineage, etc.

from cell divergence, those progenitors (pigment progenitors and proliferating progenitors) function like a source with high divergence, while melanophores/iridophores/chromaffin/Schwann cells as well as other cell types function like a sink with significantly lower divergence.

from cell curvature, when cells make cell fate decisions (at the bifurcation point of iridophore and melanophore lineages or that of the neuron and satellite glia lineages), strong curvature is apparent. Curvature is also artificially strong when velocity is noisy."

The questions are:

  1. From the speed plot, I found that cells move at high speed while they are differentiating and enter a low-speed state once they finish differentiating. Is that right?
  2. What is the biological meaning of divergence? I found that the conclusion from the divergence result is similar to that of the potential landscape analysis.
  3. I want to know the relationship between the acceleration result and the curvature result. Both seem to be related to cell fate decisions, but their visualizations are very different.

Thanks!

Xiaojieqiu commented 2 years ago

@xiaozhongshen thanks for your questions! Let me provide my answer to your questions below:

re: From the speed result plot, I found that the cells are at high speed when they are differentiating and the cells enter into a low-speed state when they end in differentiating. Is that right?

That is right. When cells leave a progenitor state, the RNA acceleration first increases and then the RNA speed goes up, so cells move toward the terminal cell types. Once cells get close to their destination, the RNA acceleration becomes smaller (even negative), and the RNA speed then decreases and approaches zero as cells settle into the terminal cell states.

re: What is the biological meaning of divergence? I found the conclusion of divergence result is similar to that of potential landscape analysis result.

Divergence corresponds to the local outgoingness of the flow, so it can be related to "pluripotency" or "stability". Positive divergence relates to a source (because local flows move outward) while negative divergence relates to a sink (because local flows converge). Cells at progenitor states tend to have large positive divergence and thus correspond to sources, while cells at terminal states have negative divergence and thus correspond to sinks.
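To make the source/sink picture concrete, here is a minimal sketch on a toy 2D vector field. It is not dynamo code: the `divergence`, `source` and `sink` functions are made up for illustration, and the divergence is estimated with plain central differences as the sum of the diagonal entries of the Jacobian.

```python
def divergence(field, point, h=1e-5):
    """Central-difference estimate of the divergence of `field` at `point`."""
    total = 0.0
    for i in range(len(point)):
        forward = list(point)
        backward = list(point)
        forward[i] += h
        backward[i] -= h
        # d f_i / d x_i, the i-th diagonal entry of the Jacobian
        total += (field(forward)[i] - field(backward)[i]) / (2 * h)
    return total


def source(x):
    # Toy flow pointing away from the origin, like a progenitor state.
    return [x[0], x[1]]


def sink(x):
    # Toy flow pointing toward the origin, like a terminal state.
    return [-x[0], -x[1]]


print(divergence(source, [0.0, 0.0]))  # positive: local flows move outward
print(divergence(sink, [0.0, 0.0]))    # negative: local flows converge
```

In this toy setting the sign alone separates the two cases; on real data the values are noisier, so comparing cell types rather than single cells is probably more informative.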

re: I want to know the relationship between the acceleration result and the curvature result. It seems the two results are all related to cell fate decision, but the visualization of the two results are very different.

You may find Box 1 in our Cell paper useful, where we visualized and explained the relationship between acceleration and curvature. Figure 1 of the Cell paper will also help you understand the difference between them. Basically, acceleration is the time derivative of RNA velocity, while curvature is the component of the acceleration orthogonal to the velocity (and so measures changes in direction).
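As a rough illustration of that relationship, the sketch below splits an acceleration vector into a part parallel to the velocity (which changes speed) and a part orthogonal to it (which changes direction). The orthogonal part is then divided by the squared speed, which is the textbook definition of a curvature vector. The function names are hypothetical and the normalization used inside dynamo may differ, so treat this as a picture of the idea rather than the package's implementation.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def decompose_acceleration(v, a):
    """Split acceleration `a` into parts parallel and orthogonal to velocity `v`."""
    scale = dot(a, v) / dot(v, v)
    tangential = [scale * vi for vi in v]                 # changes the speed
    normal = [ai - ti for ai, ti in zip(a, tangential)]   # changes the direction
    return tangential, normal


def curvature_vector(v, a):
    """Textbook curvature vector: the orthogonal part of `a` divided by |v|^2."""
    _, normal = decompose_acceleration(v, a)
    speed_sq = dot(v, v)
    return [ni / speed_sq for ni in normal]


# A toy cell moving along x while being sped up and pushed sideways (toward +y).
v = [2.0, 0.0]
a = [1.0, 4.0]
tangential, normal = decompose_acceleration(v, a)
kappa = curvature_vector(v, a)
print(tangential, normal, kappa)
```

Because the curvature is built from the acceleration, the two are mathematically tied, yet a cell can have large acceleration and little curvature if it is only speeding up along its current direction.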

You may also find the RNA Jacobian very interesting and useful: it reflects state-dependent gene interactions and can accurately reveal gene regulation and even the Hill coefficient (see supplementary figure 6 in the Cell paper). Please also check our new optimal path and in silico perturbation approaches. Those innovations make dynamo a predictive tool rather than a mostly descriptive one like other RNA velocity analysis toolkits.

Let me know whether this makes sense to you. Happy to help further.

xiaozhongshen commented 2 years ago

Thanks for your answer! I think this tool is really helpful and interesting; I used it on datasets related to embryo development. Regarding question 3 above, I found that curvature changes before acceleration does. I take curvature to reflect the cell fate decision made before differentiation, and acceleration to reflect differentiation itself. Is that right? For example, if a cell type has low divergence but high curvature, speed and acceleration, what is the property of this cell type?

Xiaojieqiu commented 2 years ago

A cell tends to have high curvature if it tries to move in a direction that differs from its current path, so cells around a bifurcation point or saddle will have high curvature. But curvature and acceleration are tightly related and should correlate well, simply because they are mathematically related (see Box 1).

re: if a cell type has low divergence but high curvature,speed and acceleration, what's the property of this cell type?

In this case, the cell type may correspond to an intermediate cell state or to cells at bifurcation points.

I also want to point out that while divergence is a scalar, curvature, speed and acceleration are all gene x cell matrices. You should therefore be able to identify the actual genes that have the highest curvature, speed or acceleration in a particular cell or cell type. See the zebrafish tutorial for more details on how to rank those quantities for each gene.
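The idea of ranking genes from a gene x cell matrix can be sketched in a few lines. This is not the ranking function shipped with dynamo (the zebrafish tutorial shows that one); `rank_genes_by_mean`, the gene names and the numbers below are all hypothetical and only illustrate averaging one quantity over a group of cells and sorting the genes.

```python
def rank_genes_by_mean(values, cells, use_abs=True, top=3):
    """Rank genes by their mean value over `cells`.

    `values` maps a gene name to one number per cell (one row of a
    gene x cell matrix); `cells` lists the column indices of the group.
    """
    scores = {}
    for gene, row in values.items():
        picked = [abs(row[c]) if use_abs else row[c] for c in cells]
        scores[gene] = sum(picked) / len(picked)
    return sorted(scores, key=scores.get, reverse=True)[:top]


# Hypothetical acceleration values for three genes across four cells.
acceleration = {
    "geneA": [0.1, 0.2, -0.1, 0.0],
    "geneB": [-2.0, -1.5, 0.3, 0.2],
    "geneC": [0.9, 1.1, 0.1, 0.0],
}
group = [0, 1]  # column indices of the cells in the group of interest

print(rank_genes_by_mean(acceleration, group, use_abs=True))   # geneB ranks first
print(rank_genes_by_mean(acceleration, group, use_abs=False))  # geneC ranks first
```

Ranking by absolute value surfaces strongly decelerating genes such as the toy `geneB`, while ranking by raw value keeps only the genes being pushed upward.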

xiaozhongshen commented 2 years ago

@Xiaojieqiu Thanks! I took your suggestion and I found the result is really interesting.

  1. I found that the top-ranked acceleration genes are very different from the top-ranked curvature or speed genes. This raises a question: what is the biological meaning of the top-ranked acceleration, curvature and speed genes, and how should we understand and distinguish them? For example, can I interpret top-ranked acceleration genes as genes that are significant in cell differentiation?
  2. I also ran the in silico perturbation step. How should I understand the number 100 (or -100) in the code? I would also like to learn how to interpret the result, because it seems that the directions of many cell types change a lot when a gene is activated or deactivated.
Xiaojieqiu commented 2 years ago

@xiaozhongshen That is great! Glad it is helpful! Regarding your questions:

  1. The meaning of acceleration, curvature and speed is the same as we discussed before. The top genes are simply the genes that have the highest acceleration, curvature or speed. For example, top acceleration genes are the genes with the highest acceleration in this cell group or across all cells, and they may correspond to driver genes that will turn on in the future. Please note that you can rank by either the raw value or the absolute value (ignoring the sign).
  2. 100 or -100 indicates gene activation or suppression, respectively. In theory, our in silico perturbation should only allow minor perturbations, because it relies on the Jacobian matrix, which is defined locally. However, you can use a large value and treat it as if we added a scalar to magnify the perturbation. If a small value is used, you may not be able to see the cell fate changes visually. Please also note that you can access the perturbation response

You can also run differential analyses for acceleration, speed and curvature (just like DEG analyses). I also strongly suggest that you check the RNA Jacobian to identify key regulators/effectors/interactions, either by ranking analyses or by plotting the Jacobian values between genes across cells.
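The point about the Jacobian being local can be seen on a toy example. The sketch below is not dynamo's perturbation code: it uses a made-up two-gene toggle switch (`velocity`) and compares the first-order prediction, Jacobian times perturbation, with the true change in velocity. The prediction is close for a small perturbation and far off for a large one, which is why a large value such as 100 is better read as a magnified direction than as a literal expression change.

```python
def velocity(x):
    """Toy two-gene toggle switch: each gene represses the other and decays."""
    return [1.0 / (1.0 + x[1] ** 2) - x[0],
            1.0 / (1.0 + x[0] ** 2) - x[1]]


def jacobian(f, x, h=1e-6):
    """Finite-difference Jacobian, J[i][j] = d f_i / d x_j, evaluated at x."""
    n = len(x)
    J = [[0.0] * n for _ in range(n)]
    for j in range(n):
        up, down = list(x), list(x)
        up[j] += h
        down[j] -= h
        f_up, f_down = f(up), f(down)
        for i in range(n):
            J[i][j] = (f_up[i] - f_down[i]) / (2 * h)
    return J


def linear_response(J, dx):
    """Predicted change in velocity from the first-order expansion, J times dx."""
    return [sum(J[i][j] * dx[j] for j in range(len(dx))) for i in range(len(J))]


def approximation_error(x, dx):
    """Largest gap between the linear prediction and the true velocity change."""
    predicted = linear_response(jacobian(velocity, x), dx)
    base = velocity(x)
    shifted = velocity([xi + di for xi, di in zip(x, dx)])
    actual = [s - b for s, b in zip(shifted, base)]
    return max(abs(p - a) for p, a in zip(predicted, actual))


state = [1.0, 0.5]
print(approximation_error(state, [0.01, 0.0]))  # small perturbation of gene 0
print(approximation_error(state, [5.0, 0.0]))   # large perturbation of gene 0
```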

xiaozhongshen commented 2 years ago

@Xiaojieqiu Thanks! I followed your suggestion and I am running the most probable path predictions. I also have some questions.

  1. ("We select the five closest cells of the identified attractors that correspond to each of the six cell types to represent the typical cell state of these cells (note that attractors often don’t correspond to any particular cell"). I want to know how to identify the attractors and choose them. If there are two starting cell types, how should I change the code for my dataset, for example "develope_keys = ["HSC->Meg", "HSC->Ery", "HSC->Bas", "HSC->Mon", "HSC->Neu"] reprogram_keys = ["Meg->HSC", "Ery->HSC", "Bas->HSC", "Mon->HSC", "Neu->HSC"]" on the website?
  2. It is difficult for me to understand the process of building the transition graph between cell states. Can I base the code on umap instead of pca? In both cases, I found that only a few directions are reversed compared with the velocity results. I would also like to understand how this step compares with the most probable path predictions.
  3. I also want to understand the meaning of in silico perturbation. Do these directions stand for the functions of the genes (such as promoting or restraining), or for the real differentiation directions when genes are activated or deactivated (reprogramming)?

Xiaojieqiu commented 2 years ago

Hi @xiaozhongshen, I am sorry but I had a difficult time understanding your questions. Can you please improve the English and clarity of the questions? Then I can give you more meaningful answers.

Here are my responses after guessing at your questions:

  1. About attractors: to identify the attractors, run dyn.vf.topography(adata); you can then visualize them via dyn.pl.topography(adata). In adata.uns['VecFld_umap'] you can find the Xss array, which keeps the coordinates of all attractors. Often you need to clean up the attractors, because you may end up with a lot of them due to numerical instability and data noise. About two starting points: a most probable path connects a source to a target. If you want to find the optimal paths from two starting points, just set two different starting points with the same target.
  2. Sorry, this question doesn't make sense to me. Please update it.
  3. Those streamlines are integration paths of the predicted perturbation vectors projected to low-dimensional space. Please read the documentation of dyn.pd.perturbation and the method section on this in our Cell paper for more details.
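
For the first answer, reading the coordinates back out can be sketched as below. Only the `adata.uns['VecFld_umap']` location and the `Xss` name come from this thread; the helper `get_fixed_points` is hypothetical, and a stand-in object with made-up values is used so the snippet runs without dynamo. With a real dataset you would run dyn.vf.topography(adata) first and pass the actual adata.

```python
from types import SimpleNamespace

import numpy as np


def get_fixed_points(adata, basis="umap"):
    """Return the `Xss` coordinate array stored under adata.uns['VecFld_<basis>']."""
    key = f"VecFld_{basis}"
    if key not in adata.uns:
        raise KeyError(f"{key!r} not found in adata.uns; learn the vector field first")
    if "Xss" not in adata.uns[key]:
        raise KeyError(f"'Xss' not found in adata.uns[{key!r}]; run dyn.vf.topography(adata) first")
    return np.asarray(adata.uns[key]["Xss"])


# Stand-in for an AnnData object after dyn.vf.topography(adata); values are made up.
fake_adata = SimpleNamespace(uns={"VecFld_umap": {"Xss": [[0.0, 1.0], [3.5, -2.0]]}})
print(get_fixed_points(fake_adata))
```

Each row is one point in the embedding, so the array can be overlaid on the UMAP plot to judge which points sit inside a real cell state.
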
xiaozhongshen commented 2 years ago

Thanks @Xiaojieqiu

  1. When I ran the process of dyn.pl.state_graph, I found some colors of the directions were very light and similar to white. How can I make these directions clearer to visualize?
  2. " Often you need to clean up the attractors because you may end up a lot of attractors because of numerical instability and data noise." I want to know how can I evaluate and identify the noise of the attractors and how can I remove them in the array.
  3. When I ran the following step:

        for i, start in enumerate(start_cell_indices):
            for j, end in enumerate(end_cell_indices):
                if start is not end:
                    min_lap_t = True if i == 0 else False
                    dyn.pd.least_action(
                        adata_labeling,
                        [adata_labeling.obs_names[start[0]][0]],
                        [adata_labeling.obs_names[end[0]][0]],
                        basis="umap",
                        adj_key="X_umap_distances",
                        min_lap_t=min_lap_t,
                        EM_steps=2,
                    )
                    dyn.pl.least_action(adata_labeling, basis="umap")
                    lap = dyn.pd.least_action(
                        adata_labeling,
                        [adata_labeling.obs_names[start[0]][0]],
                        [adata_labeling.obs_names[end[0]][0]],
                        basis="pca",
                        adj_key="cosine_transition_matrix",
                        min_lap_t=min_lap_t,
                        EM_steps=2,
                    )
                    dyn.pl.kinetic_heatmap(
                        adata_labeling,
                        basis="pca",
                        mode="lap",
                        genes=adata_labeling.var_names[adata_labeling.var.use_for_transition],
                        project_back_to_high_dim=True,
                    )

                    # The GeneTrajectory class can be used to output trajectories for any set of genes of interest
                    gtraj = dyn.pd.GeneTrajectory(adata_labeling)
                    gtraj.from_pca(lap.X, t=lap.t)
                    gtraj.calc_msd()
                    ranking = dyn.vf.rank_genes(adata_labeling, "traj_msd")

                    print(start, "->", end)
                    genes = ranking[:5]["all"].to_list()
                    arr = gtraj.select_gene(genes)

                    dyn.pl.multiplot(lambda k: [plt.plot(arr[k, :]), plt.title(genes[k])], np.arange(len(genes)))

                    transition_graph[cell_type[i] + "->" + cell_type[j]] = {
                        "lap": lap,
                        "LAP_umap": adata_labeling.uns["LAP_umap"],
                        "LAP_pca": adata_labeling.uns["LAP_pca"],
                        "ranking": ranking,
                        "gtraj": gtraj,
                    }

    I got this error:

        Traceback (most recent call last):
          File "", line 5, in
          File "/DATA/sxz/data/anaconda3/envs/dynamo-env/lib/python3.9/site-packages/dynamo/prediction/least_action_path.py", line 136, in least_action
            T = adata.obsp[adj_key]
          File "/DATA/sxz/data/anaconda3/envs/dynamo-env/lib/python3.9/site-packages/anndata/_core/aligned_mapping.py", line 148, in __getitem__
            return self._data[key]
        KeyError: 'X_umap_distances'

Xiaojieqiu commented 2 years ago

@xiaozhongshen

re: When I ran the process of dyn.pl.state_graph, I found some colors of the directions were very light and similar to white. How can I make these directions clearer to visualize?

You can set graph_alpha to 1 to increase the visibility of the edges. Also try tuning edgecolor and edge_scale. In general, reading the documentation of each function will often answer questions like this.

re: " Often you need to clean up the attractors because you may end up a lot of attractors because of numerical instability and data noise." I want to know how can I evaluate and identify the noise of the attractors and how can I remove them in the array.

A rule of thumb is that you should find one attractor for each cell state. Also try using correct_density = False when you run tl.cell_velocities and the vector field learning. This often reduces the number of attractors identified and makes them more meaningful. Please note that you may still need to use correct_density = True when you plot the streamlines, because otherwise the flow will appear to be attracted to the middle of regions with high cell density.

Regarding how to remove them, see my previous answer on the Xss array. You just delete certain rows using numpy operations.
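As one possible way to do that cleanup, the sketch below merges near-duplicate points and then drops a row by index with np.delete. The coordinates, the `drop_near_duplicates` helper and the tolerance are all made up for illustration; which points count as noise is still a judgment call against the plotted topography. If other arrays are stored alongside Xss with one entry per point, they would presumably need the same rows removed.

```python
import numpy as np


def drop_near_duplicates(points, tol=0.5):
    """Keep the first point of every group of points closer than `tol`."""
    points = np.asarray(points, dtype=float)
    keep = []
    for i, p in enumerate(points):
        # Keep point i only if it is far from every point already kept.
        if all(np.linalg.norm(p - points[k]) >= tol for k in keep):
            keep.append(i)
    return points[keep]


# Made-up fixed-point coordinates; rows 0 and 1 are nearly the same point.
Xss = np.array([[0.0, 0.0], [0.05, -0.02], [4.0, 1.0], [9.0, 9.0]])

cleaned = drop_near_duplicates(Xss, tol=0.5)  # merges rows 0 and 1
cleaned = np.delete(cleaned, 2, axis=0)       # drop a row judged spurious by eye
print(cleaned)
```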

We will have a few more tutorials, in which we will show how the attractors are identified in our HSC dataset.

Since I am busy with many other commitments and tasks, @dummyindex will help you with the last question.

dummyindex commented 2 years ago

Hi @xiaozhongshen ,

May I take a look at your adata object when running this step? If you downloaded the dyn.sample_data.hematopoiesis() dataset ~2 weeks ago, you may need to download the newest version (delete the one in the ./data folder), because we updated the dataset's umap keys. The notebook is up to date with the newest dataset version. Also, the LAP computation step in this notebook typically takes 30-60 minutes on a personal computer, so please be patient. Please let us know if this solves your problem.
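Since the LAP step is slow, it may be worth checking for the adjacency key before starting the loop. The sketch below is only an assumption about how one could do that: `require_obsp_key` is a hypothetical helper, and a stand-in object with made-up keys replaces the real adata so the snippet runs on its own. The only facts taken from this thread are that least_action reads adata.obsp[adj_key] and that 'X_umap_distances' was the missing key.

```python
from types import SimpleNamespace


def require_obsp_key(adata, adj_key):
    """Raise a readable error if `adj_key` is missing from adata.obsp."""
    if adj_key not in adata.obsp:
        available = ", ".join(sorted(adata.obsp.keys())) or "<none>"
        raise KeyError(
            f"{adj_key!r} is not in adata.obsp (available: {available}); "
            "the dataset may predate the umap key update"
        )
    return adata.obsp[adj_key]


# Stand-in for an AnnData object; the keys below are made up for illustration.
old_adata = SimpleNamespace(obsp={"distances": None, "connectivities": None})
try:
    require_obsp_key(old_adata, "X_umap_distances")
except KeyError as err:
    print(err)
```

Listing the available keys in the message makes it easier to tell an outdated dataset apart from a simple typo in adj_key.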

xiaozhongshen commented 2 years ago

@dummyindex Thanks! But when I tried to install the newest version, I got an error:

pip install dynamo-release/ --user Processing ./dynamo-release Installing build dependencies ... done Getting requirements to build wheel ... done Installing backend dependencies ... error error: subprocess-exited-with-error

× pip subprocess to install backend dependencies did not run successfully. │ exit code: 1 ╰─> [268 lines of output] Ignoring hiveplotlib: markers 'extra == "network"' don't match your environment Ignoring holoviews: markers 'extra == "bigdata_visualization"' don't match your environment Ignoring wurlitzer: markers 'extra == "network"' don't match your environment Ignoring pytest: markers 'extra == "dev"' don't match your environment Ignoring mock: markers 'extra == "docs"' don't match your environment Ignoring sphinx_autodoc_typehints: markers 'extra == "docs"' don't match your environment Ignoring python-igraph: markers 'extra == "network"' don't match your environment Ignoring sphinx-gallery: markers 'extra == "docs"' don't match your environment Ignoring bokeh: markers 'extra == "bigdata_visualization"' don't match your environment Ignoring sympy: markers 'extra == "test"' don't match your environment Ignoring leidenalg: markers 'extra == "network"' don't match your environment Ignoring GitPython: markers 'extra == "docs"' don't match your environment Ignoring nxviz: markers 'extra == "network"' don't match your environment Ignoring fitsne: markers 'extra == "dimension_reduction"' don't match your environment Ignoring readthedocs-sphinx-ext: markers 'extra == "docs"' don't match your environment Ignoring pysal: markers 'extra == "spatial"' don't match your environment Ignoring datashader: markers 'extra == "bigdata_visualization"' don't match your environment Ignoring nbsphinx: markers 'extra == "docs"' don't match your environment Ignoring sphinx-rtd-theme: markers 'extra == "docs"' don't match your environment Ignoring sphinxcontrib-bibtex: markers 'extra == "docs"' don't match your environment Ignoring networkx: markers 'extra == "network"' don't match your environment Ignoring sphinx: markers 'extra == "docs"' don't match your environment Ignoring pytest: markers 'extra == "test"' don't match your environment Ignoring setuptools: markers 'extra == "docs"' 
don't match your environment Collecting networkx Using cached networkx-2.7-py3-none-any.whl (2.0 MB) Collecting tqdm Using cached tqdm-4.63.0-py2.py3-none-any.whl (76 kB) Collecting numpy>=1.18.1 Using cached numpy-1.22.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (16.8 MB) Collecting joblib Using cached joblib-1.1.0-py2.py3-none-any.whl (306 kB) Collecting statsmodels>=0.9.0 Using cached statsmodels-0.13.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (9.9 MB) Collecting colorcet>=2.0.1 Using cached colorcet-3.0.0-py2.py3-none-any.whl (1.6 MB) Collecting matplotlib>=3.4.1 Using cached matplotlib-3.5.1-cp39-cp39-manylinux_2_5_x86_64.manylinux1_x86_64.whl (11.2 MB) Collecting seaborn>=0.9.0 Using cached seaborn-0.11.2-py3-none-any.whl (292 kB) Collecting python-igraph>=0.7.1 Using cached python_igraph-0.9.9-py3-none-any.whl (9.1 kB) Collecting KDEpy Using cached KDEpy-1.1.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (443 kB) Collecting PATSY>=0.5.1 Using cached patsy-0.5.2-py2.py3-none-any.whl (233 kB) Collecting scipy>=1.0 Using cached scipy-1.8.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (42.1 MB) Collecting umap-learn>=0.5.1 Using cached umap_learn-0.5.2-py3-none-any.whl Collecting loompy>=3.0.5 Using cached loompy-3.0.6-py3-none-any.whl Collecting numba>=0.46.0 Using cached numba-0.55.1-1-cp39-cp39-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (3.3 MB) Collecting cvxopt>=1.2.3 Using cached cvxopt-1.2.7-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (13.0 MB) Collecting pynndescent>=0.5.2 Using cached pynndescent-0.5.6-py3-none-any.whl Collecting trimap>=1.0.11 Using cached trimap-1.1.2-py3-none-any.whl (15 kB) Collecting pandas>=0.25.1 Using cached pandas-1.4.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (11.7 MB) Collecting numdifftools>=0.9.39 Using cached numdifftools-0.9.40-py2.py3-none-any.whl (99 kB) Collecting anndata==0.7.5 Using cached anndata-0.7.5-py3-none-any.whl (119 kB) 
Collecting nxviz==0.7.3 Using cached nxviz-0.7.3-py3-none-any.whl (28 kB) Collecting scikit-learn>=0.19.1 Using cached scikit_learn-1.0.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (26.4 MB) Collecting setuptools Using cached setuptools-60.9.3-py3-none-any.whl (1.1 MB) Collecting h5py Using cached h5py-3.6.0-cp39-cp39-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (4.5 MB) Collecting packaging Using cached packaging-21.3-py3-none-any.whl (40 kB) Collecting natsort Using cached natsort-8.1.0-py3-none-any.whl (37 kB) Collecting palettable>=3.3.0 Using cached palettable-3.3.0-py2.py3-none-any.whl (111 kB) Collecting mkdocs-material Using cached mkdocs_material-8.2.3-py2.py3-none-any.whl (4.8 MB) Collecting mknotebooks Using cached mknotebooks-0.7.1-py3-none-any.whl (13 kB) Collecting more-itertools>=8.6.0 Using cached more_itertools-8.12.0-py3-none-any.whl (54 kB) Collecting mkdocs Using cached mkdocs-1.2.3-py3-none-any.whl (6.4 MB) Collecting param>=1.7.0 Using cached param-1.12.0-py2.py3-none-any.whl (85 kB) Collecting pyct>=0.4.4 Using cached pyct-0.4.8-py2.py3-none-any.whl (15 kB) Collecting python-dateutil>=2.7 Using cached python_dateutil-2.8.2-py2.py3-none-any.whl (247 kB) Collecting pillow>=6.2.0 Using cached Pillow-9.0.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.3 MB) Collecting cycler>=0.10 Using cached cycler-0.11.0-py3-none-any.whl (6.4 kB) Collecting pyparsing>=2.2.1 Using cached pyparsing-3.0.7-py3-none-any.whl (98 kB) Collecting kiwisolver>=1.0.1 Using cached kiwisolver-1.3.2-cp39-cp39-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (1.6 MB) Collecting fonttools>=4.22.0 Using cached fonttools-4.29.1-py3-none-any.whl (895 kB) Collecting igraph==0.9.9 Using cached igraph-0.9.9-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.1 MB) Collecting texttable>=1.6.2 Using cached texttable-1.6.4-py2.py3-none-any.whl (10 kB) Collecting six Using cached six-1.16.0-py2.py3-none-any.whl (11 kB) Collecting numpy-groupies Using 
cached numpy_groupies-0.9.14-py3-none-any.whl Collecting click Using cached click-8.0.4-py3-none-any.whl (97 kB) Collecting llvmlite<0.39,>=0.38.0rc1 Using cached llvmlite-0.38.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (34.5 MB) Collecting numpy>=1.18.1 Using cached numpy-1.21.5-cp39-cp39-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (15.7 MB) Collecting annoy>=1.11 Using cached annoy-1.17.0.tar.gz (646 kB) Preparing metadata (setup.py): started Preparing metadata (setup.py): finished with status 'done' Collecting pytz>=2020.1 Using cached pytz-2021.3-py2.py3-none-any.whl (503 kB) Collecting algopy>=0.4 Using cached algopy-0.5.7-py3-none-any.whl Collecting threadpoolctl>=2.0.0 Using cached threadpoolctl-3.1.0-py3-none-any.whl (14 kB) Collecting PyYAML>=3.10 Using cached PyYAML-6.0-cp39-cp39-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (661 kB) Collecting ghp-import>=1.0 Using cached ghp_import-2.0.2-py3-none-any.whl (11 kB) Collecting watchdog>=2.0 Using cached watchdog-2.1.6-py3-none-manylinux2014_x86_64.whl (76 kB) Collecting Jinja2>=2.10.1 Using cached Jinja2-3.0.3-py3-none-any.whl (133 kB) Collecting mergedeep>=1.3.4 Using cached mergedeep-1.3.4-py3-none-any.whl (6.4 kB) Collecting importlib-metadata>=3.10 Using cached importlib_metadata-4.11.2-py3-none-any.whl (17 kB) Collecting pyyaml-env-tag>=0.1 Using cached pyyaml_env_tag-0.1-py3-none-any.whl (3.9 kB) Collecting Markdown>=3.2.1 Using cached Markdown-3.3.6-py3-none-any.whl (97 kB) Collecting pymdown-extensions>=9.0 Using cached pymdown_extensions-9.2-py3-none-any.whl (216 kB) Collecting mkdocs-material-extensions>=1.0 Using cached mkdocs_material_extensions-1.0.3-py3-none-any.whl (8.1 kB) Collecting pygments>=2.10 Using cached Pygments-2.11.2-py3-none-any.whl (1.1 MB) Collecting nbconvert>=6.0.0 Using cached nbconvert-6.4.2-py3-none-any.whl (558 kB) Collecting gitpython Using cached GitPython-3.1.27-py3-none-any.whl (181 kB) Collecting 
jupyter-client Using cached jupyter_client-7.1.2-py3-none-any.whl (130 kB) Collecting zipp>=0.5 Using cached zipp-3.7.0-py3-none-any.whl (5.3 kB) Collecting MarkupSafe>=2.0 Using cached MarkupSafe-2.1.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (25 kB) Collecting nbformat>=4.4 Using cached nbformat-5.1.3-py3-none-any.whl (178 kB) Collecting entrypoints>=0.2.2 Using cached entrypoints-0.4-py3-none-any.whl (5.3 kB) Collecting jupyterlab-pygments Using cached jupyterlab_pygments-0.1.2-py2.py3-none-any.whl (4.6 kB) Collecting bleach Using cached bleach-4.1.0-py2.py3-none-any.whl (157 kB) Collecting pandocfilters>=1.4.1 Using cached pandocfilters-1.5.0-py2.py3-none-any.whl (8.7 kB) Collecting testpath Using cached testpath-0.6.0-py3-none-any.whl (83 kB) Collecting mistune<2,>=0.8.1 Using cached mistune-0.8.4-py2.py3-none-any.whl (16 kB) Collecting traitlets>=5.0 Using cached traitlets-5.1.1-py3-none-any.whl (102 kB) Collecting nbclient<0.6.0,>=0.5.0 Using cached nbclient-0.5.11-py3-none-any.whl (71 kB) Collecting jupyter-core Using cached jupyter_core-4.9.2-py3-none-any.whl (86 kB) Collecting defusedxml Using cached defusedxml-0.7.1-py2.py3-none-any.whl (25 kB) Collecting gitdb<5,>=4.0.1 Using cached gitdb-4.0.9-py3-none-any.whl (63 kB) Collecting nest-asyncio>=1.5 Using cached nest_asyncio-1.5.4-py3-none-any.whl (5.1 kB) Collecting tornado>=4.1 Using cached tornado-6.1-cp39-cp39-manylinux2010_x86_64.whl (427 kB) Collecting pyzmq>=13 Using cached pyzmq-22.3.0-cp39-cp39-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (1.1 MB) Collecting smmap<6,>=3.0.1 Using cached smmap-5.0.0-py3-none-any.whl (24 kB) Collecting jsonschema!=2.5.0,>=2.4 Using cached jsonschema-4.4.0-py3-none-any.whl (72 kB) Collecting ipython-genutils Using cached ipython_genutils-0.2.0-py2.py3-none-any.whl (26 kB) Collecting webencodings Using cached webencodings-0.5.1-py2.py3-none-any.whl (11 kB) Collecting pyrsistent!=0.17.0,!=0.17.1,!=0.17.2,>=0.14.0 Using cached 
pyrsistent-0.18.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (115 kB) Collecting attrs>=17.4.0 Using cached attrs-21.4.0-py2.py3-none-any.whl (60 kB) Building wheels for collected packages: annoy Building wheel for annoy (setup.py): started Building wheel for annoy (setup.py): finished with status 'error' error: subprocess-exited-with-error

    × python setup.py bdist_wheel did not run successfully.
    │ exit code: 1
    ╰─> [16 lines of output]
        /DATA/sxz/data/anaconda3/envs/dynamoenv/lib/python3.9/site-packages/setuptools/installer.py:27: SetuptoolsDeprecationWarning: setuptools.installer is deprecated. Requirements should be satisfied by a PEP 517 installer.
          warnings.warn(
        running bdist_wheel
        running build
        running build_py
        creating build
        creating build/lib.linux-x86_64-3.9
        creating build/lib.linux-x86_64-3.9/annoy
        copying annoy/__init__.py -> build/lib.linux-x86_64-3.9/annoy
        running build_ext
        building 'annoy.annoylib' extension
        creating build/temp.linux-x86_64-3.9
        creating build/temp.linux-x86_64-3.9/src
        gcc -pthread -B /DATA/sxz/data/anaconda3/envs/dynamoenv/compiler_compat -Wno-unused-result -Wsign-compare -DNDEBUG -fwrapv -O2 -Wall -fPIC -O2 -isystem /DATA/sxz/data/anaconda3/envs/dynamoenv/include -fPIC -O2 -isystem /DATA/sxz/data/anaconda3/envs/dynamoenv/include -fPIC -I/DATA/sxz/data/anaconda3/envs/dynamoenv/include/python3.9 -c src/annoymodule.cc -o build/temp.linux-x86_64-3.9/src/annoymodule.o -D_CRT_SECURE_NO_WARNINGS -march=native -O3 -ffast-math -fno-associative-math -DANNOYLIB_MULTITHREADED_BUILD -std=c++14
        gcc: error: unrecognized command line option ‘-std=c++14’
        error: command '/usr/bin/gcc' failed with exit code 1
        [end of output]

    note: This error originates from a subprocess, and is likely not a problem with pip.
    ERROR: Failed building wheel for annoy
    Running setup.py clean for annoy
  Failed to build annoy
  Installing collected packages: webencodings, texttable, pytz, palettable, numpy-groupies, mistune, ipython-genutils, annoy, algopy, zipp, watchdog, traitlets, tqdm, tornado, threadpoolctl, testpath, smmap, six, setuptools, pyzmq, PyYAML, pyrsistent, pyparsing, pygments, pillow, param, pandocfilters, numpy, networkx, nest-asyncio, natsort, more-itertools, mkdocs-material-extensions, mergedeep, MarkupSafe, llvmlite, kiwisolver, joblib, igraph, fonttools, entrypoints, defusedxml, cycler, cvxopt, click, attrs, scipy, pyyaml-env-tag, python-igraph, python-dateutil, pyct, PATSY, packaging, numba, jupyterlab-pygments, jupyter-core, jsonschema, Jinja2, importlib-metadata, h5py, gitdb, scikit-learn, pandas, nbformat, matplotlib, Markdown, loompy, jupyter-client, gitpython, ghp-import, colorcet, bleach, trimap, statsmodels, seaborn, pynndescent, pymdown-extensions, nbclient, mkdocs, KDEpy, anndata, umap-learn, numdifftools, nbconvert, mkdocs-material, mknotebooks, nxviz
    Running setup.py install for annoy: started
    Running setup.py install for annoy: finished with status 'error'
    error: subprocess-exited-with-error

    × Running setup.py install for annoy did not run successfully.
    │ exit code: 1
    ╰─> [18 lines of output]
        /DATA/sxz/data/anaconda3/envs/dynamoenv/lib/python3.9/site-packages/setuptools/installer.py:27: SetuptoolsDeprecationWarning: setuptools.installer is deprecated. Requirements should be satisfied by a PEP 517 installer.
          warnings.warn(
        running install
        /DATA/sxz/data/anaconda3/envs/dynamoenv/lib/python3.9/site-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
          warnings.warn(
        running build
        running build_py
        creating build
        creating build/lib.linux-x86_64-3.9
        creating build/lib.linux-x86_64-3.9/annoy
        copying annoy/__init__.py -> build/lib.linux-x86_64-3.9/annoy
        running build_ext
        building 'annoy.annoylib' extension
        creating build/temp.linux-x86_64-3.9
        creating build/temp.linux-x86_64-3.9/src
        gcc -pthread -B /DATA/sxz/data/anaconda3/envs/dynamoenv/compiler_compat -Wno-unused-result -Wsign-compare -DNDEBUG -fwrapv -O2 -Wall -fPIC -O2 -isystem /DATA/sxz/data/anaconda3/envs/dynamoenv/include -fPIC -O2 -isystem /DATA/sxz/data/anaconda3/envs/dynamoenv/include -fPIC -I/DATA/sxz/data/anaconda3/envs/dynamoenv/include/python3.9 -c src/annoymodule.cc -o build/temp.linux-x86_64-3.9/src/annoymodule.o -D_CRT_SECURE_NO_WARNINGS -march=native -O3 -ffast-math -fno-associative-math -DANNOYLIB_MULTITHREADED_BUILD -std=c++14
        gcc: error: unrecognized command line option ‘-std=c++14’
        error: command '/usr/bin/gcc' failed with exit code 1
        [end of output]

    note: This error originates from a subprocess, and is likely not a problem with pip.
  error: legacy-install-failure

  × Encountered error while trying to install package.
  ╰─> annoy

  note: This is an issue with the package mentioned above, not pip.
  hint: See above for output from the failure.
  [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip. error: subprocess-exited-with-error

× pip subprocess to install backend dependencies did not run successfully. │ exit code: 1 ╰─> See above for output.

dummyindex commented 2 years ago

Hi @xiaozhongshen,

May I have your operating system information? From the output, the installation of the annoy package is the culprit. It probably means that you do not have gcc installed in your environment/OS. I ran into a similar issue before when installing hdbscan and annoy, and I checked my notes.

Can you try

conda install -c conda-forge python-devtools
conda install -c conda-forge hdbscan

and then install the latest version of dynamo again? If you use a Mac, you may also want to install Xcode to get gcc. Our team will discuss these installation issues and may, in the future, make optional the dependency packages that frequently cause installation problems.

xiaozhongshen commented 2 years ago

Thanks @dummyindex. My system is Red Hat 4.8.5-28 (gcc version 4.8.5). However, the installation still failed after I installed python-devtools and hdbscan. Which version of annoy do I need to install? I want to try it with conda.

dummyindex commented 2 years ago

Hi @xiaozhongshen, Here is my version on Mac:

Name: annoy
Version: 1.17.0
Summary: Approximate Nearest Neighbors in C++/Python optimized for memory usage and loading/saving to disk.
Home-page: https://github.com/spotify/annoy
Author: Erik Bernhardsson
Author-email: mail@erikbern.com
License: Apache License 2.0
Location: /Users/random/opt/anaconda3/envs/dynamo-dummyindex-test/lib/python3.9/site-packages
Requires: 
Required-by: trimap

I checked your installation output, and it seems the problem is the -std=c++14 argument, which your system gcc command does not support, during the annoy build process. Please try solutions similar to this one on Stack Overflow (https://stackoverflow.com/questions/19955775/error-command-gcc-failed-with-exit-status-1-on-centos)

xiaozhongshen commented 2 years ago

Which version of gcc do I need to install? Thanks! @dummyindex If you have built an environment for dynamo, can you provide the yml file? I think building the environment with conda may make sense.

dummyindex commented 2 years ago

We test dynamo on Ubuntu and Mac, and the current version passes the build process on GitHub. For gcc, you can try version >= 5.2. Your gcc version may be too old to support the C++14 standard. Here are two yml examples (in txt format, because of GitHub upload restrictions). Note that the dynamo package versions are development versions installed from GitHub (indicated by the word dev in the package version). Both environments use annoy==1.17.0.

dynamo-prod.txt dynamo-dev.txt

If the issue remains, we may consider opening an issue on the annoy GitHub repository, since it concerns the annoy installation.
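To check a machine against the gcc >= 5.2 suggestion before retrying the install, something like the sketch below could be used. The helper names are hypothetical, and the sample string is only an illustration of what `gcc --version` tends to print on a system like the one reported here. On a real machine the text would come from running `gcc --version`, for example via subprocess.

```python
import re


def gcc_version(version_output):
    """Extract (major, minor, patch) from the first line of `gcc --version`."""
    first_line = version_output.splitlines()[0]
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", first_line)
    if match is None:
        raise ValueError(f"no version number found in {first_line!r}")
    return tuple(int(part) for part in match.groups())


def meets_minimum(version, minimum=(5, 2, 0)):
    """True if `version` is at least `minimum` (the >= 5.2 suggested in this thread)."""
    return version >= minimum


# Illustrative text resembling `gcc --version` output for gcc 4.8.5.
sample = (
    "gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-28)\n"
    "Copyright (C) 2015 Free Software Foundation, Inc."
)
version = gcc_version(sample)
print(version, meets_minimum(version))
```

Comparing tuples rather than strings avoids mistakes such as treating "10.2.0" as older than "5.2.0".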

xiaozhongshen commented 2 years ago

Thanks @dummyindex I will open a new issue because I still have some problems in installing the new version.