aristoteleo / dynamo-release

Inclusive model of expression dynamics with conventional or metabolic labeling based scRNA-seq / multiomics, vector field reconstruction and differential geometry analyses
https://dynamo-release.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

Is there a docker version for dynamo? #54

Closed bitcometz closed 2 years ago

bitcometz commented 4 years ago

Hello, I find dynamo a great tool for single-cell analysis. However, it is hard for me to install the package because my cluster's network connection is poor. Would it be possible to provide a Docker image of dynamo?

Thanks!!!

Xiaojieqiu commented 4 years ago

Hi @bitcometz, thanks for your interest in our work! A Docker image of dynamo is certainly one of the things on my list. However, our lab is currently relocating, which involves a lot of moving parts, so I will have to delay this request for some time. Thanks for your understanding and patience! Meanwhile, if you have the bandwidth to contribute to this, that would be highly appreciated!

bitcometz commented 4 years ago

Hi Xiaojieqiu, thanks for your reply, and thank you for developing such good software. I am happy to create a Docker image: one based on Python 3.6 or later with dynamo installed. In addition, I have some scRNA-seq reads with metabolic labeling of nascent RNA, generated by our own experiments, which are similar to the 10x Genomics protocols. As you said in the introduction, I should first use a NASC-seq analysis pipeline to produce loom files from the raw reads and then run dynamo, so it would be best to put both into the same image. Once I finish this image, I will push it to Docker Hub. Is that OK?

I have two questions:

  1. Is my data suitable for running NASC-seq + dynamo to inspect RNA velocity? In short, we sequenced 4sU-labeled RNA with our own single-cell sequencing protocol, along with a control experiment without 4sU labeling. I have just started doing these analyses, so I have no experience. I hope you can give me some guidance.

  2. I am trying hard to read your dynamo paper (it contains a large number of mathematical formulas and derivations), and I cannot find any comparison with the scSLAM-seq method, a Bayesian method that computes the ratio of new to total RNA (NTR) in a fully quantitative manner, including credible intervals. I think dynamo has implemented related functions to calculate RNA velocity within an efficient mathematical framework. Am I right?

Thanks!!!

Xiaojieqiu commented 4 years ago

Hey @bitcometz, thanks for generously agreeing to create a Docker version! I used Python 3.6.8 for dynamo development, so Python 3.6 or higher sounds good to me. Also feel free to push to Docker Hub!

Congrats on getting some labeling scRNA-seq data! This is really exciting, as it opens the door to a lot of new discoveries.

Regarding the NASC-seq analysis pipeline, which I also really like, I want to mention a few things as a heads-up:

  1. The NASC-seq pipeline doesn't account for UMIs, so it may not work for your Drop-seq/inDrop-like data.
  2. The NASC-seq pipeline models new/old RNA on a per-read basis, so it requires deep sequencing and, unfortunately, a very long running time. For UMI-based, large-scale inDrop/Drop-seq experiments, it may be worth adapting it to model the UMIs of each gene across cells. By modeling on each gene across cells, I mean using the binomial mixture model to estimate the background mutation rate and the true T->C or A->G mutations. You may need to dig into the math of the excellent GRAND-SLAM Bioinformatics paper a little to grasp the idea.
  3. NASC-seq doesn't output a loom file for now (maybe they are planning to do that). However, I have a tutorial (https://github.com/aristoteleo/dynamo-tutorials/blob/master/NASC_seq.ipynb) showing how to import a NASC-seq new/total RNA data table into dynamo for analysis. Note that those notebooks are outdated and you should try re-running them. We have made tons of improvements since the notebooks were released, so you should see some very exciting, improved results. (By the way, we will update those notebooks when we release our package to PyPI in a couple of weeks.)
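
To make the binomial mixture idea in point 2 concrete, here is a minimal sketch for a single gene with UMI-level observations. It is not the GRAND-SLAM or NASC-seq implementation: it assumes the background error rate (`p_err`) and the conversion rate in labeled molecules (`p_conv`) are already known, and it only estimates the fraction of new molecules by expectation-maximization. The function names, the simulated data, and the rate values are all hypothetical.

```python
import math
import random


def binom_pmf(k, n, p):
    """Probability of k conversions among n T positions at per-base rate p."""
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)


def estimate_new_fraction(observations, p_err, p_conv, n_iter=500, tol=1e-10):
    """EM estimate of the fraction of new (labeled) molecules for one gene.

    observations: list of (n_t, k) pairs, one per UMI, where n_t is the number
    of covered T positions and k the number of observed T->C conversions.
    p_err and p_conv are assumed known here (an assumption of this sketch).
    """
    if not observations:
        raise ValueError("need at least one observation")
    # Likelihood of each observation under the "new" and "old" components.
    liks = [(binom_pmf(k, n, p_conv), binom_pmf(k, n, p_err))
            for n, k in observations]
    pi = 0.5  # initial guess for the new-RNA fraction
    for _ in range(n_iter):
        # E-step: posterior probability that each UMI comes from new RNA.
        resp = [pi * l_new / (pi * l_new + (1.0 - pi) * l_old)
                for l_new, l_old in liks]
        # M-step: the updated fraction is the mean responsibility.
        new_pi = sum(resp) / len(resp)
        converged = abs(new_pi - pi) < tol
        pi = new_pi
        if converged:
            break
    return pi


def simulate_gene(n_umis, true_fraction, p_err, p_conv, rng):
    """Simulate (n_t, k) pairs for one gene; purely illustrative data."""
    obs = []
    for _ in range(n_umis):
        n_t = rng.randint(20, 40)
        p = p_conv if rng.random() < true_fraction else p_err
        k = sum(rng.random() < p for _ in range(n_t))
        obs.append((n_t, k))
    return obs


if __name__ == "__main__":
    rng = random.Random(0)
    obs = simulate_gene(5000, 0.3, p_err=0.001, p_conv=0.05, rng=rng)
    print(estimate_new_fraction(obs, p_err=0.001, p_conv=0.05))
```

In practice the two rates would themselves have to be estimated, for example the background rate from an unlabeled control, and pooling UMIs per gene across cells is what makes the estimate feasible at shallow depth.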

Regarding your questions: 1) Yes, your data is ideal for dynamo analysis. Note that dynamo provides a comprehensive framework for analyzing metabolic labeling experiments (for example, it can handle one-shot, kinetic, or degradation labeling experiments, using steady-state, stochastic, or kinetic models). Please check the documentation of dyn.tl.dynamics, which is very detailed and several pages long, for everything it can do.

For your 4sU and control samples, you should expect no new RNA to be detected in the control and a considerably high amount of new RNA in the 4sU-labeled data (especially for genes with quick turnover). You are always welcome to ask me for more details later. If you think the discussion needs to be private, you can reach me at xqiu.sc@gmail.com
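
One simple way to check this expectation is to compare the overall T->C conversion rate of the labeled sample with that of the control. The sketch below is only an illustration: the function names and the counts are hypothetical, and it assumes you have already tallied T->C mismatches and covered T positions per sample.

```python
def conversion_rate(tc_mismatches, t_positions_covered):
    """Fraction of covered T positions that are read as C."""
    if t_positions_covered <= 0:
        raise ValueError("t_positions_covered must be positive")
    return tc_mismatches / t_positions_covered


def labeling_enrichment(labeled, control):
    """Ratio of the labeled to the control T->C rate.

    labeled, control: (tc_mismatches, t_positions_covered) tuples.
    A ratio near 1 suggests that little labeled RNA was captured.
    """
    labeled_rate = conversion_rate(*labeled)
    control_rate = conversion_rate(*control)
    if control_rate == 0:
        return float("inf")
    return labeled_rate / control_rate


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(labeling_enrichment(labeled=(1500, 100000), control=(100, 100000)))
```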

2) The dynamo paper is the finest paper I have written so far, and in it I tried to integrate many ideas I have had over the past 10 years. Yan developed the estimation framework and I brought in the method to recover the vector field function, etc., so the paper is indeed a little technically challenging, but I hope the ideas I want to convey in it are clear.

In dynamo, we don't have the GRAND-SLAM method (or the NASC-seq pipeline) implemented yet. We will work on a method that simplifies new/old RNA quantification and specifically targets UMI data, but it needs some time for development.

And yes, we implemented a method that can estimate the transcription/splicing/degradation rates for different genes and use them to calculate RNA velocity. But dynamo doesn't stop there: we then take the sparse and noisy samples of RNA velocity to learn a vector field that can be used to predict cell fate over a much longer time period.

Lastly, you may also find my Scribe work useful: https://github.com/aristoteleo/Scribe-py
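
As a rough illustration of how rates lead to a velocity (a textbook first-order kinetics sketch, not dynamo's actual estimator), consider a one-shot labeling experiment without splicing. If new RNA follows dn/dt = alpha - gamma * n with n(0) = 0, then n(t) = alpha / gamma * (1 - exp(-gamma * t)), and at steady state the new-to-total ratio is 1 - exp(-gamma * t). The function names and numbers below are hypothetical.

```python
import math


def degradation_rate(new_fraction, label_time):
    """gamma from the steady-state new/total ratio: NTR = 1 - exp(-gamma * t)."""
    if not 0.0 <= new_fraction < 1.0:
        raise ValueError("new_fraction must be in [0, 1)")
    if label_time <= 0:
        raise ValueError("label_time must be positive")
    return -math.log(1.0 - new_fraction) / label_time


def total_rna_velocity(new, total, gamma, label_time):
    """Velocity of total RNA, dr/dt = alpha - gamma * r, for one gene in one cell.

    alpha is recovered from the new RNA via n(t) = alpha / gamma * (1 - exp(-gamma * t)).
    Requires gamma > 0 and label_time > 0.
    """
    alpha = gamma * new / (1.0 - math.exp(-gamma * label_time))
    return alpha - gamma * total


if __name__ == "__main__":
    t = 2.0  # labeling time, arbitrary units (hypothetical)
    gamma = degradation_rate(new_fraction=0.33, label_time=t)
    print(gamma, total_rna_velocity(new=20.0, total=50.0, gamma=gamma, label_time=t))
```

A cell at steady state gets a velocity of zero under this model, while a cell with more new RNA than the steady-state ratio predicts gets a positive velocity.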

bitcometz commented 4 years ago

Dear Xiaojieqiu,

Thank you for your detailed reply! I will use Python 3.6.8 for this image.

  1. I have been debugging NASC-seq for the past two days and have already obtained the pi-g for my data. It is great that NASC-seq works for single-cell data (https://github.com/sandberg-lab/NASC-seq/issues/7). Thanks again for the detailed suggestions. You are right that I should dig into the excellent NASC-seq bioinformatics pipeline, and maybe I can modify some of its steps to make it more suitable for single-cell data. I am also very much looking forward to your method for new/old RNA quantification.

  2. We are developing the experimental protocol for 4sU-labeled scRNA-seq, and it is at an early stage. Yes, we would be happy to share more details and related issues with you for discussion, and it would be better to do that by email.

  3. Your article is well worth taking the time to study, although my progress is slow for now. Thanks also for introducing Scribe; I think we will use it in our analysis. I have just entered this field, so I still have a lot to learn. I am fortunate to have met you.

Thanks !!!

Best,

Xiaojieqiu commented 4 years ago

Hi @bitcometz, I hope things are going well with you! I am planning a pre-release of dynamo to PyPI in a few days. I wonder whether you have figured out the Docker version of dynamo? I am more available now and should be able to be more helpful. Please also let me know if you have any further questions or comments!

Xiaojieqiu commented 4 years ago

@bitcometz I tried to use a GitHub Action to automatically deploy a Docker image every time I tag a release, but that attempt failed. Check my Docker action file here: https://github.com/aristoteleo/dynamo-release/blob/master/.github/workflows/python-docker.yml and also my Dockerfile here: https://github.com/aristoteleo/dynamo-release/blob/master/Dockerfile

Please let me know if you want to help me fix those. In addition, I am not very sure about the value of a Docker image now that our package is already on PyPI. Either way, please let me know what you think and how we should move forward.

bitcometz commented 4 years ago

@Xiaojieqiu Dear Xiaojieqiu, so sorry for the late reply. I have had too many project deadlines recently.

I planned to build a Docker image that installs the NASC-seq bioinformatics pipeline, dynamo, and other software related to RNA velocity analysis. But this requires us to complete both the experimental workflow and the data analysis workflow, which might take a long time. For this RNA velocity experiment, we still don't know which indicators to use to judge whether the experiment was successful, so we are stuck optimizing the experiment and have had no time for downstream analysis.

It is great that you have uploaded the package to PyPI. For me, and I think for most people, PyPI is enough. The Docker image should not be the highest priority at the moment.

I hope we can find effective indicators to judge whether the experiment was successful and then continue with the downstream analysis, at which point we can use this program. Because we are doing this kind of experiment for the first time, we have a lot to learn, and with many other projects under way our capacity is relatively limited.

Thanks again for developing this great pipeline :clap: :clap: :clap: :clap: :clap: .

Best,