cole-trapnell-lab / Scribe

Regulatory networks with Direct Information

RNA velocity declines to estimate some genes #16

Closed concatenize closed 5 years ago

concatenize commented 5 years ago

The RNA velocity toolkit refuses to estimate velocities for a lot of genes. How do you handle this in Scribe? Would it be reasonable to fill in a velocity of zero (constant expression)? Or do I need to somehow impute the missing values? Thanks!
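To make the zero-fill option concrete, here is a minimal sketch in plain Python. It is not Scribe's or velocyto's API: the function name `project_expression`, the `dt` parameter, and the toy values are all assumptions for illustration. A gene whose velocity could not be estimated is treated as having velocity zero.

```python
from typing import Dict, Optional


def project_expression(
    current: Dict[str, float],
    velocity: Dict[str, Optional[float]],
    dt: float = 1.0,
) -> Dict[str, float]:
    """Hypothetical helper: projected = current + dt * velocity.

    A velocity of None (not estimated) is replaced by 0.0, which is the
    "constant expression" assumption asked about above.
    """
    projected = {}
    for gene, x in current.items():
        v = velocity.get(gene)
        projected[gene] = x + dt * (v if v is not None else 0.0)
    return projected


# Toy values, made up for illustration only.
current = {"geneA": 2.0, "geneB": 5.0}
velocity = {"geneA": 0.5, "geneB": None}  # geneB: velocity not estimated

projected = project_expression(current, velocity)
print(projected)
```

In this sketch a zero-filled gene ends up with identical current and projected values, so its (current, projected) pair carries no directional signal of its own.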

concatenize commented 5 years ago

Another alternative could be to feed in pairs (spliced, unspliced) instead of (current, projected). The spliced counts would serve as a proxy for protein concentrations, and the unspliced counts would be modeled as responding to those protein concentrations.
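A rough sketch of what I mean, in plain Python. The function name `make_pairs` and the dict-of-lists layout are assumptions for illustration, not how Scribe actually expects its input. The point is only that every gene with both count types gets a pair per cell, regardless of whether a velocity was estimated for it.

```python
from typing import Dict, List, Tuple


def make_pairs(
    spliced: Dict[str, List[float]],
    unspliced: Dict[str, List[float]],
) -> Dict[str, List[Tuple[float, float]]]:
    """Hypothetical helper: per gene, one (spliced, unspliced) tuple per cell."""
    pairs = {}
    for gene, s in spliced.items():
        if gene not in unspliced:
            continue  # skip genes without unspliced counts
        u = unspliced[gene]
        if len(s) != len(u):
            raise ValueError(f"cell count mismatch for {gene}")
        pairs[gene] = list(zip(s, u))
    return pairs


# Toy counts for two genes across three cells (made up for illustration).
spliced = {"geneA": [3.0, 4.0, 6.0], "geneB": [1.0, 0.0, 2.0]}
unspliced = {"geneA": [1.0, 2.0, 2.0], "geneB": [0.0, 1.0, 1.0]}

pairs = make_pairs(spliced, unspliced)
print(pairs["geneA"])
```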

Xiaojieqiu commented 5 years ago

Hi @concatenize, I really like what you suggested. Using spliced and unspliced counts is definitely the way to go, although it may suffer from the issue that the time delay between unspliced and spliced transcripts is not one (I need to think about this).

I feel like the current theory of RNA velocity is pretty limited (as it assumes steady state), and thus a lot of genes cannot be measured. Another, better way than using intron/exon reads in RNA velocity is to use SLAM-seq for the network inference, as we alluded to in our manuscript.

Let me know what you think.

concatenize commented 5 years ago

I agree, RNA velocity is limited. After all, the most interesting genes are often the ones not observed in steady state. Theory aside, I am also running into a lot of scaling issues with velocyto.R. The code forms dense matrices that fill up my RAM :( so I cannot jointly estimate velocity across my whole dataset. But the core idea (using directionality to improve GRN inference) makes a lot of sense, and I would like to give it a shot somehow. How did you handle this issue in the chromaffin example in the manuscript? Did you also use (spliced, unspliced)?
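For a sense of scale, here is a back-of-the-envelope sketch in plain Python. The matrix shape (50,000 x 50,000), the 8-byte values, and the 1% fill are assumptions picked for illustration; I have not checked which matrices velocyto.R actually allocates or how sparse they could be. The CSR formula is the usual one: values plus column indices per nonzero, plus one row pointer per row and one extra.

```python
def dense_bytes(n_rows: int, n_cols: int, itemsize: int = 8) -> int:
    """Memory for a dense matrix of 8-byte floats (by default)."""
    return n_rows * n_cols * itemsize


def csr_bytes(n_rows: int, nnz: int, itemsize: int = 8, index_size: int = 4) -> int:
    """Rough memory for a CSR sparse matrix with 4-byte indices (by default)."""
    return nnz * (itemsize + index_size) + (n_rows + 1) * index_size


n = 50_000                # hypothetical number of rows and columns
nnz = n * n // 100        # hypothetical 1% of entries nonzero

dense = dense_bytes(n, n)
sparse = csr_bytes(n, nnz)

print(f"dense:  {dense / 1e9:.1f} GB")
print(f"sparse: {sparse / 1e9:.1f} GB")
```

Under these assumptions the dense matrix alone needs 20 GB, which is already more than many workstations have.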

Xiaojieqiu commented 5 years ago

The limitations of RNA velocity could be addressed eventually, especially if we use better measurements and models. For now, I can also suggest you try the Python version of velocyto (or a similar tool) and import the results into your R analysis. Regarding the chromaffin example, we only focused on a small subset of genes and used only velocyto measurements (current and projected), but the raw spliced and unspliced counts could be used there too.

As a side note, it is always very challenging to infer a large-scale network, because the space of possible network configurations grows explosively as the number of nodes in the network increases. But once you have a good set of genes, which can be chosen in many different ways, inference of the network is much more manageable.
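A quick way to see the growth in plain Python, under a simplifying assumption: count only unsigned, directed networks without self-loops, where each of the n(n-1) ordered gene pairs either has an edge or does not. That gives 2^(n(n-1)) configurations. The function name is made up for illustration, and this says nothing about how Scribe actually searches the space.

```python
def count_directed_networks(n_genes: int) -> int:
    """Number of directed graphs on n_genes nodes, no self-loops.

    Each ordered pair (regulator, target) with regulator != target
    is either an edge or not, so there are 2 ** (n * (n - 1)) graphs.
    """
    n_possible_edges = n_genes * (n_genes - 1)
    return 2 ** n_possible_edges


for n in (3, 5, 10, 20):
    count = count_directed_networks(n)
    # Print the number of decimal digits, since the counts get huge.
    print(f"{n:>2} genes: {len(str(count))}-digit number of networks")
```

Three genes already allow 64 networks, and ten genes allow 2^90 (roughly 1.2e27), which is why restricting to a small gene set first matters so much.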

concatenize commented 5 years ago

Yes, and it is especially difficult with small, tissue-specific datasets. I am planning to select relevant genes, but based on other factors, not based on whether the velocity can be estimated. So, I will try the python version or just use (spliced, unspliced). Thank you for your help!