Open HelloWorldLTY opened 10 months ago
Hi, the value here is an indication of the openness of the chromatin region. If an integer count is needed, it should be fine to just use round().
Hi, thanks for your explanation. However, I am still confused. If it represents the openness of the chromatin region, then according to its definition it measures the number of fragment hits in the given region, so why is it a float rather than an integer (and not even a fraction)? Do you have any specific distribution assumption for this simulation? Thanks.
I am not sure rounding will work. For example, 3.4 and 3.6 are not very different, but they would be converted to 3 and 4, which have a larger gap.
Basically, we first sample x from a distribution x ~ D, where D is the distribution fitted from a real ATAC dataset's log-transformed counts, then output y = 2^x+1 as the simulated ATAC count matrix to recover the original data distribution. That's why it contains float numbers.
I think rounding should be fine because it will not change the distribution much; and noise is already introduced during simulation and sampling anyway.
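To make the rounding point concrete, here is a minimal sketch rather than the simulator's actual code. It assumes a normal distribution as a stand-in for D (the real D is fitted from data and is not specified in this thread), reads the formula as (2^x) + 1, and uses a hypothetical helper name `simulate_atac_values`.

```python
import random


def simulate_atac_values(n, mu=1.0, sigma=0.5, seed=0):
    """Draw x ~ D and return y = 2**x + 1 for each draw.

    ASSUMPTION: D is replaced by a normal distribution with made-up
    parameters; the real D is fitted from log-transformed ATAC counts.
    ASSUMPTION: "y = 2^x+1" is read as (2**x) + 1.
    """
    rng = random.Random(seed)
    return [2 ** rng.gauss(mu, sigma) + 1 for _ in range(n)]


values = simulate_atac_values(10_000)

# Round each simulated float to the nearest integer count.
counts = [round(v) for v in values]

mean_float = sum(values) / len(values)
mean_int = sum(counts) / len(counts)

# Largest change that rounding applied to any single entry.
max_shift = max(abs(v - c) for v, c in zip(values, counts))

print("mean before rounding:", mean_float)
print("mean after rounding: ", mean_int)
print("largest per-entry shift:", max_shift)
```

Each entry moves by at most 0.5, so values such as 3.4 and 3.6 do end up one count apart. Across many entries the shifts largely cancel, which is the sense in which the overall distribution changes little.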
Got it, thanks a lot.
Furthermore, I wonder if you plan to export your results to a format like h5ad (AnnData), which is easier to handle in Python. Most RNA velocity inference methods are Python-based, so I think this would be a good way to promote your work.
Here is another bug (at least I think) I just found:
It seems that there is no tree structure like phyla1? But it seems that in the tutorial we could use Phyla1 in our simulation. Did I miss something? Thanks.
Sorry for my late reply. Phyla1 only has one argument: len, which is the branch length.
The tree contains one branch connecting the root and only one leaf:
Root ------> A
There's no plotting argument, because R refuses to plot it: it has only one tip and is therefore not considered a tree. It's likely that people don't need to visualize it since the structure is so simple, though.
> Phyla1()
Phylogenetic tree with 1 tips and 1 internal nodes.
Tip labels:
A
Rooted; includes branch lengths.
Ok, thanks a lot. I will try this approach.
Hi, I notice that you have the ability to simulate multiomic data, but I have some questions about the simulated data.
It seems that for the ATAC-seq data, the region-by-cell matrix is not count data, and it contains values smaller than 1. If so, may I know the reason? Is it the result of TF-IDF processing? If so, what should I do if I intend to run count-based methods like MultiVI? Thanks.