edroaldo / cellrouter

Reconstruction of complex single-cell trajectories using CellRouter

Gene filtering method & count formats #18

Closed AswinSSoman closed 2 years ago

AswinSSoman commented 5 years ago

1. Which gene filtering method is best suited: M3Drop, highly variable genes, high-dropout genes, or highest PCA loadings?

2. The example vignette starts from a filtered data set. What is the suitable input format for CellRouter: raw counts, log2 counts, TPM, log2 TPM, FPKM, or RPKM?

edroaldo commented 5 years ago

Hi Aswin,

Thank you for your questions.

Most of the time I use all genes in my analysis, removing only those that are lowly expressed or expressed in just a few cells. You can control this by changing the min.cells and min.genes parameters below:

cellrouter <- CellRouter(rawdata = rawdata, min.cells=10, min.genes=500)
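
The effect of these two thresholds can be sketched in base R. This is only an illustration, not CellRouter's code: it assumes min.cells is the minimum number of cells in which a gene must be detected and min.genes is the minimum number of detected genes per cell, and it assumes genes are filtered before cells. The helper filter_counts and the toy matrix are hypothetical.

```r
# Toy genes-by-cells count matrix (hypothetical data, for illustration only)
rawdata <- matrix(
  c(0, 0, 0, 5,
    1, 2, 0, 3,
    0, 0, 0, 0,
    4, 1, 2, 2),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("cell", 1:4))
)

# Hypothetical helper (not part of CellRouter). Assumptions:
#   min.cells = minimum number of cells in which a gene is detected (count > 0)
#   min.genes = minimum number of detected genes a cell must have
filter_counts <- function(counts, min.cells, min.genes) {
  detected <- counts > 0
  # Keep genes detected in enough cells
  keep.genes <- rowSums(detected) >= min.cells
  # Keep cells with enough detected genes among the retained genes
  keep.cells <- colSums(detected[keep.genes, , drop = FALSE]) >= min.genes
  counts[keep.genes, keep.cells, drop = FALSE]
}

filtered <- filter_counts(rawdata, min.cells = 2, min.genes = 2)
dim(filtered)
```

Running the helper over a small grid of thresholds and comparing how many genes and cells survive is one simple way to see how sensitive a data set is to these settings.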

I have not implemented other filtering strategies yet, so you can use whichever you like. Pass the selected genes through the genes.use parameter:

cellrouter <- scaleData(cellrouter, genes.use=rownames(cellrouter@ndata))
cellrouter <- computePCA(cellrouter, num.pcs = 30, genes.use =rownames(cellrouter@ndata), seed=42)
#build knn graph on the PCA/DC space
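
Since any gene list can be passed through genes.use, one option is to rank genes by variance and keep the top ones. The sketch below uses plain per-gene variance, which is a simple stand-in and not the method used by M3Drop or Seurat. The helper top_variable_genes and the simulated matrix are hypothetical; the matrix stands in for cellrouter@ndata.

```r
set.seed(1)

# Hypothetical normalized genes-by-cells matrix standing in for cellrouter@ndata
ndata <- matrix(rnorm(50 * 20), nrow = 50,
                dimnames = list(paste0("gene", 1:50), paste0("cell", 1:20)))
# Inflate the spread of the first five genes so they are clearly the most variable
ndata[1:5, ] <- ndata[1:5, ] * 10

# Hypothetical helper (not part of CellRouter): names of the n genes with the
# highest variance across cells
top_variable_genes <- function(expr, n) {
  v <- apply(expr, 1, var)
  names(sort(v, decreasing = TRUE))[seq_len(min(n, length(v)))]
}

genes.use <- top_variable_genes(ndata, n = 5)

# With a real CellRouter object, the selection would be passed on as in the
# calls quoted in this thread (not run here):
# cellrouter <- scaleData(cellrouter, genes.use = genes.use)
# cellrouter <- computePCA(cellrouter, num.pcs = 30, genes.use = genes.use, seed = 42)
```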

You can provide raw counts as input, as here:

cellrouter <- CellRouter(rawdata = rawdata, min.cells=10, min.genes=500)

If you want to use another normalization strategy, you can easily provide your own normalized data to CellRouter. In this case you do not need to run the Normalize function, so skip this line:

cellrouter <- Normalize(cellrouter)

and assign your data directly (this is a genes-by-cells matrix):

cellrouter@ndata <- your_normalized_data

Let me know if that clarifies your questions! I apologize for the delay!

Thanks!

In reply to the message from Aswin S Soman (notifications@github.com) of Wed, Nov 21, 2018, 03:10.


AswinSSoman commented 5 years ago

Hi edroaldo,

Thank you for the reply.

What type of normalization is done by the function Normalize(cellrouter)?

The function 'smoothDynamics' takes a long time when many genes are used. If I want to filter out more than just the zero-count genes, can you suggest a sensitivity analysis tool for choosing parameters such as min.cells and min.genes so that the filtering does not affect my results? Or should I use only the markers identified from clustering to smooth the kinetic patterns?

Best regards, Aswin

edroaldo commented 5 years ago

It uses the same normalization as Seurat. I can provide more details later; I am travelling right now.
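
For orientation, Seurat's default "LogNormalize" method divides each cell's counts by that cell's total, multiplies by a scale factor (10,000 by default), and applies log1p. The sketch below reproduces that calculation in base R. Whether Normalize(cellrouter) uses exactly this method and scale factor is an assumption that this thread does not confirm, and the helper log_normalize is hypothetical.

```r
# Hypothetical helper mirroring Seurat's default "LogNormalize" calculation.
# Assumption: CellRouter's Normalize() behaves the same way (not confirmed here).
log_normalize <- function(counts, scale.factor = 10000) {
  # Divide every column (cell) by its total count, then rescale
  scaled <- sweep(counts, 2, colSums(counts), "/") * scale.factor
  # Natural log of (1 + x)
  log1p(scaled)
}

# Toy genes-by-cells count matrix (hypothetical data)
counts <- matrix(c(1, 3, 0, 10), nrow = 2,
                 dimnames = list(c("geneA", "geneB"), c("cell1", "cell2")))
ndata <- log_normalize(counts)
```

A matrix normalized this way (or any other way) keeps the genes-by-cells orientation expected for cellrouter@ndata.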

I typically use this to speed up the smoothDynamics function:

markers <- findSignatures(cellrouter, column = 'population', pos.only = TRUE, fc.threshold = 0.2)
cellrouter <- processTrajectories(cellrouter, genes = rownames(markers), path.rank=ranks[3], num.cells = 3, neighs = 2, column.ann = 'population', column.color = 'colors')

You can lower the fold change cutoff to include more genes or raise it to include fewer genes.

Thanks!

In reply to the message from Aswin S Soman (notifications@github.com) of Thu, Nov 29, 2018, 05:54.


AswinSSoman commented 5 years ago

Can you please explain what these arguments of the function 'processTrajectories' mean, so that I can adjust them for my data? It would also be useful if you added documentation for these functions in the future.

path.rank=ranks[3]
num.cells = 3
neighs = 2

Also, when I use markers to pre-process trajectories, I get far fewer regulators for generating a regulator-centered sub-network than when I use the genes in cellrouter@pathsinfo$distr. Will I miss something significant here?

edroaldo commented 5 years ago

Yes, I am trying to write documentation for the functions but cannot find enough time...

path.rank="rank": ranks each path connecting the source and target populations by the flow going through the path. The more flow, in principle, the more similar each cell in the path is to the cell next to it.

num.cells=3: only include paths that contain at least three cells.

neighs=2: the size of the neighborhood in the kNN graph used to smooth kinetic profiles. When k=2, it uses the first and second neighborhoods of a given cell in a given path to smooth the kinetic profile (it basically takes the mean expression of each gene in the neighboring cells).
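
The neighborhood averaging described for neighs can be illustrated with a small base R sketch. It is not CellRouter's implementation: it assumes an undirected graph given as an adjacency matrix, treats "first and second neighborhood" as all cells within two steps, and includes the cell itself in the mean. The helper smooth_expression and the toy data are hypothetical.

```r
# Hypothetical helper (not part of CellRouter): replace each cell's expression
# by the mean over all cells within `neighs` steps of it in the graph.
# Assumption: the cell itself is included in the average.
smooth_expression <- function(expr, adj, neighs = 2) {
  n <- ncol(adj)
  step <- 1 * ((adj + diag(n)) > 0)        # direct neighbours plus self
  reach <- diag(n)
  for (i in seq_len(neighs)) {
    reach <- 1 * ((reach %*% step) > 0)    # extend the neighbourhood by one step
  }
  weights <- reach / rowSums(reach)        # row i averages over cell i's neighbourhood
  out <- expr %*% t(weights)
  colnames(out) <- colnames(expr)
  out
}

# Toy example: five cells connected in a chain 1-2-3-4-5
adj <- matrix(0, 5, 5)
adj[cbind(1:4, 2:5)] <- 1
adj <- adj + t(adj)

# One gene whose expression rises along the chain (hypothetical values)
expr <- matrix(c(0, 10, 20, 30, 40), nrow = 1,
               dimnames = list("geneA", paste0("cell", 1:5)))

smoothed <- smooth_expression(expr, adj, neighs = 2)
```

A larger neighs averages over more cells, which gives smoother profiles at the cost of flattening sharp expression changes.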

Well, this is the tradeoff between using all the genes (slower) and a subset of genes (faster). You could be missing information, and this is a decision that has to be made by the user. Maybe you can lower your fold change cutoff to identify more markers?

Thanks!

In reply to the message from Aswin S Soman (notifications@github.com) of Thu, Nov 29, 2018, 23:49.
