edroaldo / cellrouter

Reconstruction of complex single-cell trajectories using CellRouter
45 stars 21 forks source link

pretty low GRN scores obtained #8

Open JianmeiZhong opened 6 years ago

JianmeiZhong commented 6 years ago

Hi, I work through most part of the analysis pipeline but the results of GRN scores obtained via grnscores() in specified paths SP_10.SP_8 as tSNE and barplot shows.the grn scores have different levels compared to yours presented in several exemplary analysis,that's level of 0.0 versus 0. .Sometimes,the GRNscores plot has one or two genes with low scores in both directions.I don't know if there's some wrong configuration in my pipeline.If my source and endpoints are correct and reliable,low scores level are possible for some datasets and it's available? barplot another questions asking for your guide :) known knowledge this project based on are less subpopulations and increasing number of cells along differentiation process.So,k=120 was set according to my understanding in function buildGRN() and creatKNN() to obtain relatively less subpopulations as tSNE plot shows.k value is different a lot from yours.It might be a cause of low GRN scores?And if a path didn't go through every subpopulation in non-branched cells differentiation,gene expression dynamics with this path can represent entire process of changes? tsne

Thank a lot !

Jm Zhong

JianmeiZhong commented 6 years ago

add:every time running the grnscores(),warning messages were generated as follows.Hope this helps. [1] 0 There were 20 warnings (use warnings() to see them)

warnings() Warning messages: 1: In if (length(bla) == 1 & names(bla) == r) { ... : the condition has length > 1 and only the first element will be used 2: In if (length(bla) == 1 & names(bla) == r) { ... : the condition has length > 1 and only the first element will be used 3: In if (length(bla) == 1 & names(bla) == r) { ... : the condition has length > 1 and only the first element will be used 4: In if (length(bla) == 1 & names(bla) == r) { ... :

edroaldo commented 6 years ago

Hi, I am sorry for the delay!

Thanks for using CellRouter. Regarding your first question: 1) There is nothing wrong with your pipeline. The values are low though. I think this could be due to dropouts in your data because the dynamics of your regulators/target genes seem heterogeneous. What you could do is the following: in the processtrajectory function, increase "neighs" from 1 to 2 or 3. This will use the second or third neighborhood of each cell along the trajectory to smoothen the kinetic trend. Maybe, by chooing neighs=1 in your dataset does not provide enough information for a good smoothing. Let me know if that works.

cellrouter <- processtrajectories(cellrouter, genes2use, path.rank=ranks[3], num.cells = 3, neighs = 2)

What could be also happing is that you may not have genes that monotonically increase or decrease in expression along your trajectory but are rather more transiently expressed.

Regarding your second question: 2) I do not a larger k will affect the grnscore that much. It should not. I am trying to optimize the clustering step that CellRouter does for larger datasets. For now, choosing a larget k is totally fine. No, the selected path will not represent the entire process of differentiation. I am working on now to also include other paths dynamics. The trajectory identified represents the one where each cell along a particular path is more similar to each other, according to our optimization algorithm. There might be many other paths as well. I am extending the platform in this direction as well.

Please, keep me posted on how it goes!

Thanks!

2018-04-20 9:10 GMT-04:00 JianmeiZhong notifications@github.com:

Hi, I work through most part of the analysis pipeline but the results of GRN scores obtained via grnscores() in specified paths SP_10.SP_8 as tSNE and barplot shows.the grn scores have different levels compared to yours presented in several exemplary analysis,that's level of 0.0 versus 0. .Sometimes,the GRNscores plot has one or two genes with low scores in both directions.I don't know if there's some wrong configuration in my pipeline.If my source and endpoints are correct and reliable,low scores level are possible for some datasets and it's available? [image: barplot] https://user-images.githubusercontent.com/37535747/39050789-438495c2-44d9-11e8-97ba-bd6a4ea498d0.png another questions asking for your guide :) known knowledge this project based on are less subpopulations and increasing number of cells along differentiation process.So,k=120 was set according to my understanding in function buildGRN() and creatKNN() to obtain relatively less subpopulations as tSNE plot shows.k value is different a lot from yours.It might be a cause of low GRN scores?And if a path didn't go through every subpopulation in non-branched cells differentiation,gene expression dynamics with this path can represent entire process of changes? [image: tsne] https://user-images.githubusercontent.com/37535747/39050778-385bd228-44d9-11e8-9e05-571bf8013e36.jpg

Thank a lot !

Jm Zhong

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/edroaldo/cellrouter/issues/8, or mute the thread https://github.com/notifications/unsubscribe-auth/AJqUR9y39O0haayxIve1In_OXKudlT-Yks5tqd4-gaJpZM4Tdb6B .

-- Edroaldo

edroaldo commented 6 years ago

OK. I will check that out as well. I have to seen this msgs erlier.

Thanks a lot for your comments!

2018-04-23 11:11 GMT-04:00 JianmeiZhong notifications@github.com:

add:every time running the grnscores(),warning messages were generated as follows.Hope this helps. [1] 0 There were 20 warnings (use warnings() to see them)

warnings() Warning messages: 1: In if (length(bla) == 1 & names(bla) == r) { ... : the condition has length > 1 and only the first element will be used 2: In if (length(bla) == 1 & names(bla) == r) { ... : the condition has length > 1 and only the first element will be used 3: In if (length(bla) == 1 & names(bla) == r) { ... : the condition has length > 1 and only the first element will be used 4: In if (length(bla) == 1 & names(bla) == r) { ... :

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/edroaldo/cellrouter/issues/8#issuecomment-383610650, or mute the thread https://github.com/notifications/unsubscribe-auth/AJqURyeBCApYEN8C2Lrmh08QHZfj68SGks5tre8OgaJpZM4Tdb6B .

-- Edroaldo

JianmeiZhong commented 6 years ago

Thanks! I got some very useful answers from your nice explications. :) i'll try different neigh value.and yes,the dataset contains 5000~6000 cells.It's a large one.but larger k value,longer time it takes.if k value don't affect the grnscore that much,lower k value I'll set referring to your setting.And I called back some expression of genes using scImpute(WV Li et.al,2018) before and used normalized expression as input. As for the monotonically increase or decreasegenes,many genes are observed as up or down with obvious trends along the non-brached process of differentiation(five cell-stages,for example).I just cannot figure out how the numbers of genes represented in GRNscore plot so less in some samples. Maybe the enpoint subpopulations SP_8 are too small to represent the whole last cell-stage(SP_1,SP_2,SP_5,SP_7,SP_8,S_12,but no marker genes are found for this cell-stage)?I don't know...

JianmeiZhong commented 6 years ago

or if there's any way that subpopulation id of each cell can be customized coercively?That will be a absolutely sole path which all cells can be included. grin:

JianmeiZhong commented 6 years ago

I tried neigh=2 in another sample,nothing seems changed. sample2

edroaldo commented 6 years ago

How many genes are you using for GRN reconstruction? I noticed that, for some datasets, a larger GRN produces more interesting results because each regulator has more target genes. If you have enough memory, you can do it using all genes. If not, you can select more genes from the pca analysis at the top of your script to have more genes in the genes2use variable.

Let me know if you decide to try it.

Thanks!

On Mon, Apr 23, 2018, 1:03 PM JianmeiZhong notifications@github.com wrote:

I tried neigh=2 in another sample,nothing seems changed. [image: sample2] https://user-images.githubusercontent.com/37535747/39141763-466ab774-475b-11e8-920f-97ce4143a0b1.png

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/edroaldo/cellrouter/issues/8#issuecomment-383648768, or mute the thread https://github.com/notifications/unsubscribe-auth/AJqUR5HAwFz1cgHtNcQiQUSfuEhRf-D3ks5trgligaJpZM4Tdb6B .

edroaldo commented 6 years ago

Also, would you mind to show me the commands you are using to generate these plots. Just want to double check how your defining the cutoffs...

Thanks!

On Tue, Apr 24, 2018, 7:42 AM Edroaldo Lummertz da Rocha edroaldo@gmail.com wrote:

How many genes are you using for GRN reconstruction? I noticed that, for some datasets, a larger GRN produces more interesting results because each regulator has more target genes. If you have enough memory, you can do it using all genes. If not, you can select more genes from the pca analysis at the top of your script to have more genes in the genes2use variable.

Let me know if you decide to try it.

Thanks!

On Mon, Apr 23, 2018, 1:03 PM JianmeiZhong notifications@github.com wrote:

I tried neigh=2 in another sample,nothing seems changed. [image: sample2] https://user-images.githubusercontent.com/37535747/39141763-466ab774-475b-11e8-920f-97ce4143a0b1.png

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/edroaldo/cellrouter/issues/8#issuecomment-383648768, or mute the thread https://github.com/notifications/unsubscribe-auth/AJqUR5HAwFz1cgHtNcQiQUSfuEhRf-D3ks5trgligaJpZM4Tdb6B .

JianmeiZhong commented 6 years ago

mmmm....my apologies for missing your reply and my delayed respond.:) I tried to put all genes with non-zero variance into calculation of GRN as you suggested and more genes were presented in GRNscores plot.increasing number of input genes generated more nodes and more GRN genes shown but with pretty low GRN scores mentioned before.more closer endpoint to source subpopulation was tested to have higher GRN scores,for example 0.1 for SNCA from SP_10 to SP_3.

Here are commands generating the low-scores plot and several previous steps. ### selecting genes to use as regulated along developmental trajectories pca <- prcomp(t(ndata),scale=TRUE,center=TRUE) loadings <- pca$rotation num_pc <- 5 quantile <- 0.975 genes2use <- unique(as.vector(unlist(apply(loadings[,1:num_pc],2,function(x){names(x[which(abs(x) >= quantile(x,quantile))])})))) ggrn <- buildGRN('Hs',ndata,genes2use,2,'results/GRN.R') cellrouter <- findsubpopulations(cellrouter,90,'jaccard','results/kNN_network.gml') cellrouter <- createKNN(cellrouter,90,'jaccard','results/paths/kNN_network_trajectory.gml') tfs <- find_tfs(species = 'Hs') t <- c('SP_10.SP_8') x <- grnscores(cellrouter,tfs,t,direction='both',dir.targets='up',columns=1,width=8,height=5,flip=T,filename=paste('results/',t,sep=''))

edroaldo commented 6 years ago

I think this should be fine. I have not seen GRN score that low but your analysis is correct. So, this something related to your dataset. I am testing in other datasetsas well. Will let you know what I get.

Thanks!

2018-05-02 6:06 GMT-04:00 JianmeiZhong notifications@github.com:

mmmm....my apologies for missing your reply and my delayed respond.:) I tried to put all genes with non-zero variance into calculation of GRN as you suggested and more genes were presented in GRNscores plot.increasing number of input genes generated more nodes and more GRN genes shown but with pretty low GRN scores mentioned before.more closer endpoint to source subpopulation was tested to have higher GRN scores,for example 0.1 for SNCA from SP_10 to SP_3.

Here are commands generating the low-scores plot and several previous steps.

selecting genes to use as regulated along developmental trajectories

pca <- prcomp(t(ndata),scale=TRUE,center=TRUE) loadings <- pca$rotation num_pc <- 5 quantile <- 0.975 genes2use <- unique(as.vector(unlist(apply(loadings[,1:numpc],2, function(x){names(x[which(abs(x) >= quantile(x,quantile))])})))) ggrn <- buildGRN('Hs',ndata,genes2use,2,'results/GRN.R') cellrouter <- findsubpopulations(cellrouter,90,'jaccard','results/kNN network.gml') cellrouter <- createKNN(cellrouter,90,'jaccard','results/paths/kNN_ network_trajectory.gml') tfs <- find_tfs(species = 'Hs') t <- c('SP_10.SP_8') x <- grnscores(cellrouter,tfs,t,direction='both',dir.targets=' up',columns=1,width=8,height=5,flip=T,filename=paste('results/',t,sep=''))

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/edroaldo/cellrouter/issues/8#issuecomment-385928544, or mute the thread https://github.com/notifications/unsubscribe-auth/AJqUR1e5ZfBAkM5Lhgi6O2ljZFMG53E5ks5tuYUWgaJpZM4Tdb6B .

-- Edroaldo

edroaldo commented 6 years ago

Hi,

I noticed that increasing k in the createKNN function provided better results in the dataset generated by Paul et al (Cell 2015). Could you please try to increase k in the createKNN function?

Please, let me know...

Thanks!

2018-05-08 12:29 GMT-04:00 Edroaldo Lummertz da Rocha edroaldo@gmail.com:

I think this should be fine. I have not seen GRN score that low but your analysis is correct. So, this something related to your dataset. I am testing in other datasetsas well. Will let you know what I get.

Thanks!

2018-05-02 6:06 GMT-04:00 JianmeiZhong notifications@github.com:

mmmm....my apologies for missing your reply and my delayed respond.:) I tried to put all genes with non-zero variance into calculation of GRN as you suggested and more genes were presented in GRNscores plot.increasing number of input genes generated more nodes and more GRN genes shown but with pretty low GRN scores mentioned before.more closer endpoint to source subpopulation was tested to have higher GRN scores,for example 0.1 for SNCA from SP_10 to SP_3.

Here are commands generating the low-scores plot and several previous steps.

selecting genes to use as regulated along developmental trajectories

pca <- prcomp(t(ndata),scale=TRUE,center=TRUE) loadings <- pca$rotation num_pc <- 5 quantile <- 0.975 genes2use <- unique(as.vector(unlist(apply(loadings[,1:num_pc],2,function(x){names(x[which(abs(x)

= quantile(x,quantile))])})))) ggrn <- buildGRN('Hs',ndata,genes2use,2,'results/GRN.R') cellrouter <- findsubpopulations(cellrouter, 90,'jaccard','results/kNN_network.gml') cellrouter <- createKNN(cellrouter,90,'jacca rd','results/paths/kNN_network_trajectory.gml') tfs <- find_tfs(species = 'Hs') t <- c('SP_10.SP_8') x <- grnscores(cellrouter,tfs,t,direction='both',dir.targets='up' ,columns=1,width=8,height=5,flip=T,filename=paste('results/',t,sep=''))

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/edroaldo/cellrouter/issues/8#issuecomment-385928544, or mute the thread https://github.com/notifications/unsubscribe-auth/AJqUR1e5ZfBAkM5Lhgi6O2ljZFMG53E5ks5tuYUWgaJpZM4Tdb6B .

-- Edroaldo

-- Edroaldo

JianmeiZhong commented 6 years ago

Ok,I'll try it and let you know how it goes.

JianmeiZhong commented 6 years ago

I tried increasing k in 3 samples but there are no better results than before.they are still pretty low when k=200 .

edroaldo commented 6 years ago

So, I think this could be more specific to your dataset then, which should be fine... I tested in other datasets I usually get higher values...

Thanks!

2018-05-17 4:25 GMT-04:00 JianmeiZhong notifications@github.com:

I tried increasing k in 3 samples but there are no better results than before.they are still pretty low when k=200 .

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/edroaldo/cellrouter/issues/8#issuecomment-389786635, or mute the thread https://github.com/notifications/unsubscribe-auth/AJqUR1fw-JPjqu9DCq3oWt07_0pU9ksiks5tzTQRgaJpZM4Tdb6B .

-- Edroaldo