Closed koenvandenberge closed 4 years ago
Hi Koen!
There are indeed still a lot of undocumented features. I'll leave this issue open until I get around to going over all of the functions and outputs to see if the documentation is sufficient.
The plots show progression of submodules (e.g., A1 to A6). These relate to sets of transcription factors and may be retrieved using model$feature_network. It would be useful if we could also know for which cells these systems are most active. What do the variables strength, cooperativity relate to in model$feature_network?
In the devel branch, cooperativity has been renamed to hill. The strength parameter will determine the degree to which that regulator regulates a target in comparison to other regulators. The hill coefficient determines the steepness of the regulatory effect of a regulator w.r.t. its expression.
dataset$milestone_percentages provides a coarser resolution of that (i.e. assigns cells to module A/B/C/..). But it is unclear what exactly the percentage column is referring to.
dataset$progressions
has columns cell_id
, from
, to
, percentage
. If one of its rows contains the values [C1, A, B, 0.6]
, then that would mean cell C1 has transitioned 60% from milestone A to B. There is a 1-to-N relation between a row from dataset$progressions
and dataset$milestone_network
. The corresponding values in dataset$milestone_percentages
with columns cell_id
, milestone_id
and percentage
would be [C1, A, 0.4]
and [C1, B, 0.6]
.
For benchmarking purposes in the context of network inference, it would be useful to have a measure that summarizes the contribution of each previous module/TF, e.g. the expression in cell10 is mainly driven by modules X and Y, or TFs J and K. I assume that there should be some sort of decay function (e.g., cells in module A5 will be affected by modules A4, but also A3 and A2; can we summarize this in an appropriate measure?)
With simulation_default(store_grn = TRUE)
you can calculate the effect of a regulator on a target. This was used to generate Figure 5 in the bioRxiv paper. Is this what you are looking for?
Kind regards, Robrecht
Many thanks, Robrecht! This has already cleared up a lot. I realize that writing documentation is a long, tedious, and not a very exciting process...
With simulation_default(store_grn = TRUE) you can calculate the effect of a regulator on a target. This was used to generate Figure 5 in the bioRxiv paper. Is this what you are looking for?
This might be what I need, let me check with you to be sure. It provides the GRN for each cell (I assume that this is the model$experiment$regulation
object?). In this regulation
object, rows are cells and columns are TF-target interactions. I assume that the elements in the matrix are strengths of the interaction.
Would it be fair to say that the general activity/contribution of a particular TF (say, A1_TF1
) in a particular cell (say, cell1
) is the sum of all the columns of regulations that point from A1_TF1
to any target?
For example, if the regulation
matrix is as follows
tf1_tox tf1_toy tf2_toz tf2_toy
cell1 0 3 6 9
cell2 1 4 7 10
cell3 2 5 8 11
The contribution of tf1
in cell1
would be 0+3=3
, in cell2
it would be 1+4=5
, etc.
If this makes it easier to look at, given a simulated dataset (small_linear
in my case), this would be sample R code to achieve this:
# assume that the regulation object is the GRN in each cell.
regulation <- model$experiment$regulation
regulation_from <- strsplit(colnames(regulation), split="_")
# only look at TFs
keep <- unlist(lapply(regulation_from, "[[", 2)) %in% unique(model$feature_info$module_id)
regulation_from <- regulation_from[keep]
id <- unlist(lapply(regulation_from, function(x) paste(x[2:3], collapse="_")))
regulationFiltered <- regulation[,keep]
# calculate TF activity in each cell
R <- matrix(0, nrow=200, ncol=ncol(counts),
dimnames=list(model$feature_info$feature_id, colnames(counts)))
uniqID <- unique(id)
for(ii in 1:length(uniqID)){
curTF <- uniqID[ii]
curCols <- which(id == curTF)
if(length(curCols) < 1) next
R[curTF,] <- rowSums(regulationFiltered[,curCols,drop=FALSE])
}
Hi,
Just checking in whether you've been able to look into the question above yet.
Thanks in advance and hope you're doing well. Best, Koen
Hello Koen,
I'm really sorry, I must have missed the notification of your issue from Github!
In the meantime, you might have noticed on the dyngen@devel branch that store_grn
was renamed to compute_cellwise_grn
, and model$experiment$regulation
was renamed to model$experiment$cellwise_grn
. Sorry about that (but at least this should already answer your next question ;)). I've been trying to get dyngen and the dyngen manuscript prepped for submission, which is taking longer than it should, given the current situation.
This might be what I need, let me check with you to be sure. It provides the GRN for each cell (I assume that this is the model$experiment$regulation object?). In this regulation object, rows are cells and columns are TF-target interactions. I assume that the elements in the matrix are strengths of the interaction.
Exactly. As part of the simulation, the propensity of each reaction needs to be calculated; that is, the likelihood that the reaction will occur during an infinitesimal period of time. The propensity of the transcription reaction of a given gene depends on the abundance levels of the proteins of the gene's regulators. What is being stored in the model$experiment$regulation
object is the difference in propensity if a TF is at its current abundance level (e.g. 10000) versus if there is none (0). To put it a little more formally, given that a cell with a particular state, the regulatory effect of tf1 on a target gene is:
regulation(target, tf1) = P(transcription_{target} | protein_{tf1} = 1000) - P(transcription_{target} | protein_{tf1} = 0)
If tf1 inhibits the target, this value should be negative, whereas otherwise it is positive.
Would it be fair to say that the general activity/contribution of a particular TF (say, A1_TF1) in a particular cell (say, cell1) is the sum of all the columns of regulations that point from A1_TF1 to any target?
Hm. That's an interesting question. It's a funny way of looking at the data that I had not foreseen. I guess this should indeed work. Take into account that if (for example) A1_TF1 upregulates B1_TF1, and if A1_TF1 eventually stops being expressed, and B1_TF1 as well, then over the whole history of the cell A1_TF1 will have resulted in a lot of B1_TF1 even though neither A1_TF1 or B1_TF1 are expressed anymore.
Cool idea! What are you planning on doing with it?
If this makes it easier to look at, given a simulated dataset (small_linear in my case), this would be sample R code to achieve this:
Since you posted this quite a while back, your code doesn't work anymore on my current installation of dyngen :(
I tried to interpret your code. Does this produce the result that you expect?
The code above will not make a distinction between activating and repressing activity, because otherwise (removing the abs(...)
around cellwise_grn_filtered
), if M1_TF1 upregulates one gene and downregulates another, these values will cancel each other out in R.
Kind regards, Robrecht
Hi Robrecht,
No worries, I completely understand. Your answer and accompanying code has provided exactly what I needed, thanks a lot for taking the time to do this! I will send an e-mail regarding how we are planning to use it.
Best, Koen
I believe this issue to be solved, closing this issue.
The framework is extremely useful for evaluating methods in the context of trajectory inference and gene regulatory networks. The default plots as part of the
generate_dataset
and related functions are interesting and provide a good summary of the simulated data.There is a lot of information on those plots where I do not find documentation about, while they would be useful for accurate benchmarking. I apologize if I have overlooked where the documentation resides, but I did not find it.
A few examples, say
dataset
is a simulated object generated usinggenerate_dataset
, andmodel
is the model used.model$feature_network
. It would be useful if we could also know for which cells these systems are most active. What do the variablesstrength
,cooperativity
relate to inmodel$feature_network
?dataset$milestone_percentages
provides a coarser resolution of that (i.e. assigns cells to module A/B/C/..). But it is unclear what exactly thepercentage
column is referring to.cell10
is mainly driven by modules X and Y, or TFs J and K. I assume that there should be some sort of decay function (e.g., cells in module A5 will be affected by modules A4, but also A3 and A2; can we summarize this in an appropriate measure?)Thanks!