dynverse / dyngen

Simulating single-cell data using gene regulatory networks 📠
https://dyngen.dynverse.org

Interpretation of HK genes and Target genes #50

Closed: spriyansh closed this issue 1 year ago

spriyansh commented 2 years ago

Hello, @rcannood

I am following the tutorial from this website to simulate a bifurcating trajectory.

I am slightly confused about the intuition behind the housekeeping (HK) genes and the target genes. The tutorial says: 'Target genes are regulated by TFs or other target genes, while HKs are only regulated by themselves.'

I am running tradeSeq to identify genes that vary with pseudotime/time (which in dyngen's case is sim_time). I would expect branch-specific TF genes to show up as significant. But I also see a few HK genes and some target genes showing up as significant in the bifurcated branches.

Does this mean both target genes and HK genes act as noise in the simulation? How will the number of HK genes and target genes affect the ground truth of the bifurcated trajectory?

Also, is there a way to plot the true expression of a single gene against pseudotime? The following call plots every gene at once, which makes it hard to interpret any individual gene.

plot_gold_expression(model, what = "mol_mrna") # mrna

Thanks in advance, really like the tool!

rcannood commented 1 year ago

The Housekeeping (HK) genes and Target genes in the simulation serve different purposes. HK genes are intended to simulate noise or fluctuations in gene expression that are not driven by regulatory interactions, while Target genes are meant to represent genes that are regulated by other genes or Transcription Factors (TFs).

The presence of both HK and Target genes in the simulation may impact the overall results, including the bifurcated trajectory. The number of HK genes may affect the overall level of noise in the simulation, while the number of Target genes and their regulation can impact the overall pattern of gene expression in the simulation.
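To see which kinds of genes a method such as tradeSeq flags, the significant genes can be tallied by gene type. The following is a minimal sketch on a toy table, not dyngen output: it assumes the feature table has logical `is_tf` and `is_hk` columns (as `out$model$feature_info` is assumed to have), the gene names are made up, and `significant_genes` is a hypothetical vector standing in for the tradeSeq result.

```r
# Toy stand-in for out$model$feature_info.
# Assumed columns: feature_id, is_tf, is_hk. Gene names are illustrative.
feature_info <- data.frame(
  feature_id = c("A1_TF1", "B1_TF1", "Target1", "Target2", "HK1", "HK2"),
  is_tf = c(TRUE, TRUE, FALSE, FALSE, FALSE, FALSE),
  is_hk = c(FALSE, FALSE, FALSE, FALSE, TRUE, TRUE),
  stringsAsFactors = FALSE
)

# Hypothetical genes reported as significant by the downstream test
significant_genes <- c("A1_TF1", "Target1", "HK2")

# Label each gene as TF, HK, or Target (neither TF nor HK)
feature_info$gene_type <- ifelse(
  feature_info$is_tf, "TF",
  ifelse(feature_info$is_hk, "HK", "Target")
)
feature_info$significant <- feature_info$feature_id %in% significant_genes

# Cross-tabulate gene type against significance
type_by_significance <- table(
  gene_type = feature_info$gene_type,
  significant = feature_info$significant
)
print(type_by_significance)
```

With a real simulation, the toy `feature_info` would be replaced by the model's own feature table, and the tally shows at a glance how many HK and target genes end up among the significant hits.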

To plot the expression of a single gene against pseudotime, you could try one of the different visualisations that dynplot offers:

library(tidyverse)
library(dyngen)

set.seed(1) # set seed for reproducibility

# create a bifurcating backbone for the model
backbone <- backbone_bifurcating()

# initialize the model with specified parameters
config <- initialise_model(
  backbone = backbone,
  num_cells = 1000,
  num_tfs = nrow(backbone$module_info),
  num_targets = 500,
  num_hks = 500,
  download_cache_dir = tools::R_user_dir("dyngen"),
  num_cores = 30,
  simulation_params = simulation_default(
    total_time = 1000,
    census_interval = 10,
    ssa_algorithm = ssa_etl(tau = 300/3600),
    experiment_params = simulation_type_wild_type(
      num_simulations = 20
    )
  )
)

out <- generate_dataset(config)

library(dynwrap)
library(dynplot)

obj <- as_dyno(out$model) %>%
  add_pseudotime()

plot_dendro(
  obj,
  feature_oi = c("A1_TF1")
)

plot_onedim(
  obj,
  feature_oi = c("A1_TF1")
)

Alternatively, you could manually combine the obj$pseudotime and obj$counts objects and create a visualisation using ggplot2.
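The manual route could look roughly like this. It is a sketch under assumptions: `counts` is taken to be a cells-by-genes matrix with cell IDs as row names, `pseudotime` a numeric vector named by cell ID, and `gene_vs_pseudotime()` is a hypothetical helper, not part of dyngen, dynwrap, or dynplot. Toy data stands in for `obj$pseudotime` and `obj$counts` so the example runs without a simulation.

```r
# Build a per-cell table of pseudotime and expression for one gene.
# Assumes `counts` is cells x genes with cell IDs as row names and
# `pseudotime` is a numeric vector named by cell ID.
gene_vs_pseudotime <- function(pseudotime, counts, gene) {
  stopifnot(gene %in% colnames(counts))
  cells <- intersect(names(pseudotime), rownames(counts))
  df <- data.frame(
    cell_id = cells,
    pseudotime = unname(pseudotime[cells]),
    expression = as.numeric(counts[cells, gene]),
    stringsAsFactors = FALSE
  )
  # Sort cells by pseudotime
  df[order(df$pseudotime), ]
}

# Toy data standing in for obj$pseudotime and obj$counts
set.seed(1)
toy_pseudotime <- setNames(runif(50), paste0("cell", 1:50))
toy_counts <- matrix(
  rpois(100, lambda = 5), nrow = 50,
  dimnames = list(paste0("cell", 1:50), c("A1_TF1", "HK1"))
)
df <- gene_vs_pseudotime(toy_pseudotime, toy_counts, "A1_TF1")

# Plot only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(df, ggplot2::aes(x = pseudotime, y = expression)) +
    ggplot2::geom_point(alpha = 0.5) +
    ggplot2::geom_smooth(method = "loess", formula = y ~ x, se = FALSE) +
    ggplot2::labs(title = "A1_TF1", x = "Pseudotime", y = "Counts")
  print(p)
}
```

With the simulated dataset, the call would be `gene_vs_pseudotime(obj$pseudotime, obj$counts, "A1_TF1")`, provided the two objects have the shapes assumed above.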

Hope this helps!