GreenleafLab / ArchR

ArchR : Analysis of Regulatory Chromatin in R (www.ArchRProject.com)
MIT License
388 stars 140 forks source link

PlotMarkerHeatmap Log2norm error with subsetted se object #1442

Closed evaham1 closed 2 years ago

evaham1 commented 2 years ago

Attach your log file Logfile from first heatmap: ArchR-plotMarkerHeatmap-34f507af5175b-Date-2022-05-23_Time-19-32-17.log Logfile from second heatmap: ArchR-plotMarkerHeatmap-34f507fc022b7-Date-2022-05-23_Time-19-33-49.log

Describe the bug Hello, I would like to use plotMarkerHeatmap to plot some features (eg genes from GeneScore matrix) which I have selected myself. I read in some of the discussion pages (#1175 and #878) that to do this I needed to first subset the Summarised Experiment object that I get from getMarkerFeatures and then plot this smaller object. However when I tried this I could see that the values I was getting were different from when I used the 'cutoff' parameter in plotMarkerHeatmap. I believe the issue is that the Log2norm step (lines 889-890) occurs before the se object is subset (line 903), so this creates different values depending on if you input a subsetted se object or the full se object. I don't see this difference when I set Log2norm = FALSE.

To Reproduce

inputFiles <- getTutorialData(tutorial = "hematopoiesis")
addArchRGenome("hg19")
ArrowFiles <- createArrowFiles(
  inputFiles = inputFiles,
  sampleNames = names(inputFiles),
  filterTSS = 4, 
  filterFrags = 1000, 
  addTileMat = TRUE,
  addGeneScoreMat = FALSE,
  threads = 1
)
proj <- ArchRProject(
  ArrowFiles = ArrowFiles, 
  outputDirectory = "HemeTutorial",
  copyArrows = TRUE,
  threads = 1
)
proj <- addIterativeLSI(ArchRProj = proj, useMatrix = "TileMatrix", name = "IterativeLSI")
proj <- addClusters(input = proj, reducedDims = "IterativeLSI")
proj <- saveArchRProject(ArchRProj = proj)

# add gene score matrix and calculate differential expression
proj <- addGeneScoreMatrix(proj, threads = 1, force = TRUE)
se <- getMarkerFeatures(proj,
  useMatrix = "GeneScoreMatrix", 
  groupBy = "Clusters", threads = 1)

#######   1: inputting the whole SE object and using 'cutoff' to filter   #########

plotMarkerHeatmap(
  seMarker = se, 
  cutOff = "FDR <= 0.01 & Log2FC >= 5",
  nLabel = 3)

 #######   2: inputting manually subsetted se object using same cutoff   #########

# extract df from se object
markerList <- getMarkers(se, cutOff = "FDR <= 0.01 & Log2FC >= 5")
df <- data.frame()
for (i in 1:length(names(markerList))) {
  print(i)
  df_i <- as.data.frame(markerList[i])
  df <- rbind(df, df_i)
}

# subsetting the se object
rowData(se)
subsetSE <- se[which(rowData(se)$name %in% df$name)]

# plot
plotMarkerHeatmap(
  seMarker = subsetSE, 
  cutOff = "FDR = FDR",
  nLabel = 3)

Expected behavior According to advice from the discussions pages (unless I have missed something more up to date) I should be able to subset an se object before inputting it to plotMarkerHeatmap. I would expect that a resulting heatmap should be equivalent to one created by inputting the full se object and subsetting using 'cutoff' inside the plotMarkerHeatmap function?

Screenshots Heatmap from full se object filtered with cutoff = "FDR <= 0.01 & Log2FC >= 5": image

Heatmap from se object subset by cutoff = "FDR <= 0.01 & Log2FC >= 5" and then plotted: image

rcorces commented 2 years ago

Hi @evaham1! Thanks for using ArchR! Please make sure that your post belongs in the Issues section. Only bugs and error reports belong in the Issues section. Usage questions and feature requests should be posted in the Discussions section, not in Issues.
Before we help you, you must respond to the following questions unless your original post already contained this information: 1. If you've encountered an error, have you already searched previous Issues to make sure that this hasn't already been solved? 2. Can you recapitulate your error using the tutorial code and dataset? If so, provide a reproducible example. 3. Did you post your log file? If not, add it now. 4. Remove any screenshots that contain text and instead copy and paste the text using markdown's codeblock syntax (three consecutive backticks). You can do this by editing your original post.

evaham1 commented 2 years ago

Hello yes log file and reproducible code is in my post above, thanks!

rcorces commented 2 years ago

Thanks for posting this. I understand the issue and the desired behavior and agree that the previous suggestions about subsetting the se object before hand doesnt produce the desired result. I'll work on a fix. Thanks for your patience.

rcorces commented 2 years ago

I believe this issue has now been solved on the dev_subsetSE branch. I added a parameter called subsetMarkers which allows you to provide a vector of rownames from your seMarker to be used in subsetting. If this is the case, then the values provided to cutOff are effectively ignored.

Can you test this on your end and let me know if you encounter any difficulties?

devtools::install_github("GreenleafLab/ArchR", ref="dev_subsetSE", repos = BiocManager::repositories())
detach("package:ArchR", unload=TRUE)
library(ArchR)
se <- getMarkerFeatures(projHeme5,
                        useMatrix = "GeneScoreMatrix", 
                        groupBy = "Clusters", threads = 1)

#######   1: inputting the whole SE object and using 'cutoff' to filter   #########

plotMarkerHeatmap(
  seMarker = se, 
  cutOff = "FDR <= 0.01 & Log2FC >= 5",
  nLabel = 3)

image

#######   2: inputting manually subsetted se object using same cutoff   #########

markers <- c("TSHB","LOC494141","MS4A1","TSSK4","MIR5196",
             "PWRN2","DCAF4L1","MIR487B","MIR539","MIR889","PRRG3")

marker_idx <- which(rowData(se)$name %in% markers)

# plot
plotMarkerHeatmap(
  seMarker = se, 
  cutOff = "FDR = FDR",
  subsetMarkers = marker_idx,
  nLabel = 3)

image

evaham1 commented 2 years ago

Hi Ryan, thanks for this! Just tested your code and the heatmaps look reasonable to me now (I haven't got exactly the same heatmaps as you but I think that is probably an upstream difference in preprocessing the ArchR object)