GreenleafLab / ArchR

ArchR : Analysis of Regulatory Chromatin in R (www.ArchRProject.com)
MIT License

Error in addDeviationsMatrix #504

Closed xz333 closed 3 years ago

xz333 commented 3 years ago

ArchR-addDeviationsMatrix-1788a61a09cea-Date-2021-01-14_Time-17-53-59.log

Hi, I generated some subclusters with subsetArchRProject() and saved them. Some of these subclusters then hit the following error when running addDeviationsMatrix:

2021-01-14 22:00:20 : mk2_6R_AT22 (77 of 107) : Deviations for Annotation 168 of 870, 4.437 mins elapsed.
2021-01-14 22:00:21 : Monkey_6L_1_5 (74 of 107) : Deviations for Annotation 368 of 870, 10.072 mins elapsed.
2021-01-14 22:00:24 : mk3_1R_AT5 (75 of 107) : Deviations for Annotation 288 of 870, 8.423 mins elapsed.

2021-01-14 22:00:25 : ERROR Found in .computeDeviations for Monkey_6L_2_3 (72 of 107)
LogFile = ArchRLogs/ArchR-addDeviationsMatrix-1788a61a09cea-Date-2021-01-14_Time-17-53-59.log

<simpleError in dimnames(x) <- dn: length of 'dimnames' [2] not equal to array extent>

rcorces commented 3 years ago

In the future, please use the provided issue templates to open an issue.

First, if you would like assistance with your error, please try to recapitulate it with the tutorial dataset. If you are able to recapitulate the error, please post the code to do so. Second, if you cannot recapitulate the error, please provide the commands you used to subset your project and add deviations. As is, we don't have enough information to help you.

xz333 commented 3 years ago

Thanks for your answer. I ran the same code on the tutorial dataset successfully. In addition, some of my subclusters run successfully, while others report the error mentioned above. This is my code to generate the subclusters:

projs = subsetArchRProject(proj, cells = proj$cellNames[proj$rcelltype %in% sub], outputDirectory = "")

This is my code to run addDeviationsMatrix:

proj4 = loadArchRProject("")
proj4 <- addMotifAnnotations(ArchRProj = proj4, motifSet = "cisbp", name = "Motif", species = "Homo sapiens", force = TRUE)
projHeme5 <- addBgdPeaks(proj4, force = TRUE)
projHeme5 <- addDeviationsMatrix(ArchRProj = projHeme5, peakAnnotation = "Motif", force = TRUE)

Thank you very much for your support!

rcorces commented 3 years ago

Can you try saving your project subset to a different output directory (a clean new directory)?

xz333 commented 3 years ago

Yes, I created a new output directory and saved the subclusters there, and I found the problem. The Arrow file that reports the error contains only one cell (Monkey_6L_2_3 (72 of 107)). Arrow files containing multiple cells do not report an error. For now I will try removing these single cells and running again. Thank you for your help.
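A minimal sketch of the workaround described above, assuming the per-cell sample labels are available as proj$Sample and the cell names as proj$cellNames. The helper cellsInPopulatedSamples and its minCells threshold are hypothetical names used for illustration; they are not part of ArchR.

```r
# Hypothetical helper (not part of ArchR): return the cell names whose
# sample (Arrow file) contains at least `minCells` cells.
cellsInPopulatedSamples <- function(cellNames, samples, minCells = 2) {
  samples <- as.character(samples)
  stopifnot(length(cellNames) == length(samples))
  counts <- table(samples)                          # number of cells per sample
  keepSamples <- names(counts)[counts >= minCells]  # samples that are large enough
  cellNames[samples %in% keepSamples]
}

# Toy data standing in for proj$cellNames and proj$Sample.
cells   <- c("A#c1", "A#c2", "B#c1", "C#c1", "C#c2", "C#c3")
samples <- c("A", "A", "B", "C", "C", "C")
keep <- cellsInPopulatedSamples(cells, samples)
print(keep)

# With a real project this might look like the following (untested sketch,
# output directory left unspecified on purpose):
# keep <- cellsInPopulatedSamples(proj$cellNames, proj$Sample)
# proj <- subsetArchRProject(ArchRProj = proj, cells = keep, outputDirectory = "...")
```

Running table(proj$Sample) first is a quick way to see which samples would be affected.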

rcorces commented 3 years ago

Thanks @xz333 for figuring this out. @jgranja24 - seems like this edge case should be handled differently. I'll leave this issue open until we post a solution.

jgranja24 commented 3 years ago

Hmm @rcorces, I'll look at possible places where the matrix could be dropping.
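To illustrate the kind of dimension dropping mentioned above, here is a base-R sketch. It is not ArchR's actual code, and the object names are made up for the example. Subsetting a matrix down to one column returns a plain vector unless drop = FALSE is used, and code that later rebuilds a matrix and assigns dimnames can then fail with the same class of error as in the report.

```r
# features x cells matrix with two cells
m <- matrix(1:6, nrow = 3,
            dimnames = list(c("f1", "f2", "f3"), c("cell1", "cell2")))

# Selecting a single cell silently drops the matrix to a vector ...
one <- m[, "cell1"]
is.null(dim(one))             # TRUE

# ... unless drop = FALSE is given, which keeps a one-column matrix.
kept <- m[, "cell1", drop = FALSE]
dim(kept)

# Rebuilding a matrix from the dropped vector with the wrong orientation and
# then assigning a single column name triggers a dimnames error.
x <- matrix(one, nrow = 1)    # 1 x 3 instead of 3 x 1
msg <- tryCatch({
  dimnames(x) <- list(NULL, "cell1")
  "no error"
}, error = function(e) conditionMessage(e))
print(msg)
```

Using drop = FALSE wherever a cell subset can shrink to one column is the usual defensive choice, since the result then stays a matrix regardless of how many cells a sample contains.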

jgranja24 commented 3 years ago

This is now fixed!

devtools::install_github("GreenleafLab/ArchR", ref="release_1.0.1", repos = BiocManager::repositories())

library(ArchR)

fragments <- getTestFragments()
system("cp PBMCSmall.tsv.gz PBMCSmall-1.tsv.gz")
system("cp PBMCSmall.tsv.gz PBMCSmall-2.tsv.gz")
f1 <- "PBMCSmall-1.tsv.gz"
f2 <- "PBMCSmall-2.tsv.gz"
arrowFiles <- createArrowFiles(c(f1, f2), sampleNames=c("P1", "P2"), minFrags = 100, force=TRUE)
arrowFiles
# [1] "P1.arrow" "P2.arrow"
o <- ArchR:::.copyArrowSingle(
    inArrow=arrowFiles[2], 
    outArrow="P2-1Cell.arrow",
    cellsKeep=ArchR:::.availableCells(arrowFiles[2])[1]
)
arrowFiles <- c("P1.arrow", "P2-1Cell.arrow")

#ArchR Project
proj <- ArchRProject(arrowFiles)
table(proj$Sample)
#   P1   P2 
# 2268    1 

#Analysis
proj <- addIterativeLSI(proj)
proj <- addClusters(proj)
# > proj$Clusters %>% table
# .
#  C1  C2  C3  C4 
# 966 143 542 618 
proj <- addGroupCoverages(proj)
proj <- addReproduciblePeakSet(proj, pathToMacs2 = "/Users/jeffreygranja/Library/Python/2.7/bin/macs2")
proj <- addPeakMatrix(proj)
proj <- addMotifAnnotations(proj)

# Old Error Now is Fixed!
# ************************************************************
# 2021-01-23 13:16:18 : ERROR Found in .computeDeviations for P2 (2 of 2) 
# LogFile = ArchRLogs/ArchR-addDeviationsMatrix-172a4730c811d-Date-2021-01-23_Time-13-15-44.log

# <simpleError in dimnames(x) <- dn: length of 'dimnames' [2] not equal to array extent>

# ************************************************************
proj <- addDeviationsMatrix(proj)
# Identifying Background Peaks!
# ArchR logging to : ArchRLogs/ArchR-addDeviationsMatrix-172a46f0265a7-Date-2021-01-23_Time-13-39-21.log
# If there is an issue, please report to github with logFile!
# NULL
# 2021-01-23 13:39:23 : Batch Execution w/ safelapply!, 0 mins elapsed.
# Can not create group. Object with name 'MotifMatrix' already exists.
# 2021-01-23 13:39:25 : chromVAR deviations P1 (1 of 2) Schep (2017), 0.039 mins elapsed.
# Can not create group. Object with name 'MotifMatrix' already exists.
# 2021-01-23 13:39:25 : chromVAR deviations P2 (2 of 2) Schep (2017), 0.039 mins elapsed.
# 2021-01-23 13:39:27 : P2 (2 of 2) : Deviations for Annotation 43 of 870, 0.025 mins elapsed.
# 2021-01-23 13:39:29 : P2 (2 of 2) : Deviations for Annotation 86 of 870, 0.056 mins elapsed.
# 2021-01-23 13:39:30 : P2 (2 of 2) : Deviations for Annotation 129 of 870, 0.085 mins elapsed.
# 2021-01-23 13:39:33 : P1 (1 of 2) : Deviations for Annotation 43 of 870, 0.112 mins elapsed.
# 2021-01-23 13:39:33 : P2 (2 of 2) : Deviations for Annotation 172 of 870, 0.122 mins elapsed.
# 2021-01-23 13:39:35 : P2 (2 of 2) : Deviations for Annotation 215 of 870, 0.163 mins elapsed.
# 2021-01-23 13:39:37 : P2 (2 of 2) : Deviations for Annotation 258 of 870, 0.201 mins elapsed.
# 2021-01-23 13:39:39 : P1 (1 of 2) : Deviations for Annotation 86 of 870, 0.224 mins elapsed.
# 2021-01-23 13:39:39 : P2 (2 of 2) : Deviations for Annotation 301 of 870, 0.235 mins elapsed.
# 2021-01-23 13:39:41 : P2 (2 of 2) : Deviations for Annotation 344 of 870, 0.265 mins elapsed.
# 2021-01-23 13:39:43 : P2 (2 of 2) : Deviations for Annotation 387 of 870, 0.3 mins elapsed.
# 2021-01-23 13:39:45 : P2 (2 of 2) : Deviations for Annotation 430 of 870, 0.328 mins elapsed.
# 2021-01-23 13:39:46 : P1 (1 of 2) : Deviations for Annotation 129 of 870, 0.331 mins elapsed.
# 2021-01-23 13:39:47 : P2 (2 of 2) : Deviations for Annotation 473 of 870, 0.356 mins elapsed.
# 2021-01-23 13:39:48 : P2 (2 of 2) : Deviations for Annotation 516 of 870, 0.385 mins elapsed.
# 2021-01-23 13:39:50 : P2 (2 of 2) : Deviations for Annotation 559 of 870, 0.416 mins elapsed.
# 2021-01-23 13:39:52 : P2 (2 of 2) : Deviations for Annotation 602 of 870, 0.447 mins elapsed.
# 2021-01-23 13:39:53 : P1 (1 of 2) : Deviations for Annotation 172 of 870, 0.454 mins elapsed.
# 2021-01-23 13:39:54 : P2 (2 of 2) : Deviations for Annotation 645 of 870, 0.482 mins elapsed.
# 2021-01-23 13:39:56 : P2 (2 of 2) : Deviations for Annotation 688 of 870, 0.515 mins elapsed.
# 2021-01-23 13:39:58 : P2 (2 of 2) : Deviations for Annotation 731 of 870, 0.548 mins elapsed.
# 2021-01-23 13:40:00 : P2 (2 of 2) : Deviations for Annotation 774 of 870, 0.584 mins elapsed.
# 2021-01-23 13:40:02 : P1 (1 of 2) : Deviations for Annotation 215 of 870, 0.598 mins elapsed.
# 2021-01-23 13:40:03 : P2 (2 of 2) : Deviations for Annotation 817 of 870, 0.621 mins elapsed.
# 2021-01-23 13:40:05 : P2 (2 of 2) : Deviations for Annotation 860 of 870, 0.654 mins elapsed.
# 2021-01-23 13:40:09 : Finished Computing Deviations!, 0.765 mins elapsed.
# 2021-01-23 13:40:10 : P1 (1 of 2) : Deviations for Annotation 258 of 870, 0.73 mins elapsed.
# 2021-01-23 13:40:17 : P1 (1 of 2) : Deviations for Annotation 301 of 870, 0.85 mins elapsed.
# 2021-01-23 13:40:24 : P1 (1 of 2) : Deviations for Annotation 344 of 870, 0.968 mins elapsed.
# 2021-01-23 13:40:31 : P1 (1 of 2) : Deviations for Annotation 387 of 870, 1.083 mins elapsed.
# 2021-01-23 13:40:37 : P1 (1 of 2) : Deviations for Annotation 430 of 870, 1.183 mins elapsed.
# 2021-01-23 13:40:43 : P1 (1 of 2) : Deviations for Annotation 473 of 870, 1.287 mins elapsed.
# 2021-01-23 13:40:49 : P1 (1 of 2) : Deviations for Annotation 516 of 870, 1.392 mins elapsed.
# 2021-01-23 13:40:56 : P1 (1 of 2) : Deviations for Annotation 559 of 870, 1.499 mins elapsed.
# 2021-01-23 13:41:02 : P1 (1 of 2) : Deviations for Annotation 602 of 870, 1.607 mins elapsed.
# 2021-01-23 13:41:09 : P1 (1 of 2) : Deviations for Annotation 645 of 870, 1.723 mins elapsed.
# 2021-01-23 13:41:16 : P1 (1 of 2) : Deviations for Annotation 688 of 870, 1.837 mins elapsed.
# 2021-01-23 13:41:23 : P1 (1 of 2) : Deviations for Annotation 731 of 870, 1.95 mins elapsed.
# 2021-01-23 13:41:30 : P1 (1 of 2) : Deviations for Annotation 774 of 870, 2.069 mins elapsed.
# 2021-01-23 13:41:37 : P1 (1 of 2) : Deviations for Annotation 817 of 870, 2.185 mins elapsed.
# 2021-01-23 13:41:44 : P1 (1 of 2) : Deviations for Annotation 860 of 870, 2.297 mins elapsed.
# 2021-01-23 13:41:49 : Finished Computing Deviations!, 2.443 mins elapsed.
# ###########
# 2021-01-23 13:41:49 : Completed Computing Deviations!, 2.473 mins elapsed.
# ###########
# ArchR logging successful to : ArchRLogs/ArchR-addDeviationsMatrix-172a46f0265a7-Date-2021-01-23_Time-13-39-21.log

xz333 commented 3 years ago

Hi Ryan! @rcorces Updating ArchR to 1.0.1 resolved the error above, but the getVarDeviations step now gives an unexpected result:

plotVarDev <- getVarDeviations(proj, name = "MotifMatrix", plot = TRUE)
DataFrame with 6 rows and 6 columns
   seqnames     idx     name combinedVars combinedMeans      rank
f1        z       1 TFAP2B_1          NaN    0.00386756         1
f2        z       2 TFAP2D_2          NaN   -0.08985583         2
f3        z       3 TFAP2C_3          NaN    0.01720137         3
f4        z       4 TFAP2E_4          NaN   -0.01122243         4
f5        z       5 TFAP2A_5          NaN    0.00198498         5
f6        z       6 ARID3A_6          NaN   -0.06556130         6

Why are all the combinedVars values NaN? If addDeviationsMatrix is run with ArchR 1.0.0, getVarDeviations works fine. Thanks!

rcorces commented 3 years ago

@xz333 - I'm not sure I understand your post, what the error is, and whether it's an issue. If you'd like additional help, please provide more detailed information.

xz333 commented 3 years ago

@rcorces You can try this: run plotVarDev <- getVarDeviations(proj, name = "MotifMatrix", plot = TRUE) once at the end of the example above and you will see the problem:

devtools::install_github("GreenleafLab/ArchR", ref="release_1.0.1", repos = BiocManager::repositories())

library(ArchR)

fragments <- getTestFragments()
system("cp PBMCSmall.tsv.gz PBMCSmall-1.tsv.gz")
system("cp PBMCSmall.tsv.gz PBMCSmall-2.tsv.gz")
f1 <- "PBMCSmall-1.tsv.gz"
f2 <- "PBMCSmall-2.tsv.gz"
arrowFiles <- createArrowFiles(c(f1, f2), sampleNames=c("P1", "P2"), minFrags = 100, force=TRUE)
arrowFiles
# [1] "P1.arrow" "P2.arrow"
o <- ArchR:::.copyArrowSingle(
    inArrow=arrowFiles[2], 
    outArrow="P2-1Cell.arrow",
    cellsKeep=ArchR:::.availableCells(arrowFiles[2])[1]
)
arrowFiles <- c("P1.arrow", "P2-1Cell.arrow")

#ArchR Project
proj <- ArchRProject(arrowFiles)
table(proj$Sample)
#   P1   P2 
# 2268    1 

#Analysis
proj <- addIterativeLSI(proj)
proj <- addClusters(proj)
# > proj$Clusters %>% table
# 
#  C1  C2  C3  C4 
# 966 143 542 618 
proj <- addGroupCoverages(proj)
proj <- addReproduciblePeakSet(proj, pathToMacs2 = "/Users/jeffreygranja/Library/Python/2.7/bin/macs2")
proj <- addPeakMatrix(proj)
proj <- addMotifAnnotations(proj)
proj <- addDeviationsMatrix(proj)

plotVarDev <- getVarDeviations(proj, name = "MotifMatrix", plot = TRUE)
DataFrame with 6 rows and 6 columns
   seqnames     idx     name combinedVars combinedMeans      rank
      <Rle> <array>  <array>    <numeric>     <numeric> <integer>
f1        z       1 TFAP2B_1          NaN    0.02529380         1
f2        z       2 TFAP2D_2          NaN   -0.00620999         2
f3        z       3 TFAP2C_3          NaN    0.06182041         3
f4        z       4 TFAP2E_4          NaN    0.00545913         4
f5        z       5 TFAP2A_5          NaN    0.01596789         5
f6        z       6 ARID3A_6          NaN   -0.04510937         6


All combinedVars values are NaN.

jgranja24 commented 3 years ago

I tested this out and found the bug with .combineVariances. I will have a fix shortly.
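The thread does not show what the bug in .combineVariances was, so the following is only a plausible illustration of how a single-cell sample can turn a combined variance into NaN. It assumes per-sample variances are merged from per-sample means, variances and cell counts; sampleVar and combineVars are hypothetical helpers, not ArchR functions.

```r
# Sample variance written out explicitly; for a single value this is 0 / 0.
sampleVar <- function(x) sum((x - mean(x))^2) / (length(x) - 1)

# Hypothetical helper: combine per-group means, variances and sizes into the
# variance of the pooled data. With guard = TRUE, groups with fewer than two
# observations contribute no within-group sum of squares.
combineVars <- function(means, vars, n, guard = TRUE) {
  within <- (n - 1) * vars
  if (guard) within[n < 2] <- 0
  grand <- sum(n * means) / sum(n)       # mean of the pooled data
  between <- n * (means - grand)^2       # between-group sum of squares
  (sum(within) + sum(between)) / (sum(n) - 1)
}

g1 <- c(1, 2, 3, 4)   # a sample with several cells
g2 <- 10              # a sample with a single cell
means <- c(mean(g1), mean(g2))
vars  <- c(sampleVar(g1), sampleVar(g2))   # second entry is NaN
n     <- c(length(g1), length(g2))

unguarded <- combineVars(means, vars, n, guard = FALSE)
guarded   <- combineVars(means, vars, n, guard = TRUE)
print(c(unguarded = unguarded, guarded = guarded, direct = var(c(g1, g2))))
```

In this sketch the unguarded result is NaN because one NaN term propagates through the sum, while the guarded result matches the variance computed directly on the pooled values. The actual fix in ArchR may differ.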

jgranja24 commented 3 years ago

With the new fix, this should now work:


devtools::install_github("GreenleafLab/ArchR", ref="release_1.0.1", repos = BiocManager::repositories())

library(ArchR)

fragments <- getTestFragments()
system("cp PBMCSmall.tsv.gz PBMCSmall-1.tsv.gz")
system("cp PBMCSmall.tsv.gz PBMCSmall-2.tsv.gz")
f1 <- "PBMCSmall-1.tsv.gz"
f2 <- "PBMCSmall-2.tsv.gz"
arrowFiles <- createArrowFiles(c(f1, f2), sampleNames=c("P1", "P2"), minFrags = 100, force=TRUE)
arrowFiles
# [1] "P1.arrow" "P2.arrow"
o <- ArchR:::.copyArrowSingle(
    inArrow=arrowFiles[2], 
    outArrow="P2-1Cell.arrow",
    cellsKeep=ArchR:::.availableCells(arrowFiles[2])[1]
)
arrowFiles <- c("P1.arrow", "P2-1Cell.arrow")

#ArchR Project
proj <- ArchRProject(arrowFiles)
table(proj$Sample)
#   P1   P2 
# 2268    1 

#Analysis
proj <- addIterativeLSI(proj)
proj <- addClusters(proj)
# > proj$Clusters %>% table
# 
#  C1  C2  C3  C4 
# 966 143 542 618 
proj <- addGroupCoverages(proj)
proj <- addReproduciblePeakSet(proj, pathToMacs2 = "/Users/jeffreygranja/Library/Python/2.7/bin/macs2")
proj <- addPeakMatrix(proj)
proj <- addMotifAnnotations(proj)
proj <- addDeviationsMatrix(proj)

plotVarDev <- getVarDeviations(proj, name = "MotifMatrix", plot = TRUE)
# DataFrame with 6 rows and 6 columns
#      seqnames     idx      name combinedVars combinedMeans      rank
#         <Rle> <array>   <array>    <numeric>     <numeric> <integer>
# f336        z     336  SPIB_336      5.18646     0.2144650         1
# f322        z     322  SPI1_322      3.60967     0.2070753         2
# f142        z     142 FOSL1_142      3.34614     0.1084377         3
# f139        z     139  JUNB_139      3.18676     0.0988181         4
# f124        z     124  JUND_124      3.18254     0.0951941         5
# f137        z     137   FOS_137      3.10851     0.0849912         6

rcorces commented 3 years ago

Closing, but feel free to comment again if you feel your issue hasn't been addressed and I will re-open.