GreenleafLab / ArchR

ArchR : Analysis of Regulatory Chromatin in R (www.ArchRProject.com)
MIT License
365 stars 129 forks source link

projectBulkATAC not working with Harmony? #1379

Closed cschmidl closed 2 years ago

cschmidl commented 2 years ago

Describe the bug Dear Ryan and Jeff, I was running into a problem with the projectBulkATAC function (which I find really useful!!).

I use the recent version from githuub and commented out .loadUWOT (line 168) as it gave an error Error ('unused argument (embedding$params$nc)') Also, we added the force option.

The function runs perfect with the IterativeLSI dimredduction in the testdataset. However, when I run the function with the testdataset on the Object after Harmony and the corresponding UMAP I get the error:

Error in normalizeDoubleBracketSubscript(i, x, exact = exact, allow.NA = TRUE, : attempt to extract less than one element 9. stop("attempt to extract less than one element") 8. normalizeDoubleBracketSubscript(i, x, exact = exact, allow.NA = TRUE, allow.nomatch = TRUE) 7. getListElement(x@listData, i, exact = exact) 6. getListElement(x@listData, i, exact = exact) 5. getListElement(x, i, ...) 4. getListElement(x, i, ...) 3. rD[[grep("Features", names(rD))]] 2. rD[[grep("Features", names(rD))]] 1. projectBulkATAC(ArchRProj = proj2, seATAC = rse_list$GSE179593, reducedDims = "Harmony", embedding = "HarmonyUMAP", n = 100, verbose = TRUE, threads = 1, logFile = createLogFile("projectBulkATAC"))

I attached the logfile!

Thank you very much! ArchR-projectBulkATAC-a4b71339b1a8-Date-2022-04-07_Time-16-06-36.log

rcorces commented 2 years ago

Hi @cschmidl! Thanks for using ArchR! Please make sure that your post belongs in the Issues section. Only bugs and error reports belong in the Issues section. Usage questions and feature requests should be posted in the Discussions section, not in Issues.
Before we help you, you must respond to the following questions unless your original post already contained this information: 1. If you've encountered an error, have you already searched previous Issues to make sure that this hasn't already been solved? 2. Can you recapitulate your error using the tutorial code and dataset? If so, provide a reproducible example. 3. Did you post your log file? If not, add it now. 4. Remove any screenshots that contain text and instead copy and paste the text using markdown's codeblock syntax (three consecutive backticks). You can do this by editing your original post.

rcorces commented 2 years ago

theres some active development on .loadUWOT that I'm not up to date on (https://github.com/GreenleafLab/ArchR/commit/6e7abec25de9217a8830ac1d6a853702c061bf8a). But I imagine this is related to your problem. Can you try updating to the latest release_1.0.2 and trying again.

For starters, it isnt surprising that commenting out loadUWOT results in downstream problems. So the error you are reporting is not what needs to be addressed per se. The function should work without commenting out code.

cschmidl commented 2 years ago

Thank you for the fast response, using simply the 1.0.2 version yields this error now even with the LSI dimreduction from the testdataset: ArchR logging to : ArchRLogs/ArchR-projectBulkATAC-c01820d594a8-Date-2022-04-07_Time-18-06-22.log If there is an issue, please report to github with logFile! 2022-04-07 18:06:24 : Overlap Ratio of Reduced Dims Features = 0.71432 Error in rep(xi, length.out = nvar) : attempt to replicate an object of type 'S4' ArchR-projectBulkATAC-c01820d594a8-Date-2022-04-07_Time-18-06-22.log

rcorces commented 2 years ago

can you provide the output of traceback()

cschmidl commented 2 years ago

4: rbind(deparse.level, ...) 3: rbind(mat, matNotIn) 2: .safeSubset(mat = .getAssay(subATAC, "counts"), subsetRows = paste0("f", seq_along(rDGR))) 1: projectBulkATAC(ArchRProj = proj, seATAC = rse_list$GSE179593, reducedDims = "IterativeLSI", embedding = "UMAP", n = 100, verbose = TRUE, threads = 1, logFile = createLogFile("projectBulkATAC"))

rcorces commented 2 years ago

@cschmidl could you try this new branch dev_projectBulkATAC which includes this change https://github.com/GreenleafLab/ArchR/commit/3a071200b046f12d4db1203adfb03b0080090c5e

cschmidl commented 2 years ago

2022-04-08 07:51:20 : Overlap Ratio of Reduced Dims Features = 0.71432 Error in dimnames<-.data.frame(*tmp*, value = list(n)) : invalid 'dimnames' given for data frame

traceback() 11: stop("invalid 'dimnames' given for data frame") 10: dimnames<-.data.frame(*tmp*, value = list(n)) 9: dimnames<-(*tmp*, value = list(n)) 8: as.array.default(.getAssay(subATAC, "counts")) 7: as.array(.getAssay(subATAC, "counts")) 6: rownames(mat) 5: match(x, table, nomatch = 0) 4: subsetRows %ni% rownames(mat) 3: which(subsetRows %ni% rownames(mat)) 2: .safeSubset(mat = as.array(.getAssay(subATAC, "counts")), subsetRows = paste0("f", seq_along(rDGR))) 1: projectBulkATAC(ArchRProj = proj, seATAC = rse_list$GSE179593, reducedDims = "IterativeLSI", embedding = "UMAP", n = 100, verbose = TRUE, threads = 1, force = TRUE, logFile = createLogFile("projectBulkATAC")) ArchR-projectBulkATAC-108802d6bed81-Date-2022-04-08_Time-07-56-07.log

Sorry I was also wring about the commenting out loadUWOT, it was just the second argument that gave an error 'unused argument (embedding$params$nc)', so this soved it: umapModel <- .loadUWOT(embedding$params$uwotModel) #, embedding$params$nc)

I attach the function that is working on my computer (a colleague fixed it and I think it was mainly removing embedding$params$nc and enabling force=TRUE, but the could not fix the issue with projectBulkATAC & Harmony, and also not the other issue I opened with my real data that reports inconsistencies with the reduces dimensions also in the LSI domreduction. Thanks again for your effort projectBulkATAC.txt !

rcorces commented 2 years ago

I'm having some trouble following the issue.

A reproducible example would be useful. You mentioned that this is failing with the tutorial data set so perhaps all you would need to provide would be the object being passed to seATAC.

cschmidl commented 2 years ago

I'm sorry for the confusion, here is my se ("rse_list"): https://myfiles.uni-regensburg.de/filr/public-link/file-download/044787a07ff55ccb01800a7226bf2c1e/80511/-5564395770685507458/giles_2022_GSE179593_GSE179606_ranged_se_list2.RDS

rse_list$GSE179593 is he one to run:

res=projectBulkATAC( ArchRProj = proj, seATAC = rse_list$GSE179593, reducedDims = "IterativeLSI", embedding = "UMAP", n = 100, verbose = TRUE, threads = 1, force = TRUE, logFile = createLogFile("projectBulkATAC") )

Thank you!

rcorces commented 2 years ago

Long story short, you cannot currently project into a Harmony reduced dimension space.

I'm not sure the details because I'm not that familiar with the projection code but I made a change to the .safeSubset() function on a new branch called dev_safeSubset that should solve your problem https://github.com/GreenleafLab/ArchR/commit/05c13786b7754111c35472ac6db01d7387c166fe

I think there is some incompatibility between rbinding a matrix and a sparse matrix and if I had to guess projectBulkATAC() seems to want a sparse matrix as the counts matrix in seATAC. Either way, the dev_safeSubsetbranch should work for you withIterativeLSI`

cschmidl commented 2 years ago

Thank you for your answer and updated code!

It runs now through the samples of the se, but in the end yields an error with my data, not with the test data (logfile attached): ... 2022-04-13 11:59:06 : Projecting Sample (217 of 217), 2.068 mins elapsed. 2022-04-13 11:59:06 : Error incosistency found with matching LSI dimensions to those used in addEmbedding Returning with simulated reduced dimension coordinates...

In addition, will you implement project into a Hermony reduced dimenstion space? I think that would be super useful not only for me!

Thanks a lot!

ArchR-projectBulkATAC-1521abc9a3e-Date-2022-04-13_Time-11-57-02.log

rcorces commented 2 years ago

I think there is something wrong with the logic in this section of the code but I'm really not sure.

In this section, there are some nested if statements, two of which check for the same exact thing which doesnt make sense to me (L139 and L155).

https://github.com/GreenleafLab/ArchR/blob/3af7c3075a7e95ec88a290b9fecc1061b7bfe885/R/BulkProjection.R#L139-L166

It seems like this if statement is checking whether the number of dimensions used in the original addUMAP() call match the simulated data but this would only be true if you _did not__ use the dimsToUse param when originally adding your embedding. If you did happen to subset the dimensions to use, then it seems like this code would always fail. Maybe I'm misinterpreting this though.

Could you confirm the values of corCutOff, dimsToUse, and scaleDims when you run:

  embedding <- getEmbedding(ArchRProj = ArchRProj, embedding = embedding, returnDF = FALSE)
  corCutOff <- embedding$params$corCutOff
  dimsToUse <- embedding$params$dimsToUse
  scaleDims <- embedding$params$scaleDims
rcorces commented 2 years ago

also, we will eventually try to add projection into Harmony but it will take time.

cschmidl commented 2 years ago

Hi Ryan, I did not use the dimsToUse param. When I understood correctly I ran now with my proj6 and the UMAP from the IterativeLSI:

embedding <- getEmbedding(ArchRProj = proj6, embedding = "UMAP", returnDF = FALSE) corCutOff <- embedding$params$corCutOff dimsToUse <- embedding$params$dimsToUse scaleDims <- embedding$params$scaleDims

corCutOff [1] 0.75 dimsToUse NULL scaleDims NULL

embedding$params$dimsToUse contains embedding$params$dimsToUse1, embedding$params$dimsToUse2 etc and does not contain "scaleDims "

rcorces commented 2 years ago

could you try a few more things for me? Tell me the output of these commands -

rD <- getReducedDims(ArchRProj = ArchRProj, reducedDims = "IterativeLSI", returnMatrix = FALSE)
rD$nDimensions
embedding <- getEmbedding(ArchRProj = ArchRProj, embedding = "UMAP", returnDF = FALSE)
embedding$params$nc
cschmidl commented 2 years ago

Sure!

rD <- getReducedDims(ArchRProj = proj6, reducedDims = "IterativeLSI", returnMatrix = FALSE) rD$nDimensions [1] 24

embedding <- getEmbedding(ArchRProj = proj6, embedding = "UMAP", returnDF = FALSE) embedding$params$nc [1] 11

rcorces commented 2 years ago

ok. interesting. Something seems wrong here and I'm not sure what it is. But the number of dimensions in your LSI (24) is very different from the number of dimensions used in your UMAP (11). I'm not sure why that would be unless the corCutOff used for LSI and UMAP creation were different or dimsToUse was set.

One thing that might be helpful would be to re-run addIterativeLSI(), addUMAP(), and projectBulkATAC() and provide the log files for each. You would need to set force=TRUE for LSI and UMAP.

cschmidl commented 2 years ago

Hi Ryan, I checked and I ran 24 dimesions for LSI and then plottet some metadata on all the dimensions to decide what dimensions to use for the donwstream analysis simply by using dimsToUse for other functions. So if you allow the projectBulkATAC the dimsToUse parameter as well it might work?

rcorces commented 2 years ago

Can you clarify what you meant by your last statement here: https://github.com/GreenleafLab/ArchR/issues/1379#issuecomment-1098384520

I'm not sure how embedding$params$dimsToUse1 and embedding$params$dimsToUse2 would get created.

cschmidl commented 2 years ago

This is contained in the embedding$params after running embedding <- getEmbedding(ArchRProj = proj6, embedding = "UMAP", returnDF = FALSE) corCutOff <- embedding$params$corCutOff dimsToUse <- embedding$params$dimsToUse scaleDims <- embedding$params$scaleDims

embedding$params $n_neighbors [1] 50

$min_dist [1] 0.35

$verbose [1] TRUE

$metric [1] "cosine"

$ret_nn [1] TRUE

$ret_model [1] TRUE

$dimsToUse1 [1] 1

$dimsToUse2 [1] 2

$dimsToUse3 [1] 3

$dimsToUse4 [1] 4

$dimsToUse5 [1] 5

$dimsToUse6 [1] 6

$dimsToUse7 [1] 7

$dimsToUse8 [1] 8

$dimsToUse9 [1] 9

$dimsToUse10 [1] 10

$dimsToUse11 [1] 11

$corCutOff [1] 0.75

$nr [1] 27308

$nc [1] 11

$uwotModel [1] "/Volumes/NGSdata/projects/humanTIL/ArchR/analyses/TILs-pub-v1/Embeddings/Save-Uwot-UMAP-Params-IterativeLSI-16a833f0aac4d-Date-2021-03-03_Time-08-46-57.tar"

$estimateUMAP [1] FALSE

rcorces commented 2 years ago

thanks @cschmidl - this I think has uncovered a pretty substantial bug. Either that or my interpretation of the problem is very far off. I think I've fixed the problem. You should be able to install the new dev_projectBulkATAC2 branch and get this to work.

You will need to either re-run addUMAP() or manually add the correct dimsToUse parameter to your embedding like this: proj6@embeddings$UMAP$params$dimsToUse <- c(1:11) assuming that dims 1-11 are the ones that you used which I think is the case. Once you've done this, you should be able to run projectBulkATAC().

For posterity, the changes made were: https://github.com/GreenleafLab/ArchR/commit/a7abaa429d567482b09982a35cbdff3c61769938 https://github.com/GreenleafLab/ArchR/commit/d5ed45d8f71280726381bc9020eb1bca44317a93 https://github.com/GreenleafLab/ArchR/commit/3663b50248fac50da5bdd0106f14cede5f933bc6 https://github.com/GreenleafLab/ArchR/commit/9db334ef3180f32e056f40b3e7c688194948d968

cschmidl commented 2 years ago

Hi @rcorces, when I rerun the UMAP with dims=c(1:11) proj6 <- addUMAP( ArchRProj = proj6, reducedDims = "IterativeLSI", dimsToUse = dims, name = "UMAP", nNeighbors = 50, minDist = 0.35, metric = "cosine", force = TRUE )

proj6@embeddings$UMAP$params$dimsToUse NULL and projectBulkATAC() yields the same error again ("Error incosistency found with matching LSI dimensions to those used in addEmbedding Returning with simulated reduced dimension coordinates...")

When I add proj6@embeddings$UMAP$params$dimsToUse <- c(1:11) the dimsToUse is defined:

proj6@embeddings$UMAP$params$dimsToUse [1] 1 2 3 4 5 6 7 8 9 10 11

But the function still gives me the same error: ArchR logging to : ArchRLogs/ArchR-projectBulkATAC-47b8386c74a8-Date-2022-04-19_Time-13-39-12.log If there is an issue, please report to github with logFile! 2022-04-19 13:39:13 : Overlap Ratio of Reduced Dims Features = 0.9372 2022-04-19 13:39:14 : Projecting Sample (1 of 217), 0.024 mins elapsed. 2022-04-19 13:39:16 : Projecting Sample (2 of 217), 0.061 mins elapsed. ... ... ... 2022-04-19 13:47:03 : Projecting Sample (217 of 217), 7.845 mins elapsed. 2022-04-19 13:47:06 : Error incosistency found with matching LSI dimensions to those used in addEmbedding Returning with simulated reduced dimension coordinates...

Logfile attached. ArchR-projectBulkATAC-47b8386c74a8-Date-2022-04-19_Time-13-39-12.log

If I run the addUMPA and do not specifiy dimsToUSe and then run projectBulkATAC everything works. Log attached as well. ArchR-projectBulkATAC-47b87bebf725-Date-2022-04-19_Time-13-50-47.log

rcorces commented 2 years ago

It doesnt look like you installed the new branch that I pointed you to? dev_projectBulkATAC2

devtools::install_github("GreenleafLab/ArchR", ref="dev_projectBulkATAC2", repos = BiocManager::repositories())
cschmidl commented 2 years ago

Yes I did, and I think it was installed as I get the info that no additional changes are detected if I want to do it again.

devtools::install_github("GreenleafLab/ArchR", ref="dev_projectBulkATAC2", repos = BiocManager::repositories()) Skipping install of 'ArchR' from a github remote, the SHA1 (9db334ef) has not changed since last install. Use force = TRUE to force installation

rcorces commented 2 years ago

you would need to restart your R session and reload ArchR to make sure the newly installed version is loaded. The reason I'm assuming that you do not have the correct version of ArchR loaded is because your logfile contains the message:

2022-04-19 13:47:06 : Error inconsistency found with matching LSI dimensions to those used in addEmbedding Returning with simulated reduced dimension coordinates...

In the dev_projectBulkATAC2 branch, this error message should read slightly differently...

https://github.com/GreenleafLab/ArchR/blob/9db334ef3180f32e056f40b3e7c688194948d968/R/BulkProjection.R#L157-L164

To make sure you have the correct version loaded, you can always type projectBulkATAC (no parens) into your console and look at the actual function. It should have the changes shown in this commit. https://github.com/GreenleafLab/ArchR/commit/9db334ef3180f32e056f40b3e7c688194948d968

cschmidl commented 2 years ago

Sorry my fault I thought I did! Now it's working if you define as sugested: proj6@embeddings$UMAP$params$dimsToUse <- c(1:11)

I also tested with running a new UMAP and not defining the params as suggested above, then it works as well!

Thanks a lot!

And sorry to repeat, but would be so nice to have for projecting in Harmony space :)

rcorces commented 2 years ago

Great! Glad this worked and thanks for working with me to track this down. I've started the pull request process and will merge into release_1.0.2 once review is complete. We will also work on adding projection into Harmony reducedDims.

waltno commented 10 months ago

Hi Ryan!

Thank you for your amazing work and helpfulness! Are there any updates on this issue? I am trying to run projectBulkATAC with combined dims (ATAC + RNA)... i can't get it to work. let me know if there is a specific release i should use! All the best!

rcorces commented 10 months ago

Never tried it. Would have thought it would work but maybe not.