GreenleafLab / ArchR

ArchR : Analysis of Regulatory Chromatin in R (www.ArchRProject.com)
MIT License

Plot embedding shows weird stripe pattern #1731

Closed Ping-lin14 closed 1 year ago

Ping-lin14 commented 1 year ago

Hi

I got a strange embedding result after updating to release_1.0.3 from the dev branch. I have reproduced it with the tutorial data. (screenshot attached)

Please see the attached log file: ArchR-plotEmbedding-276bb1560a33a-Date-2022-11-10_Time-03-00-02.log

rcorces commented 1 year ago

Hi @Ping-lin14! Thanks for using ArchR! Please make sure that your post belongs in the Issues section. Only bugs and error reports belong in the Issues section. Usage questions and feature requests should be posted in the Discussions section, not in Issues.
Before we help you, you must respond to the following questions unless your original post already contained this information:

  1. If you've encountered an error, have you already searched previous Issues to make sure that this hasn't already been solved?
  2. Can you recapitulate your error using the tutorial code and dataset? If so, provide a reproducible example.
  3. Did you post your log file? If not, add it now.
  4. Remove any screenshots that contain text and instead copy and paste the text using markdown's codeblock syntax (three consecutive backticks). You can do this by editing your original post.

mehc555 commented 1 year ago

I have the same issue using ArchR_1.0.2

rcorces commented 1 year ago

I am not able to reproduce this using the tutorial dataset and the current dev branch (1.0.3). Perhaps you can provide more information. It does not appear to be an issue with the uwot version, as we are both using 0.1.14.

> projHeme5 <- loadArchRProject(path = "./Save-ProjHeme5/")
Successfully loaded ArchRProject!

> projHeme5 <- addImputeWeights(ArchRProj = projHeme5)
ArchR logging to : ArchRLogs/ArchR-addImputeWeights-30b5c22f0f88a9-Date-2022-11-10_Time-08-30-13.log
If there is an issue, please report to github with logFile!
2022-11-10 08:30:13 : Computing Impute Weights Using Magic (Cell 2018), 0 mins elapsed.
> p <- plotEmbedding(
+     ArchRProj = projHeme5,
+     colorBy = "GeneScoreMatrix",
+     name = "CD3D",
+     embedding = "UMAP",
+     quantCut = c(0.01, 0.95),
+     imputeWeights = getImputeWeights(projHeme5)
+ )
Getting ImputeWeights
ArchR logging to : ArchRLogs/ArchR-plotEmbedding-30b5c275b29af3-Date-2022-11-10_Time-08-30-33.log
If there is an issue, please report to github with logFile!
2022-11-10 08:30:34 : Getting Matrix Values...

Imputing Matrix
Plotting Embedding
1 
ArchR logging successful to : ArchRLogs/ArchR-plotEmbedding-30b5c275b29af3-Date-2022-11-10_Time-08-30-33.log
> p

> sessionInfo()
R version 4.1.1 (2021-08-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Rocky Linux 8.5 (Green Obsidian)

Matrix products: default
BLAS/LAPACK: /usr/lib64/libopenblasp-r0.3.12.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
 [1] parallel  grid      stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] uwot_0.1.14                 hexbin_1.28.2               rhdf5_2.38.1                SummarizedExperiment_1.24.0
 [5] RcppArmadillo_0.11.2.0.0    Rcpp_1.0.9                  GenomicRanges_1.46.1        GenomeInfoDb_1.30.1        
 [9] sparseMatrixStats_1.6.0     stringr_1.4.0               plyr_1.8.7                  ggplot2_3.3.6              
[13] gtable_0.3.0                gtools_3.9.3                gridExtra_2.3               devtools_2.4.4             
[17] usethis_2.1.6               ArchR_1.0.3                 magrittr_2.0.1              Matrix_1.3-4               
[21] data.table_1.14.2           Biobase_2.54.0              IRanges_2.28.0              S4Vectors_0.32.3           
[25] BiocGenerics_0.40.0         MatrixGenerics_1.6.0        matrixStats_0.61.0         

loaded via a namespace (and not attached):
 [1] bitops_1.0-7           fs_1.5.2               doParallel_1.0.17      RColorBrewer_1.1-3     tools_4.1.1           
 [6] profvis_0.3.7          utf8_1.2.2             R6_2.5.1               DBI_1.1.3              colorspace_2.0-3      
[11] rhdf5filters_1.6.0     GetoptLong_1.0.5       withr_2.5.0            urlchecker_1.0.1       tidyselect_1.1.2      
[16] prettyunits_1.1.1      processx_3.7.0         compiler_4.1.1         cli_3.3.0              Cairo_1.6-0           
[21] DelayedArray_0.20.0    labeling_0.4.2         scales_1.2.0           callr_3.7.1            digest_0.6.29         
[26] XVector_0.34.0         pkgconfig_2.0.3        htmltools_0.5.3        sessioninfo_1.2.2      fastmap_1.1.0         
[31] htmlwidgets_1.5.4      rlang_1.0.4            GlobalOptions_0.1.2    rstudioapi_0.13        shiny_1.7.2           
[36] farver_2.1.1           shape_1.4.6            generics_0.1.3         dplyr_1.0.9            RCurl_1.98-1.8        
[41] GenomeInfoDbData_1.2.7 Rhdf5lib_1.16.0        munsell_0.5.0          fansi_1.0.3            lifecycle_1.0.1       
[46] stringi_1.7.8          zlibbioc_1.40.0        pkgbuild_1.3.1         promises_1.2.0.1       crayon_1.5.1          
[51] miniUI_0.1.1.1         lattice_0.20-44        circlize_0.4.15        ComplexHeatmap_2.10.0  ps_1.7.1              
[56] pillar_1.8.0           rjson_0.2.21           codetools_0.2-18       pkgload_1.3.0          glue_1.6.2            
[61] remotes_2.4.2          renv_0.14.0            BiocManager_1.30.18    png_0.1-7              vctrs_0.4.1           
[66] httpuv_1.6.5           foreach_1.5.2          purrr_0.3.4            assertthat_0.2.1       clue_0.3-61           
[71] cachem_1.0.6           mime_0.12              xtable_1.8-4           later_1.3.0            tibble_3.1.8          
[76] iterators_1.0.14       memoise_2.0.1          cluster_2.1.3          ellipsis_0.3.2        

[screenshot of the resulting plot]

rcorces commented 1 year ago

Maybe it's a ggplot2 issue. Could you downgrade to ggplot2_3.3.6 and see if that changes anything?

rcorces commented 1 year ago

Never mind, I upgraded to ggplot2_3.4.0 and that is not the issue.

Ping-lin14 commented 1 year ago

What information can I provide?

Ping-lin14 commented 1 year ago

The GeneScoreMatrix seems to be ok.

> geneintegration <- getMatrixFromProject(proj, useMatrix="GeneScoreMatrix")
ArchR logging to : ArchRLogs/ArchR-getMatrixFromProject-63e426becf4fa-Date-2022-11-13_Time-20-36-25.log
If there is an issue, please report to github with logFile!
2022-11-13 20:36:45 : Organizing colData, 0.337 mins elapsed.
2022-11-13 20:36:45 : Organizing rowData, 0.338 mins elapsed.
2022-11-13 20:36:45 : Organizing rowRanges, 0.338 mins elapsed.
2022-11-13 20:36:45 : Organizing Assays (1 of 1), 0.338 mins elapsed.
2022-11-13 20:36:48 : Constructing SummarizedExperiment, 0.387 mins elapsed.
2022-11-13 20:36:49 : Finished Matrix Creation, 0.403 mins elapsed.
> geneintedata <- assay(geneintegration)
> genedata <- rowData(geneintegration)
> geneintedata <- as.matrix(geneintedata)
> geneinteinfo <- cbind(genedata, geneintedata)
> geneinteinfo2 <- as.data.frame(t(as.data.frame(geneinteinfo[,7:length(colnames(geneinteinfo))])))
> colnames(geneinteinfo2) <- genedata$name
> rownames(geneinteinfo2) <- colnames(geneinteinfo)[7:length(colnames(geneinteinfo))]
> geneinteinfo2$CD3D
   [1]  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.412  0.000  0.000  0.000  0.384  0.000  0.000  0.000  0.000  0.000  0.000  0.000
  [21]  0.000  0.000  0.000  0.000  0.471  0.000  0.442  0.000  0.530  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.496  0.000
  [41]  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.666  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.572  0.000  0.000
  [61]  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.533  0.000  0.000  0.000  0.000  0.000  0.575  0.000  0.000  0.599  0.000  0.000
  [81]  0.000  0.000  0.564  0.000  0.000  0.000  0.000  2.070  0.903  0.000  0.589  0.000  0.000  0.000  3.425  0.000  0.554  0.000  0.000  0.588
 [101]  0.000  2.273  0.000  0.000  0.000  0.000  0.621  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.586  0.000  0.000
 [121]  0.000  1.909  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.634  0.000  0.725  2.325  0.000  0.000  0.000  0.000  0.000  0.000
 [141]  0.000  0.000  0.000  0.561  0.000  0.000  0.000  0.000  0.000  0.000  2.093  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000
 [161]  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.630  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.741
 [181]  2.582  0.000  0.000  0.596  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  2.932  0.000  0.000
 [201]  0.000  0.730  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.727  0.000  0.000  0.000  0.000  0.000  0.000
 [221]  0.640  0.000  0.000  0.725  0.000  0.000  0.000  0.000  0.000  0.681  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000
 [241]  0.683  0.000  0.000  0.735  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  2.927  0.765  0.000  0.000  2.491
 [261]  0.000  0.000  0.000  0.000  0.000  0.763  0.000  0.783  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  3.785  0.000  0.000
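The check above transposes the whole GeneScoreMatrix into a data.frame and assumes the cell columns start at position 7. A lighter way to inspect a single gene is sketched below. It assumes, as the code above does, that `assay()` returns a genes x cells matrix and that gene symbols live in `rowData(...)$name`; `getGeneScores()` is a hypothetical helper, not an ArchR function, and the toy matrix stands in for real data.

```r
# Hypothetical helper (not an ArchR function): pull one gene's scores out of a
# genes x cells matrix, given a vector of gene names aligned with the rows.
getGeneScores <- function(mat, geneNames, gene) {
  idx <- which(geneNames == gene)
  if (length(idx) != 1) {
    stop(sprintf("expected exactly one row named '%s', found %d", gene, length(idx)))
  }
  # drop = TRUE turns the single row into a plain vector (dense or sparse input)
  vals <- as.numeric(mat[idx, , drop = TRUE])
  names(vals) <- colnames(mat)
  vals
}

# Toy example: 3 genes x 4 cells
toy <- matrix(c(0, 0.41, 0,   2.07,
                1, 0,    0,   0,
                0, 0,    3.4, 0),
              nrow = 3, byrow = TRUE,
              dimnames = list(NULL, paste0("cell", 1:4)))
cd3d <- getGeneScores(toy, c("CD3D", "CD4", "MS4A1"), "CD3D")
print(cd3d)

# On a real project (not run here):
# cd3d <- getGeneScores(assay(geneintegration), rowData(geneintegration)$name, "CD3D")
# summary(cd3d)
```

Keeping the lookup by name rather than by column offset avoids silently misaligning cells if the number of rowData columns changes.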
rcorces commented 1 year ago

I guess things that would be helpful to know are:

How far does this problem extend?

  1. What does plotEmbedding() look like when you just plot info from cellColData like Clusters or Samples?
  2. Does the problem occur with other matrices?
  3. Does the problem occur with and without imputation?
  4. Does the problem occur with and without rastr = TRUE in plotEmbedding()?
  5. Does the problem occur when you specify threads = 1 in plotEmbedding()?
GrafZahl1234 commented 1 year ago

I have the same problem. It shows up whenever I visualize marker genes from GeneScoreMatrix or GeneIntegrationMatrix, in both the plots with and without imputation. Setting threads = 1 or rastr = TRUE does not fix it. I am following the manual, but with a different dataset. Other plots, for example of clusters, samples and predictions, are fine. Hope it helps.

Plot-UMAP-RNA-Integration_markers.pdf ArchR-plotEmbedding-1442661a3dcf-Date-2022-11-14_Time-18-35-15.log

rcorces commented 1 year ago

Well. This certainly seems concerning given how many people are now encountering this issue.

Have you all updated packages recently? Some of you are using ArchR 1.0.2 while others are on dev/1.0.3, so this doesn't strike me as a problem caused by a change in ArchR's codebase but rather by a change in one of its many dependencies.

One place to start would be to run the following commands on the tutorial data, then send me the log file and upload an RDS of the p plot object (GitHub might try to stop you, but if you change the suffix to .rds.txt it should let you). Perhaps I'll be able to glean something.

First, install the dev branch of ArchR:

devtools::install_github("GreenleafLab/ArchR", ref="dev", repos = BiocManager::repositories())
# unload the currently loaded ArchR, then load the freshly installed version
detach("package:ArchR", unload=TRUE)
library(ArchR)

Then run:

proj <- getTestProject(version = 2)

p <- plotEmbedding(
   ArchRProj = proj,
   colorBy = "GeneScoreMatrix",
   name = "CD3D",
   embedding = "UMAP",
   quantCut = c(0.01, 0.95),
   imputeWeights = NULL
)
GrafZahl1234 commented 1 year ago

Sadly, this leads to another problem. First, my getTestProject() function does not accept an argument. Second, when I remove the argument, I get the issue described here: https://github.com/GreenleafLab/ArchR/issues/1643.

rcorces commented 1 year ago

@GrafZahl1234 - Sorry, I should have realized that the getTestProject() function was recently updated on dev to include multiple versions.

Separately, the error mentioned in #1643 could be related to the problem at hand, but @Ping-lin14 is not running the most up-to-date release of Bioconductor, so I don't know.

Ping-lin14 commented 1 year ago

In my case, getTestProject() works. Attached: Plot-CD3D-umap.rds.txt, ArchR-plotEmbedding-ce6059814d92-Date-2022-11-14_Time-21-04-04.log

On the other hand, I solved this problem by installing ArchR 1.0.3 over my old environment. This suggests that the problem is indeed caused by one of the dependencies, but I can't tell which one.

rcorces commented 1 year ago

> On the other hand, I solved this problem by overwriting ArchR 1.0.3 in the old environment. This indicates that it is indeed caused by some dependencies. But I can't confirm which dependency is causing it.

That's great! Thanks for sharing. I wonder if other users on this thread would have similar success by forcing a fresh install of ArchR. @Ping-lin14, when you updated, did you also update all other out-of-date packages?

ammaralsheik commented 1 year ago

I've been having a similar issue with plotEmbedding() visualizations of marker genes. It happens on the tutorial dataset, both with and without imputation. I'm on the stable ArchR v1.0.2. As the screenshots show, the earlier embedding plots have no issue. Log: ArchR-plotEmbedding-87e77175d16e2-Date-2022-11-15_Time-13-46-41.log (two screenshots attached)

Ping-lin14 commented 1 year ago

@rcorces Yes, I updated all out-of-date packages as R suggested. I tried to reconstruct the list of packages R updated; there are as many as 100 in my environment. Do you need that list? The number is so large that it would be difficult to narrow down the cause.

mehc555 commented 1 year ago

I've updated to the dev version and updated the recommended packages, but I'm still having the same issue.

rcorces commented 1 year ago

This one is really blowing my mind with how many people seem to be having issues across different versions of R, Bioconductor, and ArchR. I'm going to try to run more tests on my end today. Apologies that this isn't more straightforward.

For those of you still having this issue, it would be helpful to know if plotEmbedding() gave the correct results in the past but is now not working properly.

ammaralsheik commented 1 year ago

@rcorces, thank you for working on this issue. In my case, I haven't worked with the package before, so I can't say whether it would have worked on an earlier version. However, plotEmbedding() currently works well only in some situations (UMAP plots of clusters/samples), as I showed above, but not in others (plotting the GeneScoreMatrix).

rcorces commented 1 year ago

OK, I am able to reproduce this error now. I created a clean environment using the most recent versions of all ArchR dependencies, then loaded an old ArchRProject that was created in an old environment that didn't have this issue. Plotting the old project in the new environment produced problems similar to those above, though my plots don't look identical, which is surprising. This suggests that the issue is related to plotting rather than to something more fundamental, such as building the gene score matrix. I'm still unsure what is causing it, but I will keep digging.

rcorces commented 1 year ago

Getting closer. On my end, the problem seems to be confined to hex plots (plotAs = "hex", which is the default). If I change to plotAs = "points", I do not have a problem. For anyone looking for an immediate solution, please let me know if that works on your end. I will keep digging to figure out the underlying issue.

plotAs = "hex" or plotAs = NULL: [screenshot]

plotAs = "points": [screenshot]

ammaralsheik commented 1 year ago

Thank you @rcorces, this fixed the issue on my end, as you showed. Now that you mention hex, it might be related to the hexbin library, since I got an error and had to install it manually when I first plotted the GeneScoreMatrix in section 7.4 of the tutorial.

rcorces commented 1 year ago

Despite my previous testing, it turns out that this is a problem with ggplot2. I also checked hexbin and confirmed that it is not the issue.

Here is a minimal reproducible example:

Download this file: https://www.dropbox.com/s/gde43v0jzsb9wvu/221115_plotEmbedding_plotParams.rds?dl=0

Then run:

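library(ArchR)    # provides ggHex()
library(ggplot2)  # provides theme() and element_blank()

# plotting parameters saved from a plotEmbedding() call
plotParamsx <- readRDS(file = "~/temp/221115_plotEmbedding_plotParams.rds")

gg <- do.call(ggHex, plotParamsx)
gg <- gg + theme(axis.text.x = element_blank(), axis.ticks.x = element_blank(),
                 axis.text.y = element_blank(), axis.ticks.y = element_blank())
gg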

This gives the incorrect plot on ggplot2_3.4.0 and the correct plot on the previous version, ggplot2_3.3.6.

For now, I would recommend avoiding ggplot2_3.4.0 until this is sorted out. You can install ggplot2_3.3.6 via:

devtools::install_version("ggplot2", version = "3.3.6", repos = "http://cran.us.r-project.org")
rcorces commented 1 year ago

The developers of ggplot2 are already aware of this bug and have put a patch in place. More here - https://github.com/tidyverse/ggplot2/pull/5045

Closing this as resolved. Just avoid ggplot2_3.4.0
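For anyone who cannot downgrade, the two findings in this thread (hex plots break on ggplot2_3.4.0; plotAs = "points" avoids it) can be combined into a small guard. This is a sketch: `choosePlotAs()` is a hypothetical helper, not part of ArchR, and treating only version 3.4.0 as affected is an assumption based on this thread, not a statement about which later ggplot2 release includes the fix.

```r
# Hypothetical helper (not part of ArchR): pick a plotAs value for
# plotEmbedding() that sidesteps the ggplot2 3.4.0 hex-plot striping bug.
# Assumption: only ggplot2 3.4.0 is treated as affected.
choosePlotAs <- function(ggplotVersion = NULL) {
  if (is.null(ggplotVersion)) {
    if (!requireNamespace("ggplot2", quietly = TRUE)) {
      stop("ggplot2 is not installed")
    }
    ggplotVersion <- utils::packageVersion("ggplot2")
  }
  if (identical(as.character(ggplotVersion), "3.4.0")) {
    warning("ggplot2 3.4.0 detected; using plotAs = \"points\" to avoid striped hex plots")
    return("points")
  }
  "hex"
}

# Usage with ArchR (not run here):
# p <- plotEmbedding(ArchRProj = projHeme5, colorBy = "GeneScoreMatrix",
#                    name = "CD3D", embedding = "UMAP", plotAs = choosePlotAs())
```

Falling back to points keeps the rest of the environment untouched, at the cost of larger plot files for big projects.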

badoi commented 1 year ago

Very nicely done. We saw the same thing: three of us were on ArchR v1.0.2, and the one person with ggplot2_3.3.6 was the only one who could make non-streaky plots. The others had ggplot2_3.4.0. Thanks for the sleuthing!

alekseybelikov commented 5 days ago

The problem is that ggplot2 3.4.0 is the exact version required for everything else to work; see https://github.com/GreenleafLab/ArchR/issues/2130#issuecomment-2231419238

Now I have the same problem.

[screenshot, 2024-10-18]