plotBrowserTrack inconsistent coverage profiles

GreenleafLab / ArchR

ArchR : Analysis of Regulatory Chromatin in R (www.ArchRProject.com)

MIT License

plotBrowserTrack inconsistent coverage profiles #1555

Closed Brawni closed 2 years ago

Brawni commented 2 years ago

Hello!

I think I remember this being reported before, but I can't find it anymore. Running plotBrowserTrack multiple times with exactly the same data, region and parameters gives me different coverage profiles each time. No errors are thrown. Here is an example:

ex_cov <- plotBrowserTrack(
    ArchRProj = archp,
    sizes = c(10, 1.5, 5, 4),
    useGroups = c('TME - Tregs', 'Tregs'),
    groupBy = metaGroupName,
    geneSymbol = gene,
    upstream = 10000,
    downstream = 100000,
    loops = getCoAccessibility(archp, corCutOff = 0.3, returnLoops = TRUE),
    pal = paletteDiscrete(unique(archp@cellColData[, metaGroupName]), set = 'rushmore', reverse = TRUE)
)
plotPDF(ex_cov, ArchRProj = archp, name = paste0(gene, 'L_coverage'), width = 5, height = 2, addDOC = FALSE)

Running on ArchR v1.0.2

rcorces commented 2 years ago

Hi @Brawni! Thanks for using ArchR! Please make sure that your post belongs in the Issues section. Only bugs and error reports belong in the Issues section. Usage questions and feature requests should be posted in the Discussions section, not in Issues.
Before we help you, you must respond to the following questions unless your original post already contained this information:
1. If you've encountered an error, have you already searched previous Issues to make sure that this hasn't already been solved?
2. Can you recapitulate your error using the tutorial code and dataset? If so, provide a reproducible example.
3. Did you post your log file? If not, add it now.
4. Remove any screenshots that contain text and instead copy and paste the text using markdown's codeblock syntax (three consecutive backticks). You can do this by editing your original post.

Brawni commented 2 years ago

Log files:
ArchR-plotBrowserTrack-38abd16a068cd-Date-2022-08-11_Time-15-48-32.log
ArchR-plotBrowserTrack-38abd386ac586-Date-2022-08-11_Time-15-47-57.log
ArchR-plotBrowserTrack-38abd88a16d5-Date-2022-08-11_Time-15-47-27.log
ArchR-plotBrowserTrack-38abd8c1fb2e-Date-2022-08-11_Time-15-47-02.log

rcorces commented 2 years ago

https://github.com/GreenleafLab/ArchR/issues/626

Set a seed with set.seed() prior to plotting.
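
To illustrate why seeding helps, here is a minimal base R sketch, not ArchR's own code. It assumes the variation comes from a random draw of cells made with R's random number generator. The names `cell_names` and `subsample_cells` are hypothetical and exist only for this example.

```r
# Hypothetical stand-in for the cell names of one group (not ArchR data).
cell_names <- paste0("cell_", seq_len(2000))

# Hypothetical helper: draw at most `max_cells` cells at random.
# If `seed` is given, the RNG is seeded first so the draw is repeatable.
subsample_cells <- function(cells, max_cells = 500, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  if (length(cells) <= max_cells) return(cells)
  sample(cells, size = max_cells)
}

# Same seed, same subsample.
run_a <- subsample_cells(cell_names, seed = 1)
run_b <- subsample_cells(cell_names, seed = 1)
same_with_seed <- identical(run_a, run_b)
```

In practice this corresponds to calling set.seed() with a fixed value immediately before each plotBrowserTrack() call, so that repeated runs draw the same cells.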

Brawni commented 2 years ago

I see. Sorry I missed that! Still, the chromatin profiles there are quite different from each other. Why is that the case? Is there a subsampling step before generating these pseudo-bulks? How else can you get different profiles if you use the same number of cells?

rcorces commented 2 years ago

There is an argument maxCells to the non-exported function .groupRegionSumArrows() which limits the total number of cells used for creating the profile. We do not allow users to change this parameter and it is effectively hard-coded at 500. So the randomness is introduced when subsampling cells. I'm open to allowing users to specify this parameter, but I don't think it will make a difference in the appearance of the plot and it will just slow things down.

https://github.com/GreenleafLab/ArchR/blob/f6c0388bd37023400794c9ae8562ad69e3ba9fd7/R/ArchRBrowser.R#L1056-L1068
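
The effect of such a cap can be sketched in base R with simulated data. This is not the code behind .groupRegionSumArrows(). The count matrix, the Poisson rate, the bin count and the helper `pseudo_bulk` are all assumptions made for illustration; only the cap of 500 cells comes from the comment above.

```r
# Simulated sparse per-cell counts: 5000 cells x 200 genomic bins.
# The seed here only fixes the simulated data, not the later subsampling.
set.seed(42)
n_cells <- 5000
n_bins <- 200
counts <- matrix(rpois(n_cells * n_bins, lambda = 0.02), nrow = n_cells)

# Hypothetical helper: mean signal per bin over at most `max_cells` cells,
# drawn at random without replacement when the group is larger than the cap.
pseudo_bulk <- function(mat, max_cells = 500) {
  idx <- seq_len(nrow(mat))
  if (length(idx) > max_cells) idx <- sample(idx, max_cells)
  colSums(mat[idx, , drop = FALSE]) / length(idx)
}

# Two unseeded draws of 500 cells give two different profiles.
draw_1 <- pseudo_bulk(counts)
draw_2 <- pseudo_bulk(counts)

# Using every cell removes the random step, so the profile is stable.
all_cells <- pseudo_bulk(counts, max_cells = Inf)
all_cells_again <- pseudo_bulk(counts, max_cells = Inf)

# Largest per-bin difference between the two subsampled profiles.
max_abs_diff <- max(abs(draw_1 - draw_2))
```

With sparse counts, the per-bin mean over 500 cells is noisy, so two draws can differ visibly, while the profile over all cells is identical on every run.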

Brawni commented 2 years ago

I see. I think it wouldn't hurt to expose it in the front-end function in case someone wants to use all cells from the groups, because in some instances, like this one, 500 cells don't seem to be representative enough of the groups.

rcorces commented 2 years ago

OK. This is now available on dev and will be incorporated into release_1.0.3.

Brawni commented 2 years ago

Awesome, thanks!