Bioconductor / Contributions

Contribute Packages to Bioconductor

SCOPE #1242

Closed: rujinwang closed this issue 4 years ago

rujinwang commented 5 years ago

Update the following URL to point to the GitHub repository of the package you wish to submit to Bioconductor

Confirm the following by editing each check box to '[x]'

I am familiar with the essential aspects of Bioconductor software management, including:

For help with submitting your package, please subscribe and post questions to the bioc-devel mailing list.

bioc-issue-bot commented 5 years ago

Hi @rujinwang

Thanks for submitting your package. We are taking a quick look at it and you will hear back from us soon.

The DESCRIPTION file for this package is:

Package: SCOPE
Type: Package
Title: A normalization and copy number estimation method for single-cell DNA sequencing
Version: 0.99.0
Author: Rujin Wang, Danyu Lin, Yuchaojiang
Maintainer: Rujin Wang <rujin@email.unc.edu>
Description: Whole genome single-cell DNA sequencing (scDNA-seq) enables characterization of copy number profiles at the cellular level. This circumvents the averaging effects associated with bulk-tissue sequencing and has increased resolution yet decreased ambiguity in deconvolving cancer subclones and elucidating cancer evolutionary history. ScDNA-seq data is, however, sparse, noisy, and highly variable even within a homogeneous cell population, due to the biases and artifacts that are introduced during the library preparation and sequencing procedure. Here, we propose SCOPE, a normalization and copy number estimation method for scDNA-seq data. The distinguishing features of SCOPE include: (i) utilization of cell-specific Gini coefficients for quality controls and for identification of normal/diploid cells, which are further used as negative control samples in a Poisson latent factor model for normalization; (ii) modeling of GC content bias using an expectation-maximization algorithm embedded in the Poisson generalized linear models, which accounts for the different copy number states along the genome; (iii) a cross-sample iterative segmentation procedure to identify breakpoints that are shared across cells from the same genetic background. We evaluate performance of SCOPE on real scDNA-seq data sets from cancer genomic studies. Compared to existing methods, SCOPE more accurately estimates subclonal copy number aberrations and is shown to have higher correlation with array-based copy number profiles of purified bulk samples from the same patient. We further demonstrate SCOPE on three recently released data sets using the 10X Genomics single-cell CNV pipeline and show that it can reliably recover 1% of the cancer cells from a background of normal.
Depends: R (>= 3.6.0), GenomicRanges, IRanges, Rsamtools, BSgenome.Hsapiens.UCSC.hg19, GenomeInfoDb
Imports: stats, grDevices, graphics, utils, DescTools, RColorBrewer, gplots, foreach, parallel, doParallel, DNAcopy, BSgenome, Biostrings, BiocGenerics
Suggests:
    knitr,
    rmarkdown,
    WGSmapp,
    testthat (>= 2.1.0)
VignetteBuilder: knitr
biocViews: SingleCell, 
    Normalization, 
    CopyNumberVariation, 
    Sequencing, WholeGenome, 
    Coverage, 
    Alignment, 
    QualityControl, 
    DataImport
License: GPL-2
LazyData: true
RoxygenNote: 6.1.1
Encoding: UTF-8
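Point (i) of the Description relies on cell-specific Gini coefficients. As a rough illustration only, the base-R sketch below computes the standard Gini coefficient of one cell's per-bin read counts. It assumes the coefficient is taken over bin-level coverage and is not SCOPE's own implementation; the function name `gini` and the toy counts are hypothetical. A value near 0 means reads are spread evenly across bins, while values closer to 1 indicate highly uneven coverage.

```r
# Standard Gini coefficient of a non-negative vector, e.g. the per-bin read
# counts of one cell. 0 = perfectly even coverage; near 1 = very uneven.
# Illustrative sketch, not SCOPE's implementation.
gini <- function(x) {
  stopifnot(is.numeric(x), all(x >= 0), sum(x) > 0)
  x <- sort(x)
  n <- length(x)
  i <- seq_len(n)
  sum((2 * i - n - 1) * x) / (n * sum(x))
}

# Toy per-bin read counts for two hypothetical cells (not real data).
even_cell   <- c(10, 10, 10, 10)
uneven_cell <- c(0, 0, 0, 40)
gini(even_cell)    # 0
gini(uneven_cell)  # 0.75
```

How SCOPE turns these values into QC cutoffs or diploid-cell calls is not specified in this thread.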
rujinwang commented 5 years ago

AdditionalPackage: https://github.com/rujinwang/WGSmapp

bioc-issue-bot commented 5 years ago

Can't build unless issue is open and '2. review in progress' label is present, or issue is closed and 'TESTING' label is present.

bioc-issue-bot commented 5 years ago

A reviewer has been assigned to your package. Learn what to expect during the review process.

IMPORTANT: Please read the instructions for setting up a push hook on your repository, or further changes to your repository will NOT trigger a new build.

bioc-issue-bot commented 5 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

rujinwang commented 5 years ago

AdditionalPackage: https://github.com/rujinwang/WGSmapp

bioc-issue-bot commented 5 years ago

Dear @rujinwang ,

You (or someone) has already posted that repository to our tracker.

See https://github.com/Bioconductor/Contributions/issues/1239

You cannot post the same repository more than once.

If you would like this repository to be linked to issue number: 1242, please contact a Bioconductor Core Member.

rujinwang commented 5 years ago

Dear Bioconductor Core Member @hpages, I would like this repository to be linked to issue number #1242. The software package SCOPE has the companion experiment data package WGSmapp, which is used for illustrative purposes in the vignette; without it, the build of SCOPE fails with ERRORs. Could you please help me with this? Thanks.

Rujin

hpages commented 5 years ago

Hi Rujin,

Thanks for your submission.

You don't need to open a new issue for companion data packages, so please close issue #1239.

All you need to do is add the companion package as an additional package to the issue you already opened for the software package (i.e. this issue), which is what you did 2 days ago (see https://github.com/Bioconductor/Contributions/issues/1242#issuecomment-529566331).

Looks like our Single Package Builder (SPB) failed to install WGSmapp before trying to run R CMD build on SCOPE. Not sure why. Maybe @lshep can chime in. But before that, please close issue #1239 and bump the SCOPE version (to 0.99.1) to trigger a new build by the SPB, and we'll see what happens.

Thanks, H.

bioc-issue-bot commented 5 years ago

Received a valid push; starting a build. Commits are:

b129572 bump to version 0.99.2

bioc-issue-bot commented 5 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

rujinwang commented 5 years ago

Hi Herve @hpages, thanks so much for your reply. I just closed issue #1239 and triggered a new build. It seems the companion data package WGSmapp fails to build before SCOPE is built. In the previously opened issue #1239, WGSmapp could be built without any errors. Here I directly use library(WGSmapp) in the vignette of SCOPE. Do I need to use devtools::install_github("rujinwang/WGSmapp") instead?

best, rujin

hpages commented 5 years ago

Do I need to use devtools::install_github("rujinwang/WGSmapp") instead?

No, because it's considered bad practice to install packages in a vignette. And having a vignette that installs stuff from GitHub is even worse.

I guess it's time to ask help from @lshep . Lori?

Thanks!

hpages commented 5 years ago

Lori (@lshep) contacted me privately to let me know that she'll take care of this tomorrow. Thanks for your patience.

rujinwang commented 5 years ago

Sounds good. Thank you for your help!

lshep commented 5 years ago

I've kicked off a manual build of this package and, I believe, corrected our database so that it is associated with this issue number. Now, as long as the webhook is set up on BOTH repositories, either one should trigger a build on a version bump and report the status here. Note: only the package whose version was bumped will be built, not both packages. This should also allow both packages to find each other. If there are any further issues with this, please let me know; otherwise I'll leave it back to @hpages to do a formal review of your packages once the ERRORs are cleared.

bioc-issue-bot commented 5 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details.

bioc-issue-bot commented 5 years ago

Received a valid push; starting a build. Commits are:

ddf5a61 bump to version 0.99.3

bioc-issue-bot commented 5 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details.

rujinwang commented 5 years ago

Yes, the ERRORs are all cleared now! Thank you @lshep and @hpages.

bioc-issue-bot commented 4 years ago

Received a valid push; starting a build. Commits are:

96d59b4 add .bed file for hg38

bioc-issue-bot commented 4 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details.

bioc-issue-bot commented 4 years ago

Received a valid push; starting a build. Commits are:

43af6bd add seg regions for hg38

bioc-issue-bot commented 4 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details.

bioc-issue-bot commented 4 years ago

Received a valid push; starting a build. Commits are:

aa7dec6 incorporate hg38gaps.txt

bioc-issue-bot commented 4 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details.

bioc-issue-bot commented 4 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details.

bioc-issue-bot commented 4 years ago

Received a valid push; starting a build. Commits are:

1afef1f Enable user-define bin length and offer SoSplot

bioc-issue-bot commented 4 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details.

hpages commented 4 years ago

Hi @rujinwang ,

I see you've been actively working on SCOPE and the package now passes R CMD check and R CMD BiocCheck without errors or warnings. Thanks for that. Please let me know if the package is now ready for review.

Best, H.

rujinwang commented 4 years ago

Hi @hpages, yes, please go ahead and review the package. Thanks!

Best, Rujin

hpages commented 4 years ago

Hi Rujin,

I've taken a first look at the package and it has usability and reliability issues that are too serious for inclusion in Bioconductor. I bumped into these issues at the very beginning of your workflow, after using only the first two functions in the workflow (getbambed_scope and getmapp). I could keep going (e.g. getgc_scope has issues similar to getmapp, and so on), and why would one function be suffixed with _scope and not the other one? But I'll stop here because my time is limited. If you want to pursue this submission, a lot of work will need to be done on the package to make it more usable and more reliable.

Regards, H.

Some examples of usability issues

Some examples of reliability issues

Some other minor issues

bioc-issue-bot commented 4 years ago

Received a valid push; starting a build. Commits are:

91f1834 fix NAs in seqlengths(mapp_hg38)

bioc-issue-bot commented 4 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 4 years ago

Received a valid push; starting a build. Commits are:

35905ff update R version Dependency to 3.6

bioc-issue-bot commented 4 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details.

bioc-issue-bot commented 4 years ago

Received a valid push; starting a build. Commits are:

bdcbc90 version bump 0.99.6

bioc-issue-bot commented 4 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details.

rujinwang commented 4 years ago

Hi @hpages , Thanks a lot for your review and constructive suggestions. I've made changes accordingly and bumped the package to a new version. Regarding some of your concerns, please see my responses below.

Some examples of usability issues

The "1.2 Bioinformatic pre-processing" section of the vignette shows the use of command-line tools java -jar picard.jar and split_script.py. Are these tools available somewhere? Are they documented? What do they do exactly? If the user doesn't have access to them, what is the point of this section?

picard.jar is a publicly available tool for manipulating high-throughput sequencing data, released by the Broad Institute. The only self-developed script is split_script.py, and it is now included in the package (inst/docs folder). The aim of this section is to show users how the bioinformatic pre-processing pipeline prepares the input for SCOPE (demultiplexed .bam files).

These pre-stored mappability tracks are downloaded from the ENCODE Project and thus cannot be passed directly to a function. I'm now passing a simple character indicator, instead of a huge BSgenome object, to let the function know which mappability track to pick up. Do you have any comments or suggestions on how to do this more efficiently?

Some examples of reliability issues

Do you realize that the BAM files you use in the "2.1 Pre-preparation" section of the vignette contain alignments against genome assembly hg38?

How did you realize these BAM files are against genome assembly hg38? I downloaded them from the 10x Genomics website, where they are publicly available. According to the information provided here, they are aligned against assembly GRCh37/hg19.

This result is incorrect:

getmapp(GRanges("chr1:10022-10031"), BSgenome.Hsapiens.UCSC.hg19)
# Getting mappability for chr1
# [1] 0.65625

Should be 0.75 (half of the range has a mappability score of 0.5 and the other half a mappability score of 1).

The mappability is computed as a weighted average of the mappability scores when multiple ENCODE regions overlap the query genome region. It's not weighted by the width of the query genome region, but rather by the width of mappability track bins. In other words, ranges that only hit the same two regions with mappability scores of 0.5 and 1 will get the same mappability score, whatever the width of each sub-region is. For example, in your scenario, mappability = (0.5*11 + 1*5)/(11 + 5) = 0.65625.
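For the chr1:10022-10031 example, the two averaging rules can be written side by side. Here $m_k$ is the score of track bin $k$, $w_k$ its full width (11 and 5 bases, as stated in the reply) and $o_k$ the number of query bases that fall in bin $k$ (5 and 5, assuming the "half and half" split described in the review). The overlap-weighted rule is the one that yields the 0.75 the reviewer expected.

```latex
\bar m_{\text{bin}} = \frac{\sum_k w_k m_k}{\sum_k w_k}
  = \frac{0.5 \cdot 11 + 1 \cdot 5}{11 + 5} = \frac{10.5}{16} = 0.65625
\qquad
\bar m_{\text{overlap}} = \frac{\sum_k o_k m_k}{\sum_k o_k}
  = \frac{0.5 \cdot 5 + 1 \cdot 5}{5 + 5} = \frac{7.5}{10} = 0.75
```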

Some other minor issues

Yes, we are pursuing the submission. Please feel free to let me know if further modifications or improvements are needed. Thank you for your help!

Best, Rujin

bioc-issue-bot commented 4 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 4 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 4 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "WARNINGS, skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 4 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 4 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

lshep commented 4 years ago

Please don't be alarmed by all the build reports, and sorry for the noise. We are updating the builders to R 4.0 and Bioc 3.11. Your package has some special cases that are triggering ERRORs on our end on Windows. Unfortunately, we don't have a good test case to run separately besides kicking off new builds of your package. Again, we apologize for the noise and hope to have it remedied soon.

bioc-issue-bot commented 4 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 4 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

hpages commented 4 years ago

Hi Rujin,

Thanks for the improvements to the package.

Previously discussed

mapp_hg38 and mapp_hg19 are large GRanges objects pre-stored in the dependent data package WGSmapp, rather than variables defined in the global environment.

But when you do data("mapp_hg38") you load the mapp_hg38 dataset in the global environment. So now you have a variable mapp_hg38 defined in the global environment. And your get_mapp() function expects this variable to be there. If the variable is not there, get_mapp() won't work (it will fail). This is fragile. This is why having a function defined in your package referencing variables defined in the global environment is considered bad software design. In other words, get_mapp() should always be able to find the mapp_hg38 dataset (whether the user previously did data("mapp_hg38") or not).

These pre-stored mappability tracks are downloaded from the ENCODE Project and thus cannot be passed directly to a function.

Why can't they? You've saved them as GRanges objects. Why couldn't you pass a GRanges object to a function? E.g.:

data("mapp_hg38")
get_mapp(ref, mapp_hg38)

As easy as that!
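A minimal base-R sketch of this design, assuming a plain data.frame with start, end and score columns as a stand-in for the GRanges track. The track is an explicit argument whose default comes from a package-side accessor, so nothing has to exist in the global environment. The names `get_mapp_sketch` and `default_mapp_track` are hypothetical; with `LazyData: true`, a real default could plausibly reference the packaged dataset directly (e.g. `WGSmapp::mapp_hg38`).

```r
# Hypothetical stand-in for a packaged mappability track (one row per bin).
# In a real package this default might instead return e.g. WGSmapp::mapp_hg38.
default_mapp_track <- function() {
  data.frame(start = c(1, 11), end = c(10, 20), score = c(0.5, 1))
}

# The track is an explicit argument with a default, so the function never
# depends on what happens to be defined in the global environment.
get_mapp_sketch <- function(query_start, query_end, track = default_mapp_track()) {
  stopifnot(is.data.frame(track), all(c("start", "end", "score") %in% names(track)))
  hits <- track$end >= query_start & track$start <= query_end
  if (!any(hits)) return(NA_real_)
  # Placeholder summary: unweighted mean of the overlapping bins' scores.
  mean(track$score[hits])
}

exists("mapp_hg38")     # FALSE in a fresh session: no global variable needed
get_mapp_sketch(5, 15)  # 0.75, using the default track
get_mapp_sketch(5, 15, track = data.frame(start = 1, end = 100, score = 0.9))  # 0.9
```

Because the caller can always override `track`, the same function works with hg19, hg38 or a custom track without a character switch. The averaging line is only a placeholder; how overlapping bins should be weighted is discussed separately in this thread.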

How did you realize these BAM files are against genome assembly hg38?

I looked at the chromosome lengths:

> seqinfo(BamFile(bambedObj$bamdir[[1]]))
Seqinfo object with 196 sequences from an unspecified genome:
  seqnames         seqlengths isCircular genome
  chr1              248956422       <NA>   <NA>
  chr2              242193529       <NA>   <NA>
  chr3              198295559       <NA>   <NA>
  chr4              190214555       <NA>   <NA>
  chr5              181538259       <NA>   <NA>
  ...                     ...        ...    ...
  chrUn_KI270742v1     186739       <NA>   <NA>
  chrUn_GL000216v2     176608       <NA>   <NA>
  chrUn_GL000218v1     161147       <NA>   <NA>
  chrEBV               171823       <NA>   <NA>
  hs38d1             10560522       <NA>   <NA>

Those are the chromosome lengths of hg38 (see seqinfo(getBSgenome("hg38"))), not hg19.
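This check can be automated. The base-R sketch below compares contig lengths against hard-coded chr1/chr2 lengths for hg19 and hg38. The `guess_build` helper and the two-chromosome reference table are illustrative assumptions; in a real session the input would come from something like `seqlengths(seqinfo(BamFile(...)))`, and the references would be taken from `seqinfo(getBSgenome(...))` rather than typed in.

```r
# Reference chr1/chr2 lengths (bp) for two UCSC human assemblies.
ref_lengths <- list(
  hg19 = c(chr1 = 249250621, chr2 = 243199373),
  hg38 = c(chr1 = 248956422, chr2 = 242193529)
)

# Return the single assembly whose shared contigs all match in length,
# or NA if zero or several assemblies match.
guess_build <- function(chrom_lengths, refs = ref_lengths) {
  matches <- vapply(refs, function(ref) {
    common <- intersect(names(ref), names(chrom_lengths))
    length(common) > 0 && all(chrom_lengths[common] == ref[common])
  }, logical(1))
  hits <- names(refs)[matches]
  if (length(hits) == 1) hits else NA_character_
}

# Lengths as reported for the BAM file in the seqinfo() output above.
bam_lengths <- c(chr1 = 248956422, chr2 = 242193529, chr3 = 198295559)
guess_build(bam_lengths)  # "hg38"
```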

It's not weighted by the width of the query genome region, but rather by the width of mappability track bins.

So if you have a range of 1 million bases where only the 1st position overlaps with a bin of mappability 0.5 and the 999999 other positions overlap with a bin of mappability 1, and the two bins have the same size, you're considering that the mappability of your range is 0.75? Even though 99.9999% of the range has a mappability of 1? How does that make sense?
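The arithmetic behind this objection, as a quick base-R check. The concrete bin width is an assumption (the comment only says the two bins have the same size), and the variable names are hypothetical.

```r
# Two equal-size track bins (width is an assumption; only equality matters).
scores     <- c(0.5, 1)
bin_width  <- c(1e6, 1e6)
# Query bases falling in each bin: the 1st position vs. the other 999,999.
overlap_bp <- c(1, 999999)

# Weighting by bin width, as described in the reply: ignores the query.
bin_weighted <- sum(bin_width * scores) / sum(bin_width)
# Weighting by the number of query bases that fall in each bin.
overlap_weighted <- sum(overlap_bp * scores) / sum(overlap_bp)

bin_weighted      # 0.75
overlap_weighted  # 0.9999995
```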

Other things

Best, H.

bioc-issue-bot commented 4 years ago

Received a valid push; starting a build. Commits are:

b18823b use lazy-loading

bioc-issue-bot commented 4 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.