SARC - Githubissues

Krutik6 commented 1 year ago

Update the following URL to point to the GitHub repository of the package you wish to submit to Bioconductor

Repository: https://github.com/Krutik6/SARC

Confirm the following by editing each check box to '[x]'

[x] I understand that by submitting my package to Bioconductor, the package source and all review commentary are visible to the general public.
[x] I have read the Bioconductor Package Submission instructions. My package is consistent with the Bioconductor Package Guidelines.
[x] I understand Bioconductor Package Naming Policy and acknowledge Bioconductor may retain use of package name.
[x] I understand that a minimum requirement for package acceptance is to pass R CMD check and R CMD BiocCheck with no ERROR or WARNINGS. Passing these checks does not result in automatic acceptance. The package will then undergo a formal review and recommendations for acceptance regarding other Bioconductor standards will be addressed.
[x] My package addresses statistical or bioinformatic issues related to the analysis and comprehension of high throughput genomic data.
[x] I am committed to the long-term maintenance of my package. This includes monitoring the support site for issues that users may have, subscribing to the bioc-devel mailing list to stay aware of developments in the Bioconductor community, responding promptly to requests for updates from the Core team in response to changes in R or underlying software.
[x] I am familiar with the Bioconductor code of conduct and agree to abide by it.

I am familiar with the essential aspects of Bioconductor software management, including:

[x] The 'devel' branch for new packages and features.
[x] The stable 'release' branch, made available every six months, for bug fixes.
[x] Bioconductor version control using Git (optionally via GitHub).

For questions/help about the submission process, including questions about the output of the automatic reports generated by the SPB (Single Package Builder), please use the #package-submission channel of our Community Slack. Follow the link on the home page of the Bioconductor website to sign up.

bioc-issue-bot commented 1 year ago

Hi @Krutik6

Thanks for submitting your package. We are taking a quick look at it and you will hear back from us soon.

The DESCRIPTION file for this package is:

Package: SARC
Type: Package
Title: Statistical Analysis of Regions with CNVs
Version: 0.99.0
Authors@R: person(given = "Krutik", family = "Patel", role = c("aut", "cre"), email = "nkp68@newcastle.ac.uk", comment=c(ORCID="0000-0001-6806-8675"))
Description: Imports .cov files and .bed files from WES/ WGS CNV (copy number variation) detection pipelines and utilises several metrics to weigh the likelihood of a sample containing a detected CNV being a true CNV or a false positive. Highly useful for diagnostic testing to filter out false positives to provide clinicians with fewer variants to interpret. SARC uniquely only used COV and BED files which are the most common CNV pipeline calling filetypes, and can be used as an alternative to the Interactive Genome Browser (IGV), which can be especially helpful in large cohorts with 100s-1000s of patients.
Encoding: UTF-8
RoxygenNote: 7.2.3
License: GPL-3
Imports: tidyverse,
         utils,
         reshape2,
         DescTools,
         metap,
         multtest,
         plyranges,
         data.table,
         scales,
         RColorBrewer,
         grid,
         gtable,
         gridExtra,
         GenomicFeatures,
         stats,
         ggplot2,
         plotly,
         IRanges,
         GenomicRanges
Depends: R (>= 4.3),
         MultiAssayExperiment
Suggests: TxDb.Hsapiens.UCSC.hg38.knownGene,
Homo.sapiens,
knitr,
kableExtra,
testthat
biocViews: Software,
 CopyNumberVariation,
 Visualization,
 DNASeq,
 Sequencing
VignetteBuilder: knitr
URL: https://github.com/Krutik6/SARC/
BugReports: https://github.com/Krutik6/SARC/issues

bioc-issue-bot commented 1 year ago

A reviewer has been assigned to your package. Learn what to expect during the review process.

IMPORTANT: Please read this documentation for setting up remotes to push to git.bioconductor.org. It is required to push a version bump to git.bioconductor.org to trigger a new build.

Bioconductor utilized your github ssh-keys for git.bioconductor.org access. To manage keys and future access you may want to active your Bioconductor Git Credentials Account

bioc-issue-bot commented 1 year ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/SARC to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

DarioS commented 1 year ago

Overall, the package implementation is good but there are some issues which need to be addressed.

Please choose a more informative title for the vignette than "SARC_guide". See DESeq2 and batchelor for examples.

There are duplicate rows in test data file.

> test_bed  # Why is second and third row exactly the same?
  SAMPLE CHROM     START       END TYPE VALUE       TOOL
1 Sample_616  chr1 161487763 161565521  DUP     1 exomeDepth
2 Sample_617  chr1  17034387  17090909  DEL     0 exomeDepth
3 Sample_617  chr1  17034387  17090909  DEL     0 exomeDepth

The example BED files do not conform to the BED file standard. Please do not redefine widely used bioinformatics file formats. Also, the data directory should contain .bed files, not .rda files of data frames claiming to be BED format.
The package only works with copy number variants. Why is MultiAssayExperiment used as the main data container?

> SARC <- regionSet(bed = test_bed, test_cov)
> SARC
A MultiAssayExperiment object of 0 listed
 experiments with no user-defined names and respective classes.
 Containing an ExperimentList class object of length 0:

Please redesign it to use RaggedExperiment instead.

This package provides a flexible representation of copy number, mutation, and other data that fit into the ragged array schema for genomic location data.

Integration with existing Bioconductor functionality needs improvement. Don't use external utilities if not necessary.

The cov file should comprise of coverage from BAM files. These can be retrieved using SAMTOOLS or BCFTOOLs.

Please replace this by runnable code using coverage function from GenomicAlignments applied to a BAM file in data folder.

The package advocates for comparing read coverage between different samples directly. This is not sensible. What if sample A has 50 million reads in total but sample B has 200 million reads in total? Read coverage is not a valid indicator of copy number unless it is scaled by the library size.
Function examples should not use library(packageName) but check for presence of package installed. Please see Suggested Packages instructions for how to best load a package in an Examples section.
It is not clear if test data 1 and 2 are simulated data or experimental biological data. Please state the source of the data.
Unit tests are inadequate. For example, test_stats.R tests the number of columns and column names but not whether the expected genes are significantly different or not. Please test that the calculations of your algorithm are correct.
```
expect_equal(ncol(x), 13)
expect_equal(names(x)[12], "CNV.SCORE")
expect_equal(names(x)[13], "CNV.TIER")
```
All of the examples use data packages such as TxDb.Hsapiens.UCSC.hg38.knownGene. Please comment on the applicability of SARC to other species such as Mus musculus. The generalisability of it should be explained in the vignette.

Krutik6 commented 1 year ago

Cheers @DarioS. Will aim to address each point within 2-weeks. Krutik

bioc-issue-bot commented 1 year ago

Please remember to push to git.bioconductor.org to trigger a new build

bioc-issue-bot commented 1 year ago

Please remember to push to git.bioconductor.org to trigger a new build

bioc-issue-bot commented 1 year ago

Please remember to push to git.bioconductor.org to trigger a new build

bioc-issue-bot commented 1 year ago

Received a valid push on git.bioconductor.org; starting a build for commit id: d62045379755eabe7fc3aade6a06a3bcfee3d3cc

bioc-issue-bot commented 1 year ago

Received a valid push on git.bioconductor.org; starting a build for commit id: c9074531cf2f057ee8df531a6b06b44aa7c7219c

bioc-issue-bot commented 1 year ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on the Bioconductor Build System.

On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

The following are build products from R CMD build on the Bioconductor Build System: macOS 12.6.5 Monterey: SARC_0.99.21.tar.gz Linux (Ubuntu 22.04.2 LTS): SARC_0.99.21.tar.gz

Links above active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/SARC to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

bioc-issue-bot commented 1 year ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on the Bioconductor Build System.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details.

The following are build products from R CMD build on the Bioconductor Build System: Linux (Ubuntu 22.04.2 LTS): SARC_0.99.22.tar.gz macOS 12.6.5 Monterey: SARC_0.99.22.tar.gz

Links above active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/SARC to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

Krutik6 commented 1 year ago

Dear @DarioS ,

Thank you for your comments, and apologies in the delay in resubmission - had github issues (thanks @lshep for helping with them). I believe this tool has been improved for the better of the users and the research community. I have listed the suggestions along with a response. Any minor additional changes have bee listed in the end of this response.

Please choose a more informative title for the vignette than "SARC_guide". See DESeq2 and batchelor for examples. Yes, this would be for the best. The title has been renamed to "Tutorial of the SARC R Package. Flagging the confidence of CNVs from WES/ WGS cohorts."
There are duplicate rows in test data file. Sorry this was a mistake - third row was from the tool clinCNV. This has been corrected for test_cnv and test_cnv2.
The example BED files do not conform to the BED file standard. Please do not redefine widely used bioinformatics file formats. Also, the data directory should contain .bed files, not .rda files of data frames claiming to be BED format. You are correct - this would have just caused confusion. Instead, I have renamed these to cnv files. I have mentioned that they are inspired by, and perform the same function as CNV .BED files, but are a different entity. I thought it would be simpler to create a specific class of file, rather than continue with claiming I had used .bed files - when I hadn't. Furthermore, I did prefer to keep .rda files because they could be compressed and were simple to call with data(x), so this was an all-round better solution I think.
The package only works with copy number variants. Why is MultiAssayExperiment used as the main data container? Please redesign it to use RaggedExperiment instead. I had used MAE because I was used to them - I have completely changed to RaggedExperiments now. This has also been explained in the vignette. I have altered the DESCRIPTION file to be dependant on RaggedExperiment instead and removed any mention of MultiAssayExperiments.
Please replace this by runnable code using coverage function from GenomicAlignments applied to a BAM file in data folder. I have added a small .bam and bam.bai file in the /inst/extdata folder and provided a small indication of how this could be used in the vignette. I did not wish to overburden the SARC vignette with another tool - so I have added the tutorials of GenomicAlignments. I have also added a link for bamsignals as I thought this was another good Bioconductor tool to perform a similair job.
The package advocates for comparing read coverage between different samples directly. This is not sensible. What if sample A has 50 million reads in total but sample B has 200 million reads in total? Read coverage is not a valid indicator of copy number unless it is scaled by the library size. I agree - at first I avoided normalisation as I thought this would skew very small variants, but after running some comparisons, this was a very useful comment. I have given an example in the vignette on how to normalise to RPM also, and mention several times that the data has been normalsed - and removed any mention of raw coverage.
Function examples should not use library(packageName) but check for presence of package installed. Please see Suggested Packages instructions for how to best load a package in an Examples section. I have altered this in the examples and the vignettes to instead use require or if (requireNamespace("X", quietly = TRUE)) {}
It is not clear if test data 1 and 2 are simulated data or experimental biological data. Please state the source of the data. Sorry - this was clinical data from our cohort. I have further anonymised this data and added some citations.
Unit tests are inadequate. For example, test_stats.R tests the number of columns and column names but not whether the expected genes are significantly different or not. Please test that the calculations of your algorithm are correct. Appologies - I have added further unit tests to check the maths and functions of the statistical functions.
All of the examples use data packages such as TxDb.Hsapiens.UCSC.hg38.knownGene. Please comment on the applicability of SARC to other species such as Mus musculus. The generalisability of it should be explained in the vignette. This is fair enough. I had intended this to be a clinical tool, but if it can help researchers working with model organisms, then I am happy to promote this. I have added a section at the end of the vignette to do just this.

Minor updates New csv dataframes will be stored as nested lists within the RaggedExperiment object. To avoid removing older dataframes, users will have the option to rename the new dataframes that are being nested. Added new parameter to fix a plotting bug where the log flag not having a logbase as a parameter.

Thank you for your comments, again this tool has been improved for the better I think. Please communicate any other issues - or indadequate changes and I will try to respond promptly.

DarioS commented 1 year ago

Thank you for the significant updates to SARC. I have a few minor remaining issues to raise.

Don't use require in the vignette for code users will run, but do use library. require() is usually only used if you are loading packages inside a function so that the function will continue to execute after warning, if a package doesn’t exist.
RStudio automatically does spell-check and underlines potential mistakes. Please fix whichever underlining is relevant.
x <- coverage(x = "../inst/extdata/test.bam") is error unless user's working directory is the package directory. Please do it more robustly, such as system.file("extdata", "test.bam", package="SARC")

The paths elsewhere also need updating for robustness.
```
load(file ="../inst/extdata/test_cov_raw.rda")
load(file ="../inst/extdata/librarySizes.rda")
```
The for loop in the vignette is unnecessary and can be done with vectorisation. Please verify that
```
t(t(test_cov[, 5:ncol(test_cov)]) / libsizes$PerMillion)
```
gives the same result as the for loop, but does it more concisely and is vectorised.

bioc-issue-bot commented 1 year ago

Received a valid push on git.bioconductor.org; starting a build for commit id: 6749099bf150c497bc7c87a1cce8f6fee98c2194

bioc-issue-bot commented 1 year ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on the Bioconductor Build System.

On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

The following are build products from R CMD build on the Bioconductor Build System: macOS 12.6.5 Monterey: SARC_0.99.23.tar.gz Linux (Ubuntu 22.04.2 LTS): SARC_0.99.23.tar.gz

Links above active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/SARC to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

bioc-issue-bot commented 1 year ago

Received a valid push on git.bioconductor.org; starting a build for commit id: 190c4ad3ba5e18a0924d347709dd5e8eff7bdc23

bioc-issue-bot commented 1 year ago

Received a valid push on git.bioconductor.org; starting a build for commit id: 5ca782f360609a9eceee67c3a0477001848b4fb3

bioc-issue-bot commented 1 year ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on the Bioconductor Build System.

On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

The following are build products from R CMD build on the Bioconductor Build System: Linux (Ubuntu 22.04.2 LTS): SARC_0.99.231.tar.gz macOS 12.6.5 Monterey: SARC_0.99.231.tar.gz

Links above active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/SARC to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

bioc-issue-bot commented 1 year ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on the Bioconductor Build System.

On one or more platforms, the build results were: "ERROR, skipped". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

The following are build products from R CMD build on the Bioconductor Build System: macOS 12.6.5 Monterey: SARC_0.99.232.tar.gz

Links above active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/SARC to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

bioc-issue-bot commented 1 year ago

Received a valid push on git.bioconductor.org; starting a build for commit id: 7524cc45166508a240a57a065ddfb479938bcacc

bioc-issue-bot commented 1 year ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on the Bioconductor Build System.

On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

The following are build products from R CMD build on the Bioconductor Build System: Linux (Ubuntu 22.04.2 LTS): SARC_0.99.233.tar.gz

Links above active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/SARC to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

bioc-issue-bot commented 1 year ago

Received a valid push on git.bioconductor.org; starting a build for commit id: 8e2ed53d67ef63c58485035d6ec9d29978c0a366

bioc-issue-bot commented 1 year ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on the Bioconductor Build System.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details.

The following are build products from R CMD build on the Bioconductor Build System: Linux (Ubuntu 22.04.2 LTS): SARC_0.99.234.tar.gz

Links above active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/SARC to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

Krutik6 commented 1 year ago

Hi @DarioS, I have made some more updates on the package which I will list here.

Don't use require in the vignette for code users will run, but do use library. require() is usually only used if you are loading packages inside a function so that the function will continue to execute after warning, if a package doesn’t exist.

Sorry I could not find the function library. require(), so instead just used library() in the vignette. Hope this is sufficient.

RStudio automatically does spell-check and underlines potential mistakes. Please fix whichever underlining is relevant.

Thanks - I have done this now on the vignette, all R function scripts and test scripts.

x <- coverage(x = "../inst/extdata/test.bam") is error unless user's working directory is the package directory. Please do it more robustly, such as system.file("extdata", "test.bam", package="SARC")

You were right to comment on this - thanks for catching the potential bugs that would occur. All calls from extdata are now called using system.file().

The for loop at the end.

Yes it gave almost the same result, minus a few columns which were easily re-added. I used this instead of the forloop.

DarioS commented 1 year ago

The package has been substantially improved and will be good member of the Bioconductor ecosystem.

bioc-issue-bot commented 1 year ago

Your package has been accepted. It will be added to the Bioconductor nightly builds.

Thank you for contributing to Bioconductor!

Reviewers for Bioconductor packages are volunteers from the Bioconductor community. If you are interested in becoming a Bioconductor package reviewer, please see Reviewers Expectations.

Krutik6 commented 1 year ago

Thanks for all the help @DarioS and @lshep :) Hopefully it will help bioinformaticians with CNV data.

lshep commented 1 year ago

The default branch of your GitHub repository has been added to Bioconductor's git repository as branch devel.

To use the git.bioconductor.org repository, we need an 'ssh' key to associate with your github user name. If your GitHub account already has ssh public keys (https://github.com/Krutik6.keys is not empty), then no further steps are required. Otherwise, do the following:

See further instructions at

https://bioconductor.org/developers/how-to/git/

for working with this repository. See especially

https://bioconductor.org/developers/how-to/git/new-package-workflow/ https://bioconductor.org/developers/how-to/git/sync-existing-repositories/

to keep your GitHub and Bioconductor repositories in sync.

Your package will be included in the next nigthly 'devel' build (check-out from git at about 6 pm Eastern; build completion around 2pm Eastern the next day) at

https://bioconductor.org/checkResults/

(Builds sometimes fail, so ensure that the date stamps on the main landing page are consistent with the addition of your package). Once the package builds successfully, you package will be available for download in the 'Devel' version of Bioconductor using BiocManager::install("SARC"). The package 'landing page' will be created at

https://bioconductor.org/packages/SARC

If you have any questions, please contact the bioc-devel mailing list (https://stat.ethz.ch/mailman/listinfo/bioc-devel); this issue will not be monitored further.

Bioconductor / Contributions

SARC #3071