Bioconductor / Contributions

Contribute Packages to Bioconductor
131 stars 33 forks source link

TAPseq #1382

Closed argschwind closed 4 years ago

argschwind commented 4 years ago

Update the following URL to point to the GitHub repository of the package you wish to submit to Bioconductor

Confirm the following by editing each check box to '[x]'

I am familiar with the essential aspects of Bioconductor software management, including:

For help with submitting your package, please subscribe and post questions to the bioc-devel mailing list.

bioc-issue-bot commented 4 years ago

Hi @argschwind

Thanks for submitting your package. We are taking a quick look at it and you will hear back from us soon.

The DESCRIPTION file for this package is:

Package: TAPseq
Type: Package
Title: Targeted scRNA-seq primer design for TAP-seq
Version: 0.99.0
Authors@R: c(
  person("Andreas", "Gschwind", email = "andreas.gschwind@stanford.edu",
         role = c("aut", "cre"), comment = c(ORCID = "0000-0002-0769-6907")),
  person("Lars", "Velten", email = "lars.velten@crg.eu", role = "aut",
         comment = c(ORCID = "0000-0002-1233-5874")),
  person("Lars", "Steinmetz", role = "aut"))
Description: Design primers for targeted single-cell RNA-seq used by TAP-seq. Create sequence
  templates for target gene panels and design gene-specific primers using Primer3. Potenial
  off-targets can be estimated with BLAST. Requires working installations of Primer3 and BLASTn.
URL: https://github.com/argschwind/TAPseq
BugRepots: https://github.com/argschwind/TAPseq/issues
biocViews: SingleCell, Sequencing, Technology, CRISPR, PooledScreens
License: MIT + file LICENSE
Encoding: UTF-8
RoxygenNote: 7.0.2
VignetteBuilder: knitr
Depends: R (>= 4.0)
Imports:
    methods,
    GenomicAlignments,
    GenomicRanges,
    IRanges,
    BiocGenerics,
    S4Vectors (>= 0.20.1),
    GenomeInfoDb,
    BSgenome,
    GenomicFeatures,
    Biostrings,
    dplyr,
    tidyr,
    BiocParallel
Suggests: 
    testthat,
    BSgenome.Hsapiens.UCSC.hg38,
    knitr,
    rmarkdown,
    ggplot2,
    Seurat,
    glmnet,
    cowplot,
    Matrix,
    rtracklayer
SystemRequirements:
    Primer3 (>= 2.5.0),
    BLAST+ (>=2.6.0)
bioc-issue-bot commented 4 years ago

A reviewer has been assigned to your package. Learn what to expect during the review process.

IMPORTANT: Please read the instructions for setting up a push hook on your repository, or further changes to your repository will NOT trigger a new build.

bioc-issue-bot commented 4 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, TIMEOUT, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

argschwind commented 4 years ago

Hi @mtmorgan,

Thank you for helping to review our package. I looked at the build reports and it looks like TAPseq didn't find the required tools when building the vignettes.

These programs are listed under SystemRequirements in DESCRIPTION. Are Primer3 (>= 2.5.0) and BLAST+ (>=2.6.0) installed and in PATH on the systems the package was build?

Thanks, Andreas

mtmorgan commented 4 years ago

Generally, the build systems will not be modified to support these dependencies, and a package that has such dependencies is generally not going to be usable across platforms (because the dependencies are not available to, e.g., Mac or Windows users). This compromises the utility of the package to the Bioconductor community, and complicates end-user and developer support.

My recommendation is to remove the dependencies from your package, e.g., by finding R-based alternatives. A less satisfactory solution is to modify your examples / tests /vignette not to use the functionality provided by this software, but then perhaps your code is not really tested in a way that builds confidence in the correctness of the code for the end user. The third alternative is to recognize that these third party tools are essential to your package, and that as a consequence the package is not portable across platforms and cannot be tested effectively, and so not appropriate for Bioconductor.

argschwind commented 4 years ago

I understand that not having the dependencies on the build systems means that my code won't be tested systematically. But I would consider BLAST and Primer3 long-standing and reliable bioinformatics tools, which are available for all three platforms. I built and tested the package on Linux (CentOS Linux 7), Mac (10.15.3) and Windows 10 (v. 1803). But I understand that this is anecdotal.

One of my reasons to submit it to Bioconductor was that there is another primer design package (http://bioconductor.org/packages/release/bioc/html/openPrimeR.html) available for a different task, which also utilizes 3rd party software. To me it looks like they do not test or run examples that depend on these tools.

If that would be an option for my package, I could create a new version that does not run examples requiring these tools. I would adapt unit tests so that the functions interacting with the tools are still tested, but do not rely on them creating output.

mtmorgan commented 4 years ago

yes what you describe would be an appropriate solution.

bioc-issue-bot commented 4 years ago

Received a valid push; starting a build. Commits are:

b32092e Change package so that it can be built without Pri...

bioc-issue-bot commented 4 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

mtmorgan commented 4 years ago

Is your package ready for review?

argschwind commented 4 years ago

R CMD check on tokay2 still raises weird warnings regarding links to functions in the man pages that I have to figure out. Example: "file link 'GRanges' in package 'GenomicRanges' does not exist and so has been treated as a topic"

Else it should be ready for review. The warning about the missing software when loading the package is intentional.

bioc-issue-bot commented 4 years ago

Received a valid push; starting a build. Commits are:

85c321c Fix links to other functions in documentation Fix...

bioc-issue-bot commented 4 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, TIMEOUT, WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 4 years ago

Received a valid push; starting a build. Commits are:

945a66d Make bone marrow scRNA-seq example dataset smaller...

bioc-issue-bot commented 4 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

mtmorgan commented 4 years ago

I have only a few cursory comments; please revise and respond with a short comment when ready for final review.

DESCRIPION, NAMESPACE, vignettes

R

man

bioc-issue-bot commented 4 years ago

Received a valid push; starting a build. Commits are:

b48db18 Implement Bioconductor reviewer comments Improve ...

bioc-issue-bot commented 4 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

argschwind commented 4 years ago

I implemented the suggested changes, except for the partitioning trick for the TsIO class.

A typical TsIOList should contain 100-250 elements and rarely exceed 500. TsIOLists typically contains one element per target gene and the number of target genes is subject to limitations of multiplex PCR.

I had a look at the GenomicRanges and IRanges packages, but could not find the code that does the partitioning. If you think that this should be done for TsIO objects, it would be helpful if you pointed me to some example code of how this is done.

mtmorgan commented 4 years ago

I guess the simple case is easily understood, rather than representing a list of 4 numeric elements as list(1, 2, numeric(), c(3, 4, 5)) you could represent this as a single numeric vector c(1, 2, 3, 4) and a partitioning that told R where to break the vector into 'virtual' groups, e.g., c(1, 2, 2, 5) to end the first group after element 1, the second and third groups after element 2 (i.e., no elements in the third group, and the fourth group after element 5. The machinery could be

.My = setClass("My", contains = "CompressedList", prototype = prototype(elementType = "numeric"))
my = .My(unlistData = c(1, 2, 3, 4, 5), partitioning = PartitioningByEnd(c(1, 2, 2, 5)))

You get

> my
My of length 4
> lengths(my)
[1] 1 1 0 3
> my[[1]]
[1] 1
> my[[2]]
[1] 2
> my[[3]]
numeric(0)
> my[[4]]
[1] 3 4 5

but there's only one data vector my@unlistData plus the instructions for partitioning my@partitioning. This will be space and time efficient when there are a lot of elements. Some operations, e.g.,unlist() and relist() are very inexpensive, so something like mendoapply(my, sqrt) can be very efficiently implemented as relist(sqrt(unlist(my)), my).

I guess your TsIO data structure is quite complicated, so probably would require quite a bit of re-factoring to fit this paradigm (any vector-like object, e.g., a GRanges, can be partitioned in the way outlined above, but I guess you would have perhaps two or three vector-like objects in your class...). If this aspect of performance isn't a problem, then I wouldn't pursue it.

Incidentally I noticed when I walked through the TsIO-class man page and printed the first object created

> obj <- TsIO(sequence_id = tx_id, target_sequence = tx_seq, beads_oligo = beads_oligo,
+             reverse_primer = reverse_primer, product_size_range = c(350, 500))
> obj
TsIO instance
  1004 bp sequence template
  Sequence ID: AKIP1
  Beads oligo: CTACACGACGCTCTTCCGATCTNNNNNNNNNNNNNNNNNNNNNNNNNNNNTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
  Right primer: CTACACGACGCTCTTCCGATCT
  Specified product size range: 350-500 basepairs> 

I found myself in the middle of a line -- there should be a carriage return as the final output, so the prompt is on a new line.

bioc-issue-bot commented 4 years ago

Received a valid push; starting a build. Commits are:

22a6034 Improve show method for TsIO objects Add number o...

bioc-issue-bot commented 4 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, TIMEOUT, WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

argschwind commented 4 years ago

Thanks for the explanation! I don't believe that operations on TsIOList objects will be performance limiting. But if it turns out that users create huge TsIOList objects I guess I could always implement the proposed solution, which should only require internal changes to the package.

I improved the show method and added a newline at the end to ensure that the prompt is not in the middle of a line. Looks like RStudio automatically added a new line to the output...

bioc-issue-bot commented 4 years ago

Received a valid push; starting a build. Commits are:

50a16a4 Update LICENSE file

bioc-issue-bot commented 4 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

argschwind commented 4 years ago

I think the package is ready for final review.

bioc-issue-bot commented 4 years ago

Your package has been accepted. It will be added to the Bioconductor Git repository and nightly builds. Additional information will be posed to this issue in the next several days.

Thank you for contributing to Bioconductor!

argschwind commented 4 years ago

Thank you very much @mtmorgan that you took the time to review this package and all the helpful input.

mtmorgan commented 4 years ago

@argschwind the file examples/close2017_upstream_seqs.fa is greater than the largest file size we allow in git (it's 6.1 mb, the maximum size is 5mb). Can you either (1) create a smaller file or (2) use an existing file from AnnotationHub or (3) create an ExperimentHub data package for your data resources?

Let me know what your plan is and if I can be of any help.

denniscwylie commented 4 years ago

@mtmorgan I think you're referring to a file in my submitted package (sarks) not this one (TAPseq). The file in question (the whole examples directory, actually) is omitted from the R package itself in .Rbuildignore, but if it still causes a problem in the git repo anyway I could potentially remove it if necessary---just let me know!

Thanks, Dennis

mtmorgan commented 4 years ago

yes it is in sarks, thanks for the clarification. Yes, it needs to be made smaller or removed from the git version that is added to Bioconductor.

mtmorgan commented 4 years ago

The master branch of your GitHub repository has been added to Bioconductor's git repository.

To use the git.bioconductor.org repository, we need an 'ssh' key to associate with your github user name. If your GitHub account already has ssh public keys (https://github.com/argschwind.keys is not empty), then no further steps are required. Otherwise, do the following:

  1. Add an SSH key to your github account
  2. Submit your SSH key to Bioconductor

See further instructions at

https://bioconductor.org/developers/how-to/git/

for working with this repository. See especially

https://bioconductor.org/developers/how-to/git/new-package-workflow/ https://bioconductor.org/developers/how-to/git/sync-existing-repositories/

to keep your GitHub and Bioconductor repositories in sync.

Your package will be included in the next nigthly 'devel' build (check-out from git at about 6 pm Eastern; build completion around 2pm Eastern the next day) at

https://bioconductor.org/checkResults/

(Builds sometimes fail, so ensure that the date stamps on the main landing page are consistent with the addition of your package). Once the package builds successfully, you package will be available for download in the 'Devel' version of Bioconductor using BiocManager::install("TAPseq"). The package 'landing page' will be created at

https://bioconductor.org/packages/TAPseq

If you have any questions, please contact the bioc-devel mailing list (https://stat.ethz.ch/mailman/listinfo/bioc-devel); this issue will not be monitored further.