Closed argschwind closed 4 years ago
Hi @argschwind
Thanks for submitting your package. We are taking a quick look at it and you will hear back from us soon.
The DESCRIPTION file for this package is:
Package: TAPseq
Type: Package
Title: Targeted scRNA-seq primer design for TAP-seq
Version: 0.99.0
Authors@R: c(
person("Andreas", "Gschwind", email = "andreas.gschwind@stanford.edu",
role = c("aut", "cre"), comment = c(ORCID = "0000-0002-0769-6907")),
person("Lars", "Velten", email = "lars.velten@crg.eu", role = "aut",
comment = c(ORCID = "0000-0002-1233-5874")),
person("Lars", "Steinmetz", role = "aut"))
Description: Design primers for targeted single-cell RNA-seq used by TAP-seq. Create sequence
templates for target gene panels and design gene-specific primers using Primer3. Potenial
off-targets can be estimated with BLAST. Requires working installations of Primer3 and BLASTn.
URL: https://github.com/argschwind/TAPseq
BugRepots: https://github.com/argschwind/TAPseq/issues
biocViews: SingleCell, Sequencing, Technology, CRISPR, PooledScreens
License: MIT + file LICENSE
Encoding: UTF-8
RoxygenNote: 7.0.2
VignetteBuilder: knitr
Depends: R (>= 4.0)
Imports:
methods,
GenomicAlignments,
GenomicRanges,
IRanges,
BiocGenerics,
S4Vectors (>= 0.20.1),
GenomeInfoDb,
BSgenome,
GenomicFeatures,
Biostrings,
dplyr,
tidyr,
BiocParallel
Suggests:
testthat,
BSgenome.Hsapiens.UCSC.hg38,
knitr,
rmarkdown,
ggplot2,
Seurat,
glmnet,
cowplot,
Matrix,
rtracklayer
SystemRequirements:
Primer3 (>= 2.5.0),
BLAST+ (>=2.6.0)
A reviewer has been assigned to your package. Learn what to expect during the review process.
IMPORTANT: Please read the instructions for setting up a push hook on your repository, or further changes to your repository will NOT trigger a new build.
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "skipped, TIMEOUT, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
Hi @mtmorgan,
Thank you for helping to review our package. I looked at the build reports and it looks like TAPseq didn't find the required tools when building the vignettes.
These programs are listed under SystemRequirements in DESCRIPTION. Are Primer3 (>= 2.5.0) and BLAST+ (>=2.6.0) installed and in PATH on the systems the package was build?
Thanks, Andreas
Generally, the build systems will not be modified to support these dependencies, and a package that has such dependencies is generally not going to be usable across platforms (because the dependencies are not available to, e.g., Mac or Windows users). This compromises the utility of the package to the Bioconductor community, and complicates end-user and developer support.
My recommendation is to remove the dependencies from your package, e.g., by finding R-based alternatives. A less satisfactory solution is to modify your examples / tests /vignette not to use the functionality provided by this software, but then perhaps your code is not really tested in a way that builds confidence in the correctness of the code for the end user. The third alternative is to recognize that these third party tools are essential to your package, and that as a consequence the package is not portable across platforms and cannot be tested effectively, and so not appropriate for Bioconductor.
I understand that not having the dependencies on the build systems means that my code won't be tested systematically. But I would consider BLAST and Primer3 long-standing and reliable bioinformatics tools, which are available for all three platforms. I built and tested the package on Linux (CentOS Linux 7), Mac (10.15.3) and Windows 10 (v. 1803). But I understand that this is anecdotal.
One of my reasons to submit it to Bioconductor was that there is another primer design package (http://bioconductor.org/packages/release/bioc/html/openPrimeR.html) available for a different task, which also utilizes 3rd party software. To me it looks like they do not test or run examples that depend on these tools.
If that would be an option for my package, I could create a new version that does not run examples requiring these tools. I would adapt unit tests so that the functions interacting with the tools are still tested, but do not rely on them creating output.
yes what you describe would be an appropriate solution.
Received a valid push; starting a build. Commits are:
b32092e Change package so that it can be built without Pri...
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
Is your package ready for review?
R CMD check on tokay2 still raises weird warnings regarding links to functions in the man pages that I have to figure out. Example: "file link 'GRanges' in package 'GenomicRanges' does not exist and so has been treated as a topic"
Else it should be ready for review. The warning about the missing software when loading the package is intentional.
Received a valid push; starting a build. Commits are:
85c321c Fix links to other functions in documentation Fix...
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "skipped, TIMEOUT, WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
Received a valid push; starting a build. Commits are:
945a66d Make bone marrow scRNA-seq example dataset smaller...
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
I have only a few cursory comments; please revise and respond with a short comment when ready for final review.
DESCRIPION, NAMESPACE, vignettes
R
code seems to be well-organized and clearly written
TsIO_classes.R:148 A more typical paradigm for a validity method reports all invalid components at once, along the lines of
txt <- character()
if (length(seqname) > 1)
txt <- c(txt, "multiple seqnames in target_annot")
if (length(strand) > 1)
txt <- c(txt, "conflicting strand information in target_annot")
...
if (length(txt)) txt else TRUE
TsIO_classes.R:237 avoid use of internal functions; I think
setClass("TsIOList", contains = "SimpleList", prototype = "TsIO")
TsIOList <- function(...) {
objects <- list(...)
new("TsIOList, objects, elementType = "TsIO")
}
Will there be many TsIO
elements in a typical TsIOList
? If so it would be worth-while to revise TsIO
so that it can have 'length' 0, 1, ..., using the 'trick' of GRangesList
etc where a single object (GRanges) is partitioned into groups by a second partitioning object -- this creates a total of three object (GRangesList, GRanges, partitioning) rathern tan n + 1 objects, and will be economical in space and time. Using mendoapply()
, for instance, is much less efficient on large objects for transformations that can be performed by operations such as gr <- unlist(grl); ...; relist(gr, grl)
.
TsIO_classes.R:398 replacement methods are usually written
setReplaceMethod("sequence_id", "TsIO", function(x, value) {
...
})
man
Received a valid push; starting a build. Commits are:
b48db18 Implement Bioconductor reviewer comments Improve ...
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
I implemented the suggested changes, except for the partitioning trick for the TsIO
class.
A typical TsIOList
should contain 100-250 elements and rarely exceed 500. TsIOLists
typically contains one element per target gene and the number of target genes is subject to limitations of multiplex PCR.
I had a look at the GenomicRanges
and IRanges
packages, but could not find the code that does the partitioning. If you think that this should be done for TsIO
objects, it would be helpful if you pointed me to some example code of how this is done.
I guess the simple case is easily understood, rather than representing a list of 4 numeric elements as list(1, 2, numeric(), c(3, 4, 5))
you could represent this as a single numeric vector c(1, 2, 3, 4)
and a partitioning that told R where to break the vector into 'virtual' groups, e.g., c(1, 2, 2, 5)
to end the first group after element 1, the second and third groups after element 2 (i.e., no elements in the third group, and the fourth group after element 5. The machinery could be
.My = setClass("My", contains = "CompressedList", prototype = prototype(elementType = "numeric"))
my = .My(unlistData = c(1, 2, 3, 4, 5), partitioning = PartitioningByEnd(c(1, 2, 2, 5)))
You get
> my
My of length 4
> lengths(my)
[1] 1 1 0 3
> my[[1]]
[1] 1
> my[[2]]
[1] 2
> my[[3]]
numeric(0)
> my[[4]]
[1] 3 4 5
but there's only one data vector my@unlistData
plus the instructions for partitioning my@partitioning
. This will be space and time efficient when there are a lot of elements. Some operations, e.g.,unlist()
and relist()
are very inexpensive, so something like mendoapply(my, sqrt)
can be very efficiently implemented as relist(sqrt(unlist(my)), my)
.
I guess your TsIO data structure is quite complicated, so probably would require quite a bit of re-factoring to fit this paradigm (any vector-like object, e.g., a GRanges, can be partitioned in the way outlined above, but I guess you would have perhaps two or three vector-like objects in your class...). If this aspect of performance isn't a problem, then I wouldn't pursue it.
Incidentally I noticed when I walked through the TsIO-class man page and printed the first object created
> obj <- TsIO(sequence_id = tx_id, target_sequence = tx_seq, beads_oligo = beads_oligo,
+ reverse_primer = reverse_primer, product_size_range = c(350, 500))
> obj
TsIO instance
1004 bp sequence template
Sequence ID: AKIP1
Beads oligo: CTACACGACGCTCTTCCGATCTNNNNNNNNNNNNNNNNNNNNNNNNNNNNTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
Right primer: CTACACGACGCTCTTCCGATCT
Specified product size range: 350-500 basepairs>
I found myself in the middle of a line -- there should be a carriage return as the final output, so the prompt is on a new line.
Received a valid push; starting a build. Commits are:
22a6034 Improve show method for TsIO objects Add number o...
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "skipped, TIMEOUT, WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
Thanks for the explanation! I don't believe that operations on TsIOList
objects will be performance limiting. But if it turns out that users create huge TsIOList
objects I guess I could always implement the proposed solution, which should only require internal changes to the package.
I improved the show method and added a newline at the end to ensure that the prompt is not in the middle of a line. Looks like RStudio automatically added a new line to the output...
Received a valid push; starting a build. Commits are:
50a16a4 Update LICENSE file
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
I think the package is ready for final review.
Your package has been accepted. It will be added to the Bioconductor Git repository and nightly builds. Additional information will be posed to this issue in the next several days.
Thank you for contributing to Bioconductor!
Thank you very much @mtmorgan that you took the time to review this package and all the helpful input.
@argschwind the file examples/close2017_upstream_seqs.fa is greater than the largest file size we allow in git (it's 6.1 mb, the maximum size is 5mb). Can you either (1) create a smaller file or (2) use an existing file from AnnotationHub or (3) create an ExperimentHub data package for your data resources?
Let me know what your plan is and if I can be of any help.
@mtmorgan I think you're referring to a file in my submitted package (sarks) not this one (TAPseq). The file in question (the whole examples directory, actually) is omitted from the R package itself in .Rbuildignore, but if it still causes a problem in the git repo anyway I could potentially remove it if necessary---just let me know!
Thanks, Dennis
yes it is in sarks, thanks for the clarification. Yes, it needs to be made smaller or removed from the git version that is added to Bioconductor.
The master branch of your GitHub repository has been added to Bioconductor's git repository.
To use the git.bioconductor.org repository, we need an 'ssh' key to associate with your github user name. If your GitHub account already has ssh public keys (https://github.com/argschwind.keys is not empty), then no further steps are required. Otherwise, do the following:
See further instructions at
https://bioconductor.org/developers/how-to/git/
for working with this repository. See especially
https://bioconductor.org/developers/how-to/git/new-package-workflow/ https://bioconductor.org/developers/how-to/git/sync-existing-repositories/
to keep your GitHub and Bioconductor repositories in sync.
Your package will be included in the next nigthly 'devel' build (check-out from git at about 6 pm Eastern; build completion around 2pm Eastern the next day) at
https://bioconductor.org/checkResults/
(Builds sometimes fail, so ensure that the date stamps on the main landing page are consistent with the addition of your package). Once the package builds successfully, you package will be available for download in the 'Devel' version of Bioconductor using BiocManager::install("TAPseq")
. The package 'landing page' will be created at
https://bioconductor.org/packages/TAPseq
If you have any questions, please contact the bioc-devel mailing list (https://stat.ethz.ch/mailman/listinfo/bioc-devel); this issue will not be monitored further.
Update the following URL to point to the GitHub repository of the package you wish to submit to Bioconductor
Confirm the following by editing each check box to '[x]'
[x] I understand that by submitting my package to Bioconductor, the package source and all review commentary are visible to the general public.
[x] I have read the Bioconductor Package Submission instructions. My package is consistent with the Bioconductor Package Guidelines.
[x] I understand that a minimum requirement for package acceptance is to pass R CMD check and R CMD BiocCheck with no ERROR or WARNINGS. Passing these checks does not result in automatic acceptance. The package will then undergo a formal review and recommendations for acceptance regarding other Bioconductor standards will be addressed.
[x] My package addresses statistical or bioinformatic issues related to the analysis and comprehension of high throughput genomic data.
[x] I am committed to the long-term maintenance of my package. This includes monitoring the support site for issues that users may have, subscribing to the bioc-devel mailing list to stay aware of developments in the Bioconductor community, responding promptly to requests for updates from the Core team in response to changes in R or underlying software.
I am familiar with the essential aspects of Bioconductor software management, including:
For help with submitting your package, please subscribe and post questions to the bioc-devel mailing list.