Closed by aprice26 5 years ago.
Hi @aprice26
Thanks for submitting your package. We are taking a quick look at it and you will hear back from us soon.
The DESCRIPTION file for this package is:
Package: brainflowprobes
Type: Package
Title: Plots and annotation for choosing BrainFlow target probe sequence
Version: 0.99.0
Authors@R: c(person("Amanda", "Price", email = "amanda.joy.price@gmail.com",
role = c("aut", "cre"), comment = c(ORCID = "0000-0001-7352-3732")),
person("Leonardo", "Collado-Torres", role = c("ctb"),
email = "lcolladotor@gmail.com", comment = c(ORCID = "0000-0003-2140-308X")))
Description: Use these functions to characterize genomic regions for
BrainFlow target probe design.
License: Artistic-2.0
Encoding: UTF-8
LazyData: true
Depends:
R (>= 3.6.0)
Imports:
Biostrings (>= 2.52.0),
BSgenome.Hsapiens.UCSC.hg19 (>= 1.4.0),
bumphunter (>= 1.26.0),
cowplot (>= 1.0.0),
derfinder (>= 1.18.1),
derfinderPlot (>= 1.18.1),
GenomicRanges (>= 1.36.0),
ggplot2 (>= 3.1.1),
RColorBrewer (>= 1.1),
utils,
grDevices
RoxygenNote: 6.1.1
Suggests:
BiocStyle,
knitcitations,
knitr,
rmarkdown,
sessioninfo,
testthat (>= 2.1.0)
VignetteBuilder: knitr
URL: https://github.com/LieberInstitute/brainflowprobes
BugReports: https://support.bioconductor.org/t/brainflowprobes/
biocViews: Coverage, Visualization, ExperimentalDesign, Transcriptomics,
FlowCytometry, GeneTarget
Add SSH keys to your GitHub account. SSH keys are used to control access to accepted Bioconductor packages. See these instructions to add SSH keys to your GitHub account.
Dear Bioconductor package reviewer,
The brainflowprobes package passes R CMD check and R CMD BiocCheck except for a warning and an error related to the size of a file (and hence the package), as shown here. That is, the package passes these checks for the most part, except for the 5 MB size limit. The package functions rely on two objects that take about 40 and 200-500 seconds, respectively, to re-make from scratch, which is why they are currently included in the data/ directory. One of them is 13 MB on disk, bringing the total installed package size to ~20 MB and thus triggering a warning and an error in BiocCheck.
We communicated this to Lori Shepherd who recommended we submit the package as-is and proceed with the review process.
Best, Amanda
(cc @lcolladotor)
A reviewer has been assigned to your package. Learn what to expect during the review process.
IMPORTANT: Please read the instructions for setting up a push hook on your repository, or further changes to your repository will NOT trigger a new build.
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
Received a valid push; starting a build. Commits are:
ecd54f5 v0.99.2 -- testing the BioC SBP webhook that I jus...
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
Hi @aprice26 ,
Thanks for the updates! I'll proceed with the initial review. Let's see if we can clear the error or other issues when we look inside the package.
Best, Qian
Hi @aprice26 ,
Please see below for the initial review of your package. Seek to address all or most of the issues and comment back here with any questions / updates, and when you are ready for a 2nd look.
Cheers, Qian
The elements of four_panels_example_cov are all lists with only 1 element each (a matrix), which is contrary to the documentation describing them as matrices. Please explain whether this is intended, and also make the documentation consistent.
> sapply(four_panels_example_cov, class)
   Sep    Deg   Cell   Sort
"list" "list" "list" "list"
> sapply(four_panels_example_cov, length)
 Sep  Deg Cell Sort
   1    1    1    1
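To illustrate the mismatch, here is a minimal base R sketch on a hypothetical toy object (toy_cov, with made-up values) that only mimics the structure reported above: named panels, each a list wrapping a single matrix. It shows one possible way to unwrap each single-element list so that every panel is a matrix, as the documentation describes; whether to change the data or the documentation is the authors' call.

```r
## Toy stand-in for four_panels_example_cov (made-up values): each named
## panel is a list holding a single matrix, matching the output above.
toy_cov <- lapply(c(Sep = 1, Deg = 2, Cell = 3, Sort = 4), function(i) {
    list(matrix(i, nrow = 2, ncol = 3))
})

## Every panel is a list of length 1, as reported by sapply() above.
sapply(toy_cov, class)
sapply(toy_cov, length)

## Unwrap the single element so each panel becomes a matrix.
toy_cov_matrices <- lapply(toy_cov, function(panel) {
    stopifnot(length(panel) == 1L)
    panel[[1]]
})

## is.matrix() avoids the c("matrix", "array") class vector of R >= 4.0.
sapply(toy_cov_matrices, is.matrix)
```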
four_panels.R:
if(JUNCTIONS) instead to avoid double check!
regionCov if JUNCTION is TRUE. So only do this under the first if condition; there is no need for the else statement.
. Refer here.
plot_coverage.R
four_panels.R
. Consider writing some internal utility functions (e.g., separate functions for checking the input file format and for certain R value assignments), and call the utility functions inside these exported functions. This will make the code more robust and easier to maintain.
region_info.R
xx or !xx instead.
"Basics" section: install BiocManager conditionally:
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("LieberInstitute/brainflowprobes")
If you are asking yourself the question "How do I start using Bioconductor?" you might be interested in [this blog post](http://lcolladotor.github.io/2014/10/16/startBioC/#.VkOKbq6rRuU).
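The four_panels.R and region_info.R comments above are about how logical flags are tested. A minimal base R sketch of that pattern follows. annotate_region() and its "junctions" field are hypothetical and are not brainflowprobes code. The flag is used directly as the if condition (rather than compared with TRUE), and the work needed only when the flag is TRUE sits under a single if, so no else branch is required.

```r
## Hypothetical function: illustrates the review comments, not brainflowprobes code.
annotate_region <- function(region, JUNCTIONS = FALSE) {
    ## Check once that the flag is a single TRUE/FALSE value.
    stopifnot(is.logical(JUNCTIONS), length(JUNCTIONS) == 1L, !is.na(JUNCTIONS))
    result <- list(region = region)
    ## Test the flag directly: `if (JUNCTIONS)`, not `if (JUNCTIONS == TRUE)`.
    ## Work needed only when JUNCTIONS is TRUE lives under this single if,
    ## so no else branch is required.
    if (JUNCTIONS) {
        result$junctions <- paste0(region, ":junctions")
    }
    result
}

no_junctions <- annotate_region("chr1:100-200")
with_junctions <- annotate_region("chr1:100-200", JUNCTIONS = TRUE)
## For the opposite case, negate with `!`, i.e. `if (!JUNCTIONS)`.
is.null(no_junctions$junctions)
```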
Received a valid push; starting a build. Commits are:
b38ce9c v0.99.3 -- address @Liubuntu's requests from https...
Received a valid push; starting a build. Commits are:
8b5d39f 0.99.4 -- fix the speed gains for plot_coverage() ...
Hi @Liubuntu,
Thank you for your review of brainflowprobes! I took the liberty of addressing the suggestions you made to Amanda @aprice26. There are more unit tests, helper functions and other improvements.
Let us know if you have any questions!
Best, Leo
PS I also fixed the URL (it's all lowercase now) in several other R packages I maintain.
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "skipped, TIMEOUT, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
Hi @lcolladotor ,
Thanks for the updates. The newly added documentation, utility functions and unit tests look great. Here is one more issue from the 2nd review. Please comment back with any questions or updates!
Best, Qian
Currently, this function uses the working directory as the default path to save the generated PDF file, which is not recommended: the examples in the help page and in the vignette will generate files in the home directory whenever R CMD build/check is called, e.g., on the Bioconductor build machines with the daily build and check. This fills the build machines' disk space quickly and requires manual cleaning.
If saving the results in the working directory is not strictly required, it is recommended to add a new argument, e.g., outdir, for the path and to use a temporary directory as the default, so that it is cleaned up automatically after each build. Users would also have the flexibility to specify their own path (also update the parameter documentation). The current PDF argument could stay unchanged and only represent the PDF file name. You may need to reconstruct PDF early inside the function to keep your current code working:
PDF <- file.path(outdir, PDF)
Also update the documentation and examples accordingly.
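A minimal base R sketch of the suggested pattern, assuming a hypothetical function save_region_pdf() (not check_pdf() or any other brainflowprobes function) and an OUTDIR argument named after the one later added in v0.99.5. PDF stays a bare file name, OUTDIR defaults to tempdir(), and the full path is rebuilt once at the top so the rest of the code can keep using PDF.

```r
## Hypothetical plotting function showing the suggested pattern:
## PDF is only a file name and OUTDIR defaults to the session's tempdir().
save_region_pdf <- function(PDF = "regions.pdf", OUTDIR = tempdir()) {
    ## Create the output directory if the user supplied a new one.
    dir.create(OUTDIR, showWarnings = FALSE, recursive = TRUE)
    ## Rebuild PDF early so the rest of the code can keep using it.
    PDF <- file.path(OUTDIR, PDF)
    grDevices::pdf(file = PDF, width = 7, height = 5)
    ## Close the device when the function exits, even if plotting fails.
    on.exit(grDevices::dev.off(), add = TRUE)
    graphics::plot(seq_len(10), main = "Example panel")
    invisible(PDF)
}

default_path <- save_region_pdf()
custom_path <- save_region_pdf("custom.pdf", OUTDIR = file.path(tempdir(), "plots"))
```

With this default, examples run during R CMD check write into the session's temporary directory, which R removes when the session ends, while users can still pass their own OUTDIR.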
Hi @aprice26 and @lcolladotor ,
I have just checked the data-raw/create_sysdata.R file and found that you have actually created a txdb object. Would you consider this data to be useful to broader Bioconductor users? If yes, you may have it prepared as an AnnotationHub package (@lshep could help with any questions), and import this package when creating the other data sets currently inside the data/ folder.
Qian
Any updates on this? @aprice26 @lcolladotor The issue should be easily fixed.
Received a valid push; starting a build. Commits are:
b2472db v0.99.5 -- add the OUTDIR argument to check_pdf() ...
Hi @Liubuntu,
brainflowprobes version 0.99.5 now has the OUTDIR argument, which should resolve your comment on that subject: https://github.com/Bioconductor/Contributions/issues/1191#issuecomment-530115005.
Regarding the Gencode version 31 TxDb object (lifted over from hg38 to hg19) that we make at https://github.com/LieberInstitute/brainflowprobes/blob/master/data-raw/create_sysdata.R#L1-L31 (filtered to the canonical chromosomes), I see that there are no AnnotationHub TxDb objects for that annotation, which might make a case for creating one. However, for brainflowprobes we need the output of derfinder::makeGenomicState(), which takes several minutes to run. That is part of https://github.com/LieberInstitute/brainflowprobes/blob/master/data-raw/create_sysdata.R#L34-L68, which is the data actually provided in brainflowprobes. So we would like to provide the gs object in this package anyway.
Thus an AnnotationHub package for brainflowprobes is not really required (if you are OK with the 21.8 MB installed size of the package). If not, we can create an AnnotationHub package with the TxDb, gs and genes objects.
> library(AnnotationHub)
> ah <- AnnotationHub()
snapshotDate(): 2019-05-02
> query(ah, c('Gencode', 'v31', 'Homo sapiens'))
AnnotationHub with 0 records
# snapshotDate(): 2019-05-02
> packageVersion('AnnotationHub')
[1] ‘2.16.1’
Best, Leo
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
Received a valid push; starting a build. Commits are:
a0e584d v0.99.6 -- fix a link to the doc file for GRanges
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
Hi @aprice26 and @lcolladotor ,
We had a discussion with the Bioconductor team. Since Gencode data for other versions are already available, we think it would be useful to have your Gencode v31 data on AnnotationHub.
Also, for "genes.rda" and "gs.rda", we would suggest making them available on ExperimentHub, to avoid the build error caused by the very large tarball size. You'll need to: 1) upload the AnnotationHub data (and prepare the script for generating the data); 2) upload the ExperimentHub data; 3) then create a software package "brainflowprobesData??" that includes scripts for generating the "gene" and "gs" datasets from the annotation resources and defines functions for retrieving the ExperimentHub data; 4) in this package, "brainflowprobes", depend on the above package and call the functions it defines to load the ExperimentHub data.
@lshep, please feel free to modify the above steps if necessary. Questions about AH and EH could also be addressed to Lori.
Best, qian
Hi @lshep @Liubuntu,
Regarding the AnnotationHub package, I looked at the current Gencode v23 files as shown below, which led me to GencodeGffImportPreparer.
library('AnnotationHub')
ah <- AnnotationHub()
# snapshotDate(): 2019-05-02
q <- query(ah, c('Gencode', 'v23', 'Homo sapiens'))
mcols(q)
# DataFrame with 9 rows and 15 columns
unique(q$preparerclass)
# [1] "GencodeGffImportPreparer"
packageVersion('AnnotationHub')
# [1] ‘2.16.1’
From the Google search results for GencodeGffImportPreparer, I found https://rdrr.io/github/Bioconductor/AnnotationHubData/src/R/makeGencodeGFF.R, which seems to have everything it needs. I tweaked the code a little bit at https://gist.github.com/lcolladotor/bb8cacb7237a13c092911cf8f2ac7eac/revisions (I made it into a gist so you could see the diffs). I added AnnotationHubData::: to some calls just to test locally, but you could remove those. Basically, I modified .gencodeSourceUrls() such that it would detect the correct genome version for the files that have been lifted over. Then I also changed makeGencodeGFFsToAHMs() such that it would take parameters and pass them to .gencodeSourceUrls(). So this is how it looks with release = '23':
> .gencodeSourceUrls(species = 'Human', release = '23',
+ filetype = 'gff', justRunUnitTest = FALSE)
getting file info: gencode.v23.2wayconspseudos.gff3.gz
getting file info: gencode.v23.annotation.gff3.gz
getting file info: gencode.v23.basic.annotation.gff3.gz
getting file info: gencode.v23.chr_patch_hapl_scaff.annotation.gff3.gz
getting file info: gencode.v23.chr_patch_hapl_scaff.basic.annotation.gff3.gz
getting file info: gencode.v23.long_noncoding_RNAs.gff3.gz
getting file info: gencode.v23.polyAs.gff3.gz
getting file info: gencode.v23.primary_assembly.annotation.gff3.gz
getting file info: gencode.v23.tRNAs.gff3.gz
getting file info: gencode.v23lift37.annotation.gff3.gz
getting file info: gencode.v23lift37.basic.annotation.gff3.gz
getting file info: gencode.v23lift37.long_noncoding_RNAs.gff3.gz
getting file info: gencode.v23lift37.unmapped.gff3.gz
fileurl date size
1 ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_23/gencode.v23.2wayconspseudos.gff3.gz <NA> NA
2 ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_23/gencode.v23.annotation.gff3.gz <NA> NA
3 ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_23/gencode.v23.basic.annotation.gff3.gz <NA> NA
4 ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_23/gencode.v23.chr_patch_hapl_scaff.annotation.gff3.gz <NA> NA
5 ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_23/gencode.v23.chr_patch_hapl_scaff.basic.annotation.gff3.gz <NA> NA
6 ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_23/gencode.v23.long_noncoding_RNAs.gff3.gz <NA> NA
7 ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_23/gencode.v23.polyAs.gff3.gz <NA> NA
8 ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_23/gencode.v23.primary_assembly.annotation.gff3.gz <NA> NA
9 ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_23/gencode.v23.tRNAs.gff3.gz <NA> NA
10 ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_23/GRCh37_mapping/gencode.v23lift37.annotation.gff3.gz <NA> NA
11 ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_23/GRCh37_mapping/gencode.v23lift37.basic.annotation.gff3.gz <NA> NA
12 ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_23/GRCh37_mapping/gencode.v23lift37.long_noncoding_RNAs.gff3.gz <NA> NA
13 ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_23/GRCh37_mapping/gencode.v23lift37.unmapped.gff3.gz <NA> NA
rdatapath
1 Gencode_human/release_23/gencode.v23.2wayconspseudos.gff3.gz
2 Gencode_human/release_23/gencode.v23.annotation.gff3.gz
3 Gencode_human/release_23/gencode.v23.basic.annotation.gff3.gz
4 Gencode_human/release_23/gencode.v23.chr_patch_hapl_scaff.annotation.gff3.gz
5 Gencode_human/release_23/gencode.v23.chr_patch_hapl_scaff.basic.annotation.gff3.gz
6 Gencode_human/release_23/gencode.v23.long_noncoding_RNAs.gff3.gz
7 Gencode_human/release_23/gencode.v23.polyAs.gff3.gz
8 Gencode_human/release_23/gencode.v23.primary_assembly.annotation.gff3.gz
9 Gencode_human/release_23/gencode.v23.tRNAs.gff3.gz
10 Gencode_human/release_23/GRCh37_mapping/gencode.v23lift37.annotation.gff3.gz
11 Gencode_human/release_23/GRCh37_mapping/gencode.v23lift37.basic.annotation.gff3.gz
12 Gencode_human/release_23/GRCh37_mapping/gencode.v23lift37.long_noncoding_RNAs.gff3.gz
13 Gencode_human/release_23/GRCh37_mapping/gencode.v23lift37.unmapped.gff3.gz
description
1 pseudogenes predicted by the Yale & UCSC pipelines, but not by Havana on reference chromosomes
2 Gene annotations on reference chromosomes from Gencode
3 Gene annotations on reference chromosomes from Gencode
4 Gene annotation on reference-chromosomes/patches/scaffolds/haplotypes from Gencode
5 Gene annotations on reference chromosomes from Gencode
6 sub-set of the main annotation files on the reference chromosomes. They contain only the lncRNA genes. Long non-coding RNA genes are considered the genes with any of those biotypes: 'processed_transcript', 'lincRNA', '3prime_overlapping_ncrna', 'antisense', 'non_coding', 'sense_intronic' , 'sense_overlapping' , 'TEC' , 'known_ncrna'.
7 files contain polyA signals, polyA sites and pseudo polyAs manually annotated by HAVANA from only the refrence chromosome
8 Gene annotations on reference chromosomes from Gencode
9 tRNA structures predicted by tRNA-Scan on reference chromosomes
10 Gene annotations on reference chromosomes from Gencode
11 Gene annotations on reference chromosomes from Gencode
12 sub-set of the main annotation files on the reference chromosomes. They contain only the lncRNA genes. Long non-coding RNA genes are considered the genes with any of those biotypes: 'processed_transcript', 'lincRNA', '3prime_overlapping_ncrna', 'antisense', 'non_coding', 'sense_intronic' , 'sense_overlapping' , 'TEC' , 'known_ncrna'.
13
tags species taxid genome
1 gencode,v23,2wayconspseudos,gff3 Homo sapiens 9606 GRCh38
2 gencode,v23,annotation,gff3 Homo sapiens 9606 GRCh38
3 gencode,v23,basic,annotation,gff3 Homo sapiens 9606 GRCh38
4 gencode,v23,chr_patch_hapl_scaff,annotation,gff3 Homo sapiens 9606 GRCh38
5 gencode,v23,chr_patch_hapl_scaff,basic,annotation,gff3 Homo sapiens 9606 GRCh38
6 gencode,v23,long_noncoding_RNAs,gff3 Homo sapiens 9606 GRCh38
7 gencode,v23,polyAs,gff3 Homo sapiens 9606 GRCh38
8 gencode,v23,primary_assembly,annotation,gff3 Homo sapiens 9606 GRCh38
9 gencode,v23,tRNAs,gff3 Homo sapiens 9606 GRCh38
10 gencode,v23lift37,annotation,gff3 Homo sapiens 9606 GRCh37
11 gencode,v23lift37,basic,annotation,gff3 Homo sapiens 9606 GRCh37
12 gencode,v23lift37,long_noncoding_RNAs,gff3 Homo sapiens 9606 GRCh37
13 gencode,v23lift37,unmapped,gff3 Homo sapiens 9606 GRCh37
I think that it makes more sense for the Bioconductor maintainer account to run this for Gencode releases that are missing (like v31) instead of keeping it hardcoded to just v23. Then I can use these new AnnotationHub entries for the ExperimentHub package you described. That could be done with something like:
## Only for human here
makeGencodeGFFsToAHMs_multiple_human <- function() {
## Here just two for testing:
releases <- c('23', '31')
## For all including v23
# releases <- as.character(23:31)
hubs <- lapply(releases, function(rel) makeGencodeGFFsToAHMs(release = rel))
unlist(hubs)
}
## Manual check (showing the end of `unlist(hubs)` here):
# $...
# class: AnnotationHubMetadata
# AnnotationHubRoot: NA
# BiocVersion: 3.9
# Coordinate_1_based: TRUE
# DataProvider: Gencode
# DerivedMd5: NA
# Description: sub-set of the main annotation files on the reference chromosomes. They contain only the lncRNA genes. Long non-coding RNA genes are considered the genes with any of those biotypes:
# 'processed_transcript', 'lincRNA', '3prime_overlapping_ncrna', 'antisense', 'non_coding', 'sense_intronic' , 'sense_overlapping' , 'TEC' , 'known_ncrna'.
# DispatchClass: GFF3File
# Error: NA_character
# Genome: GRCh37
# HubRoot: NA
# Location_Prefix: ftp://ftp.ebi.ac.uk/pub/databases/gencode/
# Maintainer: Bioconductor Maintainer <maintainer@bioconductor.org>
# Notes: NA
# PreparerClass: NA
# RDataClass: GRanges
# RDataDateAdded: 2019-10-02
# RDataPath: Gencode_human/release_31/GRCh37_mapping/gencode.v31lift37.long_noncoding_RNAs.gff3.gz
# Recipe: NA
# SourceLastModifiedDate: NA
# SourceMd5: NA
# SourceSize: NA
# SourceType: GFF
# SourceUrl: ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_31/GRCh37_mapping/gencode.v31lift37.long_noncoding_RNAs.gff3.gz
# SourceVersion: NA
# Species: Homo sapiens
# Tags: gencode v31lift37 long_noncoding_RNAs gff3
# TaxonomyId: 9606
# Title: gencode.v31lift37.long_noncoding_RNAs.gff3.gz
#
# [[26]]
# class: AnnotationHubMetadata
# AnnotationHubRoot: NA
# BiocVersion: 3.9
# Coordinate_1_based: TRUE
# DataProvider: Gencode
# DerivedMd5: NA
# Description:
# DispatchClass: GFF3File
# Error: NA_character
# Genome: GRCh37
# HubRoot: NA
# Location_Prefix: ftp://ftp.ebi.ac.uk/pub/databases/gencode/
# Maintainer: Bioconductor Maintainer <maintainer@bioconductor.org>
# Notes: NA
# PreparerClass: NA
# RDataClass: GRanges
# RDataDateAdded: 2019-10-02
# RDataPath: Gencode_human/release_31/GRCh37_mapping/gencode.v31lift37.unmapped.gff3.gz
# Recipe: NA
# SourceLastModifiedDate: NA
# SourceMd5: NA
# SourceSize: NA
# SourceType: GFF
# SourceUrl: ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_31/GRCh37_mapping/gencode.v31lift37.unmapped.gff3.gz
# SourceVersion: NA
# Species: Homo sapiens
# Tags: gencode v31lift37 unmapped gff3
# TaxonomyId: 9606
# Title: gencode.v31lift37.unmapped.gff3.gz
makeAnnotationHubResource("GencodeGffImportPreparer",
makeGencodeGFFsToAHMs_multiple_human)
If there's a GitHub repository for AnnotationHubData, I can submit my changes as a pull request.
How does this sound?
Best, Leo
I'll follow up once @lshep has finished with the PR and you have the ExperimentHub data up. @lcolladotor
AdditionalPackage: https://github.com/LieberInstitute/GenomicState
The above likely won't work since https://github.com/Bioconductor/Contributions/blob/master/CONTRIBUTING.md#submitting-related-packages specifies that the author of the issue has to make that comment. Anyway, just checking :P
AdditionalPackage: https://github.com/LieberInstitute/GenomicState
Hi @aprice26,
Starting build on additional package https://github.com/LieberInstitute/GenomicState.
IMPORTANT: Please read the instructions for setting up a push hook on your repository, or further changes to your additional package repository will NOT trigger a new build.
The DESCRIPTION file of this additional package is:
Package: GenomicState
Title: Build and access GenomicState objects for use with derfinder tools from
sources like Gencode
Version: 0.99.0
Date: 2019-10-4
Authors@R:
person("Leonardo", "Collado-Torres", role = c("aut", "cre"),
email = "lcolladotor@gmail.com", comment = c(ORCID = "0000-0003-2140-308X"))
Description: This package contains functions for building GenomicState objects
from different annotation sources such as Gencode. It also provides access
to these files at JHPCE.
License: Artistic-2.0
Encoding: UTF-8
LazyData: true
Imports:
GenomicFeatures,
GenomeInfoDb,
rtracklayer,
bumphunter,
derfinder,
AnnotationDbi,
IRanges,
org.Hs.eg.db,
utils,
AnnotationHubData,
AnnotationHub
Roxygen: list(markdown = TRUE)
RoxygenNote: 6.1.1
Suggests:
knitr,
rmarkdown,
BiocStyle,
knitcitations,
sessioninfo,
testthat (>= 2.1.0),
glue,
derfinderPlot
VignetteBuilder: knitr
URL: https://github.com/LieberInstitute/GenomicState
BugReports: https://github.com/LieberInstitute/GenomicState/issues
biocViews: Coverage, Transcriptomics, Homo_sapiens, TxDb, AnnotationHub
Remotes: lcolladotor/bumphunter@fix_namespace_genes
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
Received a valid push; starting a build. Commits are:
60b636e v0.99.1 -- fixed some minor bugs
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
Received a valid push; starting a build. Commits are:
23b0fa1 v0.99.3 -- bump version for the bioc-issue-bot
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
Congratulations! The package built without errors or warnings on all platforms.
Please see the build report for more details.
Hi @Liubuntu (cc @lshep),
I added the GenomicState package (docs at http://research.libd.org/GenomicState/). Instead of an ExperimentHub package, I made it an AnnotationHub package since it really contains annotation files from Gencode in different formats: TxDb sqlite files, plus bumphunter::annotateTranscripts(by = 'gene') and derfinder::makeGenomicState() output computed from the TxDb objects. These are made by GenomicState::gencode_txdb(), GenomicState::annotated_genes() and GenomicState::gencode_genomic_state(), respectively. I chose those function names since you could imagine adding more annotation sources later on. The only manipulation I do to the GTF files is subsetting them to the canonical chromosomes (chrs 1 to 22, X, Y and M), which I believe is reasonable for an AnnotationHub package.
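The canonical-chromosome subsetting mentioned above can be illustrated with a minimal base R sketch on a toy table of GTF-like records (the rows are made up). It is not the GenomicState code; for GRanges objects, GenomeInfoDb::keepSeqlevels() is the usual tool for this step.

```r
## Canonical human chromosomes: chr1 to chr22, chrX, chrY and chrM.
canonical_chrs <- paste0("chr", c(1:22, "X", "Y", "M"))

## Made-up GTF-like records, including an unplaced contig and an alt
## haplotype that should be dropped.
gtf <- data.frame(
    seqname = c("chr1", "chrX", "chrM", "chrUn_KI270302v1", "chr17_GL000258v2_alt"),
    feature = rep("gene", 5),
    start = c(11869L, 100627L, 577L, 1L, 1L),
    stringsAsFactors = FALSE
)

## Keep only the rows that sit on canonical chromosomes.
gtf_canonical <- gtf[gtf$seqname %in% canonical_chrs, , drop = FALSE]
gtf_canonical$seqname
```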
The TxDb sqlite files are currently made using the GTF files from Gencode, that is, using https://github.com/LieberInstitute/GenomicState/blob/master/R/gencode_txdb.R, which is based on the current code at https://github.com/LieberInstitute/brainflowprobes/blob/master/data-raw/create_sysdata.R. Once https://github.com/Bioconductor/AnnotationHubData/pull/2 is live, GenomicState::gencode_txdb() could use the Gencode GFF files from AnnotationHub instead. Regardless of the status of https://github.com/Bioconductor/AnnotationHubData/pull/2, I think that having these files pre-computed would be useful since the three steps (TxDb building, annotated genes and genomic state) take a while to run (for example, this one took about 10 minutes: https://github.com/LieberInstitute/GenomicState/blob/master/data-raw/logs/build_gencode_human_hg38.32.txt#L114).
The idea is that once the data from GenomicState is available through AnnotationHub, I could then change brainflowprobes to use that data through GenomicState::GenomicStateHub(). Currently, I have made objects for the human genomes hg38 and hg19 for Gencode versions 23 through 32 (the latest one). While brainflowprobes only needed the hg19 version 31 files (as was made), we could make brainflowprobes more flexible so it can use any of the Gencode versions on hg19. Additionally, another member of our group needed these files for hg38 Gencode versions 25 and 29 (which is why I made GenomicState::local_metadata()) and would benefit from having the data available through AnnotationHub. This could also help with recountWorkflow, where I currently have users make one of these GenomicState objects (https://github.com/LieberInstitute/recountWorkflow/blob/master/vignettes/recount-workflow.Rmd#L874) despite the computing resources it requires. That is, it would benefit all derfinderPlot users (or whoever wants to build upon the GenomicState objects).
GenomicState depends on AnnotationHub instead of just importing it, so users will have the rest of the AnnotationHub functions on their search path, since GenomicState::GenomicStateHub() returns the result of AnnotationHub::query().
Once the GenomicState data is live through AnnotationHub, I can then finish the docs on the package and GenomicState::GenomicStateHub().
Let me know if you have any questions.
Best, Leo
Hi @lshep ,
Please let me know when the data @lcolladotor has prepared are available on AH. I couldn't find anything yet when adding the "hg19" or "hg38" tags.
> query(ah, pattern=c("gencode", "v31"))
AnnotationHub with 13 records
# snapshotDate(): 2019-10-08
# $dataprovider: Gencode
# $species: Homo sapiens
# $rdataclass: GRanges
# additional mcols(): taxonomyid, genome, description,
# coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
# rdatapath, sourceurl, sourcetype
# retrieve records with, e.g., 'object[["AH75118"]]'
title
AH75118 | gencode.v31.2wayconspseudos.gff3.gz
AH75119 | gencode.v31.annotation.gff3.gz
AH75120 | gencode.v31.basic.annotation.gff3.gz
AH75121 | gencode.v31.chr_patch_hapl_scaff.annotation.gff3.gz
AH75122 | gencode.v31.chr_patch_hapl_scaff.basic.annotation.gff3.gz
... ...
AH75126 | gencode.v31.tRNAs.gff3.gz
AH75127 | gencode.v31lift37.annotation.gff3.gz
AH75128 | gencode.v31lift37.basic.annotation.gff3.gz
AH75129 | gencode.v31lift37.long_noncoding_RNAs.gff3.gz
AH75130 | gencode.v31lift37.unmapped.gff3.gz
> query(ah, pattern=c("gencode", "v31", "hg19"))
AnnotationHub with 0 records
# snapshotDate(): 2019-10-08
> query(ah, pattern=c("gencode", "v31", "hg38"))
AnnotationHub with 0 records
# snapshotDate(): 2019-10-08
Hi @lcolladotor ,
Here is a partial review of the GenomicState vignette:
Installation: this should be written as if the package were already included in Bioconductor. So include something like:
1. Download the package from Bioconductor.
{r getPackage, eval=FALSE}
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("pkgName")
Or install the development version of the package from Github.
{r, eval = FALSE}
BiocManager::install("githubUserName/pkgname")
2. Load the package into R session.
{r Load, message=FALSE}
library(pkgName)
Citation: Zotero has just added support for Bioconductor packages. Zotero users can open the package landing page and click on the Zotero icon (from the browser extension) to cite it as a software package, with title, author, version and Bioconductor version information. This will be particularly useful if there is no journal publication yet, e.g.,
[1] L. Collado-Torres, A. E. Jaffe, and J. T. Leek, derfinderPlot: Plotting functions for derfinder. Bioconductor version: Release (3.9), 2019.
I don't know what data you are referring to or expecting in the hubs. I added the v31 as requested.
Hi @Liubuntu,
Thanks for the partial review of GenomicState!
GenomicState shortly.
inst/CITATION file for GenomicState? Or are you referring to the citations in the vignette, which I made with citation('pkgname')? Or maybe changing the inst/CITATION file for derfinderPlot?
Best, Leo
Hi @lshep,
Sorry for the confusion.
The PR to AnnotationHubData that you merged was about adding GFFs for Gencode version 31 to AnnotationHub (and in general any version from 24 to 32, since 23 was there already). brainflowprobes, though, requires objects processed from the annotation that take about 10 minutes to build. Instead of providing that data in the brainflowprobes package, @Liubuntu suggested providing it through ExperimentHub/AnnotationHub. That's where the new package submission GenomicState comes in, with data for several Gencode versions that I wish to submit to AnnotationHub (Gencode v23 to v32 for hg19 and hg38). That would be the data described by https://github.com/LieberInstitute/GenomicState/blob/master/inst/extdata/metadata_gencode_human.csv, which are: TxDb sqlite files + bumphunter::annotateTranscripts(by = 'gene', txdb = TxDb) + derfinder::makeGenomicState(txdb = TxDb), which I described in detail in my previous comment.
You might say that the TxDb files and the AnnotationHubData GFF files are redundant, since you can build the TxDb files from the GFF ones. However, that doesn't work exactly right out of the box, as shown in https://github.com/LieberInstitute/GenomicState/blob/master/R/gencode_txdb.R, and it takes a bit of time to compute.
I do think that https://github.com/LieberInstitute/GenomicState/blob/master/R/gencode_txdb.R#L45-L49 could use the Gencode GFF files from AnnotationHub (or the GRanges built on the fly, according to your latest comment on the PR https://github.com/Bioconductor/AnnotationHubData/pull/2#issuecomment-539988403) instead of the Gencode GTF files that it currently uses. I could make this change if you upload to AnnotationHub the GFFs (or GRanges) for Gencode versions 23 to 32 (the missing ones, which I think are 24 to 30 and 32).
The demand for more Gencode versions comes from outside of brainflowprobes, as we have local users interested in different Gencode versions, which prompted me to think that Bioconductor users in general might want the other versions too.
Let me know if I can clarify anything else or if you want to have a Skype chat about this.
Best, Leo
OK, so we will skip using/generating the GFFs in AnnotationHub and proceed with adding the TxDb sqlite and Rda files from your data package.
Please upload your data to S3 to continue. If you haven't been given credentials recently, please email me to get access.
Hi @lcolladotor,
Regarding the citation comment, there was nothing wrong with your current package.
I was just mentioning that Bioconductor packages can now be cited directly using the Zotero reference management software. See here for more details: https://support.bioconductor.org/p/124760/ The returned bibliography includes the Bioconductor version (e.g., 3.9 / 3.10), year, DOI, etc. This might be useful for package maintainers when they publicly promote how to cite their package, or for users in general.
Best, Qian
@lcolladotor,
Please work with Lori @lshep on uploading the additional files so that we can move forward with these data packages. The release schedule indicates that the last day to accept new packages into Bioc 3.10 is next Wednesday, and we would like this data to be available in the new release. Thanks!
Qian
Wednesday October 23
Deadline to add new packages to the BioC 3.10 manifest. Packages submitted to the tracker must have completed the review process and been accepted to be added to the manifest.
Yup, I will. I'm at a conference right now and will be back on Monday. Best, Leo
(Reply sent by email to Qian Liu's comment of Fri, Oct 18, 2019 at 2:44 PM.)
@lcolladotor Data has been added to AnnotationHub
> ah = AnnotationHub()
|======================================================================| 100%
snapshotDate(): 2019-10-22
> query(ah, "GenomicState")
AnnotationHub with 60 records
# snapshotDate(): 2019-10-22
# $dataprovider: GENCODE
# $species: Homo sapiens
# $rdataclass: GRanges, TxDb, list
# additional mcols(): taxonomyid, genome, description,
# coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
# rdatapath, sourceurl, sourcetype
# retrieve records with, e.g., 'object[["AH75134"]]'
title
AH75134 | TxDb for Gencode v23 on hg19 coordinates
AH75135 | Annotated genes for Gencode v23 on hg19 coordinates
AH75136 | GenomicState for Gencode v23 on hg19 coordinates
AH75137 | TxDb for Gencode v23 on hg38 coordinates
AH75138 | Annotated genes for Gencode v23 on hg38 coordinates
... ...
AH75189 | Annotated genes for Gencode v32 on hg19 coordinates
AH75190 | GenomicState for Gencode v32 on hg19 coordinates
AH75191 | TxDb for Gencode v32 on hg38 coordinates
AH75192 | Annotated genes for Gencode v32 on hg38 coordinates
AH75193 | GenomicState for Gencode v32 on hg38 coordinates
> TxDb = query(ah, "GenomicState")[[1]]
downloading 1 resources
retrieving 1 resource
|======================================================================| 100%
> TxDb
TxDb object:
# Db type: TxDb
# Supporting package: GenomicFeatures
# Data source: ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_23/GRCh37_mapping/gencode.v23lift37.annotation.gtf.gz
# Organism: Homo sapiens
# Taxonomy ID: 9606
# miRBase build ID: NA
# Genome: hg19
# transcript_nrow: 198269
# exon_nrow: 678347
# cds_nrow: 270269
# Db created by: GenomicFeatures package from Bioconductor
# Creation time: 2019-10-07 10:00:19 -0400 (Mon, 07 Oct 2019)
# GenomicFeatures version at creation time: 1.36.4
# RSQLite version at creation time: 2.1.2
# DBSCHEMAVERSION: 1.2
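As a minimal sketch (not part of the original thread), the three GenomicState records for one Gencode version and genome can be fetched by their AnnotationHub IDs instead of by position with `[[1]]`. The IDs below are the Gencode v23 on hg19 ones from the listing above; the sketch assumes AnnotationHub is installed and the hub is reachable, and the variable names are illustrative only.

```r
## Sketch: fetch the Gencode v23 / hg19 GenomicState records by AnnotationHub ID.
## Assumes AnnotationHub is installed and the hub can be reached over the network.
library("AnnotationHub")

ah <- AnnotationHub()

## query() combines multiple patterns with AND, narrowing the search
## to records matching all three terms.
v23_hg19 <- query(ah, c("GenomicState", "v23", "hg19"))
v23_hg19

## Retrieve each resource explicitly by ID (titles taken from the listing above).
txdb  <- ah[["AH75134"]]  # TxDb for Gencode v23 on hg19 coordinates
genes <- ah[["AH75135"]]  # Annotated genes (bumphunter::annotateTranscripts output)
gs    <- ah[["AH75136"]]  # GenomicState (derfinder::makeGenomicState output)
```

Retrieving by ID keeps a script pinned to a specific Gencode version and genome, whereas `query(...)[[1]]` depends on the order of the query results.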
Received a valid push; starting a build. Commits are:
5176c80 v0.99.5 -- update docs now that data is live at An...
Received a valid push; starting a build. Commits are:
375fcf3 v0.99.6 -- update the docs on my laptop, since my ...
Update the following URL to point to the GitHub repository of the package you wish to submit to Bioconductor
Confirm the following by editing each check box to '[x]'
[x] I understand that by submitting my package to Bioconductor, the package source and all review commentary are visible to the general public.
[x] I have read the Bioconductor Package Submission instructions. My package is consistent with the Bioconductor Package Guidelines.
[x] I understand that a minimum requirement for package acceptance is to pass R CMD check and R CMD BiocCheck with no ERROR or WARNINGS. Passing these checks does not result in automatic acceptance. The package will then undergo a formal review and recommendations for acceptance regarding other Bioconductor standards will be addressed.
[x] My package addresses statistical or bioinformatic issues related to the analysis and comprehension of high throughput genomic data.
[x] I am committed to the long-term maintenance of my package. This includes monitoring the support site for issues that users may have, subscribing to the bioc-devel mailing list to stay aware of developments in the Bioconductor community, responding promptly to requests for updates from the Core team in response to changes in R or underlying software.
I am familiar with the essential aspects of Bioconductor software management, including:
For help with submitting your package, please subscribe and post questions to the bioc-devel mailing list.
(next line added by mtmorgan during package acceptance)
AdditionalPackage: https://github.com/LieberInstitute/GenomicState