Closed pmoulos closed 3 years ago
Hi @pmoulos
Thanks for submitting your package. We are taking a quick look at it and you will hear back from us soon.
The DESCRIPTION file for this package is:
Package: sitadela
Type: Package
Title: An R package for the easy provision of simple but
complete tab-delimited genomic annotation from a
variety of sources and organisms
Authors@R: c(person(given="Panagiotis", family="Moulos",
email="moulos@fleming.gr", role=c("aut", "cre")))
Depends:
R (>= 4.1.0)
Imports:
Biobase,
BiocGenerics,
biomaRt,
Biostrings,
GenomeInfoDb,
GenomicFeatures,
GenomicRanges,
httr,
IRanges,
methods,
parallel,
Rsamtools,
RSQLite,
rtracklayer,
S4Vectors,
stringr,
utils
Suggests:
BiocManager,
BSgenome,
knitr,
RMySQL,
RUnit
Description: Provides an interface to build a unified database of
genomic annotations and their coordinates (gene, transcript
and exon levels). It is aimed to be used when simple
tab-delimited annotations (or simple GRanges objects) are
required instead of the more complex annotation Bioconductor
packages. Also useful when combinatorial annotation elements
are reuired, such as RefSeq coordinates with Ensembl biotypes.
License: GPL (>= 3)
Encoding: UTF-8
LazyData: false
URL: http://www.fleming.gr
biocViews: Software, WorkflowStep, RNASeq, Transcription, Sequencing,
Transcriptomics, BiomedicalInformatics, FunctionalGenomics,
SystemsBiology, AlternativeSplicing, DataImport, ChIPSeq
VignetteBuilder: knitr
BugReports: https://github.com/pmoulos/sitadela/issues
Version: 0.99.0
Date: 2021-02-19
Collate:
'argcheck.R'
'external.R'
'fromtxdb.R'
'query-ensembl.R'
'query-ncbi.R'
'query-refseq.R'
'query-ucsc.R'
'query-utils.R'
'reduceops.R'
'sitadela.R'
'testfuns.R'
'zzz.R'
Dear @pmoulos ,
You (or someone) has already posted that repository to our tracker.
See https://github.com/Bioconductor/Contributions/issues/1900
You cannot post the same repository more than once.
If you would like this repository to be linked to issue number: 1900, Please contact a Bioconductor Core Member. I am closing this issue.
Please ignore the previous comment concerning the issue being previously posted. I was running updates and inadvertently caused this message
Hi @pmoulos
Thanks for submitting your package. We are taking a quick look at it and you will hear back from us soon.
The DESCRIPTION file for this package is:
Package: sitadela
Type: Package
Title: An R package for the easy provision of simple but
complete tab-delimited genomic annotation from a
variety of sources and organisms
Authors@R: c(person(given="Panagiotis", family="Moulos",
email="moulos@fleming.gr", role=c("aut", "cre")))
Depends:
R (>= 4.1.0)
Imports:
Biobase,
BiocGenerics,
biomaRt,
Biostrings,
GenomeInfoDb,
GenomicFeatures,
GenomicRanges,
httr,
IRanges,
methods,
parallel,
Rsamtools,
RSQLite,
rtracklayer,
S4Vectors,
stringr,
utils
Suggests:
BiocManager,
BSgenome,
knitr,
RMySQL,
RUnit
Description: Provides an interface to build a unified database of
genomic annotations and their coordinates (gene, transcript
and exon levels). It is aimed to be used when simple
tab-delimited annotations (or simple GRanges objects) are
required instead of the more complex annotation Bioconductor
packages. Also useful when combinatorial annotation elements
are reuired, such as RefSeq coordinates with Ensembl biotypes.
License: GPL (>= 3)
Encoding: UTF-8
LazyData: false
URL: http://www.fleming.gr
biocViews: Software, WorkflowStep, RNASeq, Transcription, Sequencing,
Transcriptomics, BiomedicalInformatics, FunctionalGenomics,
SystemsBiology, AlternativeSplicing, DataImport, ChIPSeq
VignetteBuilder: knitr
BugReports: https://github.com/pmoulos/sitadela/issues
Version: 0.99.0
Date: 2021-02-19
Collate:
'argcheck.R'
'external.R'
'fromtxdb.R'
'query-ensembl.R'
'query-ncbi.R'
'query-refseq.R'
'query-ucsc.R'
'query-utils.R'
'reduceops.R'
'sitadela.R'
'testfuns.R'
'zzz.R'
A reviewer has been assigned to your package. Learn what to expect during the review process.
IMPORTANT: Please read this documentation for setting up remotes to push to git.bioconductor.org. It is required to push a version bump to git.bioconductor.org to trigger a new build.
Bioconductor utilized your github ssh-keys for git.bioconductor.org access. To manage keys and future access you may want to active your Bioconductor Git Credentials Account
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "ABNORMAL". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details. This link will be active for 21 days.
Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
git@git.bioconductor.org:packages/sitadela
to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.
Hi Panagiotis, @pmoulos
Thank you for your submission to Bioconductor. Please see the review below.
Best, Marcel
BiocManager
should not be a suggested package.stringr
may be a dependency you can avoid.BiocIO::import
will replace rtracklayer::import
@Kayla-Morrellexample-4
and others work on all platforms?R.utils
for functions such as gunzip
rather than
using system
commands.BiocManager::install
code in loadBsGenome
and let the user
install the package themselves..checkNumArgs
. The only difference in numeric
and integer
switches
is the text "numeric" and "integer" in the error message. The look up table
can be a list
(roughly):argBounds <- c(5, 7)
gt <- list(op = function(x) x <= argBounds[1], cls = class,
desc = function(x) "greater than")
lapply(gt, function(f) f(3))
# $op
# [1] TRUE
#
# $cls
# [1] "numeric"
R/external.R
)writeLines
or print
in package code. If you need to show
a message, use message
..myGetBM
should indicate that sitadela
should be tagged when reporting at https://support.bioconductor.org.xml
package or another package?ann[, "score"]
rather than ann[, 1]
(from
R/fromtxdb.R
; and avoid repeating this code if possible) ann <- as.data.frame(unname(gr))
ann <- ann[,c(1,2,3,7,9,5,10,11)]
names(ann)[c(1,4)] <- c("chromosome","exon_id")
ann$chromosome <- as.character(ann$chromosome)
ann <- ann[order(ann$chromosome,ann$start),]
R/query-ensembl.R
..makeSumTranscriptExonFromTxDb
. Perhaps there is a way to abstract and
modularize the functionality so that you can apply different modules across
related functions.Minor:
ncol(values == 1)
different from ncol(values)
when values
is a
data.frame
? (.myGenerateFilterXML
)R/fromtxdb.R
can become
map[, !names(map) %in% c("exon_id", "transcript_id"), drop = FALSE]
.Received a valid push; starting a build. Commits are:
0a7ad1b Version bump
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "ABNORMAL". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details. This link will be active for 21 days.
Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
git@git.bioconductor.org:packages/sitadela
to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.
Hi Marcel, @LiNk-NY,
Thank you for the extensive review. I addressed most of the issues. The package though while building and checking normally locally, it failed on the build system, it didn't even clone the repo, is there a problem currently?
Below, please find my actions regarding the package review:
Hi Panagiotis, @pmoulos
Thank you for your submission to Bioconductor. Please see the review below.
Best, Marcel
sitadela #1900
DESCRIPTION
- Update the package URL to be either the packages GitHub repository or a dedicated webpage for introducing new users the package and not an institutional page.
Done.
BiocManager
should not be a suggested package.
The reason for this, is that the package includes a function for calculating
GC-content for the supported genomes (getGcContent
), line
502 in sitadela.R
. When
a BSgenome
package is not available, it is installed automatically using
BiocManager
. Would such a functionality be supported if I remove it from the
suggested packages? Will I get check
warnings?
NAMESPACE
- Looks good.
stringr
may be a dependency you can avoid.
Done.
- If I'm not mistaken,
BiocIO::import
will replacertracklayer::import
@Kayla-Morrell
What should I do for now? Should I leave import
as is?
vignettes
- There is quite a bit of platform dependent code. Why wouldn't
example-4
and others work on all platforms?
Generally, the examples that are platform dependent make use of automatically downloaded Unix tools for conversion between file formats that are not handled by Bioconductor (to the best of my knowledge), and these are mainly UCSC specific file formats. However, in that particular one, I got the error
Error in h(simpleError(msg, call)) :
error in evaluating the argument 'con' in selecting a method for function 'import': invalid class “GZFile” object: undefined class for slot "resource" ("characterORconnection")
in R-devel for Windows, this is why I made it platform dependent for the time being at least.
- Use R utilities like
R.utils
for functions such asgunzip
rather than usingsystem
commands.
The only system
command has to do with the file conversion process mentioned
above, occuring when retrieving 3' UTR regions from UCSC databases.
- You don't have to rely on full real-world annotations for the vignette, you can provide your own small dataset to demonstrate the functionality of the package.
I wanted to transform example-5
and example-6
like example-4
but didn't
do because of the previous error (GZFile
). Using gunzip
from R.tools
to
manually unzip the file before importing, returns a similar error with the
GFF2File
class (in Windows).
R
- Remove the
BiocManager::install
code inloadBsGenome
and let the user install the package themselves.
See above for a justification of this. My purpose is to provide some way of
automation, especially for building a complete annotation database (see also
example-8
from the vignette). If this is not a good behavior, please confirm
as instructions on building a complete database should change.
- Avoid over-using switch and nesting switch calls by creating a lookup table for
.checkNumArgs
. The only difference innumeric
andinteger
switches is the text "numeric" and "integer" in the error message. The look up table can be alist
(roughly):argBounds <- c(5, 7) gt <- list(op = function(x) x <= argBounds[1], cls = class, desc = function(x) "greater than") lapply(gt, function(f) f(3)) # $op # [1] TRUE # # $cls # [1] "numeric"
Done.
- Avoid nested for loops where possible (
R/external.R
)- Do not use
writeLines
ormessage
.
These are leftovers from some biomaRt
functions I had to modify to handle
httr
timeouts. I corrected these. Generally, the R file external.R
contains
such functions, borrowed and slighlty modified from the biomaRt
package, as
they were either not exported, or not offering the parameters I had to change as
arguments. Lines where message
is meaningless (e.g. when a table must be
displayed) have been commented out.
- Your error messages in
.myGetBM
should indicate thatsitadela
should be tagged when reporting at https://support.bioconductor.org.
Fixed.
- The functions you are writing look quite manual and laborious. Is there a way to interface with bioMaRt using the
xml
package or another package?
Are you referring to the functions in external.R
or generally in the package?
If the former, these are borrowed functions from biomaRt
. See above on the
justification for using them like this.
- Using numeric indices to subset can be risky when the data change. We advise using names to subset data e.g.,
ann[, "score"]
rather thanann[, 1]
(fromR/fromtxdb.R
; and avoid repeating this code if possible)
Not every time that these indices are used to subset and reorder the columns, the corresponding columns have the same names (usually there is 1-2 names that change). So...
- Minimize the amount of copy-and-paste code by creating a helper function and reusing it. Here is one block of code that gets repeated quite often:
ann <- as.data.frame(unname(gr)) ann <- ann[,c(1,2,3,7,9,5,10,11)] names(ann)[c(1,4)] <- c("chromosome","exon_id") ann$chromosome <- as.character(ann$chromosome) ann <- ann[order(ann$chromosome,ann$start),]
- See also
R/query-ensembl.R
.
I tried to further modularize the functions in fromtxdb.R
by trying to avoid
code repetition as much as possible.
- Find ways to avoid the cyclomatic complexity in the
.makeSumTranscriptExonFromTxDb
. Perhaps there is a way to abstract and modularize the functionality so that you can apply different modules across related functions.
I tried to modularize further but with respect to that particular one, it is
essentially an expansion to the data created from .makeSumTranscriptExonFromTxDb
(other *Sum*
functions also changed like this to avoid code repetition). This
why .makeTranscriptExonFromTxDb
is called first, to make all checks and create
the data, and then .makeSumTranscriptExonFromTxDb
restructures it.
Minor:
- How is
ncol(values == 1)
different fromncol(values)
whenvalues
is adata.frame
? (.myGenerateFilterXML
)
Leftover from biomaRt
function. Fixed.
- Switching from logical to integer index is not always useful, e.g., the lines 8-11 in
R/fromtxdb.R
can becomemap[, !names(map) %in% c("exon_id", "transcript_id"), drop = FALSE]
.
Fixed.
tests
- Add more tests in general and avoid creating system specific tests
Done. Regarding the system specific test test_sitadela_gtf
, see above about
the GZfFile
error.
Thanks again.
Hi Panagiotis, @pmoulos
Thank you for making those changes. Please see my responses below.
BiocManager
should not be a suggested package.The reason for this, is that the package includes a function for calculating GC-content for the supported genomes (
getGcContent
), line 502 insitadela.R
. When aBSgenome
package is not available, it is installed automatically usingBiocManager
. Would such a functionality be supported if I remove it from the suggested packages? Will I getcheck
warnings?
It is bad practice to install a package for the user. You do not need this functionality.
You do need a conditional test if (!requireNamespace("pkg", quietly = TRUE)) stop(...)
to make sure that the user has the package installed.
- If I'm not mistaken,
BiocIO::import
will replacertracklayer::import
@Kayla-MorrellWhat should I do for now? Should I leave
import
as is?
Any thoughts @Kayla-Morrell ?
vignettes
- There is quite a bit of platform dependent code. Why wouldn't
example-4
and others work on all platforms?Generally, the examples that are platform dependent make use of automatically downloaded Unix tools for conversion between file formats that are not handled by Bioconductor (to the best of my knowledge), and these are mainly UCSC specific file formats. However, in that particular one, I got the error
Error in h(simpleError(msg, call)) : error in evaluating the argument 'con' in selecting a method for function 'import': invalid class “GZFile” object: undefined class for slot "resource" ("characterORconnection")
in R-devel for Windows, this is why I made it platform dependent for the time being at least.
Keep in mind that about half of the Bioconductor user base is on Windows. We strongly recommend that you provide functionality for all platforms. Otherwise, you could mark your package as unsupported on Windows.
- Use R utilities like
R.utils
for functions such asgunzip
rather than usingsystem
commands.The only
system
command has to do with the file conversion process mentioned above, occuring when retrieving 3' UTR regions from UCSC databases.
We highly recommend that you use native R functions rather than system commands because these reduce any platform dependencies in your code and make your package more portable. It may also be the case that you do not need the entirety of the workflow but only the data at certain steps which you can save as small datasets in the package to demonstrate the functionality in the vignette.
- You don't have to rely on full real-world annotations for the vignette, you can provide your own small dataset to demonstrate the functionality of the package.
I wanted to transform
example-5
andexample-6
likeexample-4
but didn't do because of the previous error (GZFile
). Usinggunzip
fromR.tools
to manually unzip the file before importing, returns a similar error with theGFF2File
class (in Windows).
I am not familiar with this issue on Windows. My suggestion again would be
to host a small dataset in the package so that you don't have to manually
unzip files and avoid system
commands.
R
- Remove the
BiocManager::install
code inloadBsGenome
and let the user install the package themselves.See above for a justification of this. My purpose is to provide some way of automation, especially for building a complete annotation database (see also
example-8
from the vignette). If this is not a good behavior, please confirm as instructions on building a complete database should change.
I can see that a step-by-step of this can be helpful. If you'd like to demonstrate a complete annotation database build please submit that as a workflow package instead.
- Avoid over-using switch and nesting switch calls by creating a lookup table for
.checkNumArgs
. The only difference innumeric
andinteger
switches is the text "numeric" and "integer" in the error message. The look up table can be alist
(roughly):argBounds <- c(5, 7) gt <- list(op = function(x) x <= argBounds[1], cls = class, desc = function(x) "greater than") lapply(gt, function(f) f(3)) # $op # [1] TRUE # # $cls # [1] "numeric"
Done.
- Avoid nested for loops where possible (
R/external.R
)- Do not use
writeLines
ormessage
.These are leftovers from some
biomaRt
functions I had to modify to handlehttr
timeouts. I corrected these. Generally, the R fileexternal.R
contains such functions, borrowed and slighlty modified from thebiomaRt
package, as they were either not exported, or not offering the parameters I had to change as arguments. Lines wheremessage
is meaningless (e.g. when a table must be displayed) have been commented out.
I don't think this is good practice. If you are providing improvements and bug
fixes to biomaRt
, you should do so via Pull Request to their repository
(https://github.com/grimbough/biomaRt). I would only copy code over after
all other options have been exhausted and after you have authorization from
the package maintainer. As a maintainer, it is usually better to receive a pull
request for an improvement rather than have your code copied over to
someone else's package.
- The functions you are writing look quite manual and laborious. Is there a way to interface with bioMaRt using the
xml
package or another package?Are you referring to the functions in
external.R
or generally in the package? If the former, these are borrowed functions frombiomaRt
. See above on the justification for using them like this.
See above.
- Using numeric indices to subset can be risky when the data change. We advise using names to subset data e.g.,
ann[, "score"]
rather thanann[, 1]
(fromR/fromtxdb.R
; and avoid repeating this code if possible)Not every time that these indices are used to subset and reorder the columns, the corresponding columns have the same names (usually there is 1-2 names that change). So...
Perhaps add unit tests to keep the code robust in these situations.
Thanks! -Marcel
It seems like builds are still being triggered through github instead of git which is causing the cloning issue. Do you have a webhook set up on your repository (the old way of triggering builds?) If so this should be removed and you should follow the process of pushing to git.bioconductor.org setting remotes
@pmoulos and @LiNk-NY - Sorry for the late reply,BiocIO::import
has replaced rtracklayer::import
so you should be using that instead.
Best, Kayla
Thanks Kayla and Lori!
Please remember to push to git.bioconductor.org to trigger a new build
Hi Marcel, @LiNk-NY
Please see some more changes below:
Hi Panagiotis, @pmoulos
Thank you for making those changes. Please see my responses below.
BiocManager
should not be a suggested package.The reason for this, is that the package includes a function for calculating GC-content for the supported genomes (
getGcContent
), line 502 insitadela.R
. When aBSgenome
package is not available, it is installed automatically usingBiocManager
. Would such a functionality be supported if I remove it from the suggested packages? Will I getcheck
warnings?It is bad practice to install a package for the user. You do not need this functionality. You do need a conditional test
if (!requireNamespace("pkg", quietly = TRUE)) stop(...)
to make sure that the user has the package installed.
Thank you. Done.
- If I'm not mistaken,
BiocIO::import
will replacertracklayer::import
@Kayla-MorrellWhat should I do for now? Should I leave
import
as is?Any thoughts @Kayla-Morrell ?
Thank you @Kayla-Morrell. It has been replaced. However, I am a bit confused:
In Linux (R-devel):
library(BiocIO)
gtf <- file.path(system.file(package="sitadela","extdata",
"gadMor1_HE567025.gtf.gz"))
import(gtf)
>Error in FileForFormat(resource(con), fileFormat(con)) :
> Format 'gtf' unsupported
showMethods("import")
>Function: import (package BiocIO)
>con="character", format="character", text="ANY"
>con="character", format="missing", text="ANY"
>con="character", format="missing", text="missing"
> (inherited from: con="character", format="missing", text="ANY")
>con="CompressedFile", format="missing", text="ANY"
>con="connection", format="character", text="ANY"
>con="connection", format="missing", text="ANY"
>con="GZFile", format="missing", text="missing"
> (inherited from: con="CompressedFile", format="missing", >text="ANY")
>con="missing", format="ANY", text="character"
library(rtracklayer)
import(gtf) # Works
showMethods("import")
showMethods("import")
>Function: import (package BiocIO)
>con="BamFile", format="ANY", text="ANY"
>con="BED15File", format="ANY", text="ANY"
>con="BEDFile", format="ANY", text="ANY"
>con="BEDPEFile", format="ANY", text="ANY"
>con="BigBedFile", format="ANY", text="ANY"
>con="BigWigFile", format="ANY", text="ANY"
>con="BroadPeakFile", format="ANY", text="ANY"
>con="ChainFile", format="ANY", text="ANY"
>con="character", format="character", text="ANY"
>con="character", format="missing", text="ANY"
>con="character", format="missing", text="missing"
> (inherited from: con="character", format="missing", text="ANY")
>con="CompressedFile", format="missing", text="ANY"
>con="connection", format="character", text="ANY"
>con="connection", format="missing", text="ANY"
>con="FastaFile", format="ANY", text="ANY"
>con="GFFFile", format="ANY", text="ANY"
>con="GTFFile", format="missing", text="missing"
> (inherited from: con="GFFFile", format="ANY", text="ANY")
>con="GZFile", format="missing", text="missing"
> (inherited from: con="CompressedFile", format="missing", text="ANY")
>con="missing", format="ANY", text="character"
>con="NarrowPeakFile", format="ANY", text="ANY"
>con="TabixFile", format="ANY", text="ANY"
>con="TwoBitFile", format="ANY", text="ANY"
>con="UCSCFile", format="ANY", text="ANY"
>con="WIGFile", format="ANY", text="ANY"
Do I have to re-implement method for importing GTF files when
using BiocIO
? I am sure that I am missing something here,
apologies...
In Windows (R-devel):
library(BiocIO)
gtf <- file.path(system.file(package="sitadela","extdata",
"gadMor1_HE567025.gtf.gz"))
import(gtf)
>Error in FileForFormat(resource(con), fileFormat(con)) :
> Format 'gtf' unsupported
showMethods("import")
>Function: import (package BiocIO)
>con="character", format="character", text="ANY"
>con="character", format="missing", text="ANY"
>con="character", format="missing", text="missing"
> (inherited from: con="character", format="missing", text="ANY")
>con="CompressedFile", format="missing", text="ANY"
>con="connection", format="character", text="ANY"
>con="connection", format="missing", text="ANY"
>con="GZFile", format="missing", text="missing"
> (inherited from: con="CompressedFile", format="missing", text="ANY")
>con="missing", format="ANY", text="character"
library(rtracklayer)
import(gtf)
>Found more than one class "CompressedFile" in cache; using the >first, from namespace 'BiocIO'
>Also defined by ‘rtracklayer’
>Error in h(simpleError(msg, call)) :
> error in evaluating the argument 'con' in selecting a method for function 'import': invalid class “GZFile” object: undefined class for slot "resource" ("characterORconnection")
showMethods("import")
>Function: import (package rtracklayer)
>con="BamFile", format="ANY", text="ANY"
>con="BED15File", format="ANY", text="ANY"
>con="BEDFile", format="ANY", text="ANY"
>con="BEDPEFile", format="ANY", text="ANY"
>con="BigWigFile", format="ANY", text="ANY"
>con="BroadPeakFile", format="ANY", text="ANY"
>con="ChainFile", format="ANY", text="ANY"
>con="character", format="character", text="ANY"
>con="character", format="missing", text="ANY"
>con="character", format="missing", text="missing"
> (inherited from: con="character", format="missing", text="ANY")
>con="CompressedFile", format="missing", text="ANY"
>con="connection", format="character", text="ANY"
>con="connection", format="missing", text="ANY"
>con="FastaFile", format="ANY", text="ANY"
>con="GFFFile", format="ANY", text="ANY"
>con="missing", format="ANY", text="character"
>con="NarrowPeakFile", format="ANY", text="ANY"
>con="TabixFile", format="ANY", text="ANY"
>con="TwoBitFile", format="ANY", text="ANY"
>con="UCSCFile", format="ANY", text="ANY"
>con="WIGFile", format="ANY", text="ANY"
Regarding Windows, the GTF file import works in R 4.0.3 and Bioconductor 3.12.
Using R.utils
for any compressed file preprocessing does not fix this.
I would really appreciate any advice on what I am missing... This is also related to the comments about platform independent examples below. If this is fixed, then platform independency is also fixed.
vignettes
- There is quite a bit of platform dependent code. Why wouldn't
example-4
and others work on all platforms?Generally, the examples that are platform dependent make use of automatically downloaded Unix tools for conversion between file formats that are not handled by Bioconductor (to the best of my knowledge), and these are mainly UCSC specific file formats. However, in that particular one, I got the error
Error in h(simpleError(msg, call)) : error in evaluating the argument 'con' in selecting a method for function 'import': invalid class “GZFile” object: undefined class for slot "resource" ("characterORconnection")
in R-devel for Windows, this is why I made it platform dependent for the time being at least.
Keep in mind that about half of the Bioconductor user base is on Windows. We strongly recommend that you provide functionality for all platforms. Otherwise, you could mark your package as unsupported on Windows.
See above.
- Use R utilities like
R.utils
for functions such asgunzip
rather than usingsystem
commands.The only
system
command has to do with the file conversion process mentioned above, occuring when retrieving 3' UTR regions from UCSC databases.We highly recommend that you use native R functions rather than system commands because these reduce any platform dependencies in your code and make your package more portable. It may also be the case that you do not need the entirety of the workflow but only the data at certain steps which you can save as small datasets in the package to demonstrate the functionality in the vignette.
- You don't have to rely on full real-world annotations for the vignette, you can provide your own small dataset to demonstrate the functionality of the package.
I wanted to transform
example-5
andexample-6
likeexample-4
but didn't do because of the previous error (GZFile
). Usinggunzip
fromR.tools
to manually unzip the file before importing, returns a similar error with theGFF2File
class (in Windows).I am not familiar with this issue on Windows. My suggestion again would be to host a small dataset in the package so that you don't have to manually unzip files and avoid
system
commands.R
- Remove the
BiocManager::install
code inloadBsGenome
and let the user install the package themselves.See above for a justification of this. My purpose is to provide some way of automation, especially for building a complete annotation database (see also
example-8
from the vignette). If this is not a good behavior, please confirm as instructions on building a complete database should change.I can see that a step-by-step of this can be helpful. If you'd like to demonstrate a complete annotation database build please submit that as a workflow package instead.
OK, thanks for looking at this. I removed this functionality and implemented a
check to inform the user about required BSgenome
packages. I will also
consider the workflow case.
- Avoid over-using switch and nesting switch calls by creating a lookup table for
.checkNumArgs
. The only difference innumeric
andinteger
switches is the text "numeric" and "integer" in the error message. The look up table can be alist
(roughly):argBounds <- c(5, 7) gt <- list(op = function(x) x <= argBounds[1], cls = class, desc = function(x) "greater than") lapply(gt, function(f) f(3)) # $op # [1] TRUE # # $cls # [1] "numeric"
Done.
- Avoid nested for loops where possible (
R/external.R
)- Do not use
writeLines
ormessage
.These are leftovers from some
biomaRt
functions I had to modify to handlehttr
timeouts. I corrected these. Generally, the R fileexternal.R
contains such functions, borrowed and slighlty modified from thebiomaRt
package, as they were either not exported, or not offering the parameters I had to change as arguments. Lines wheremessage
is meaningless (e.g. when a table must be displayed) have been commented out.I don't think this is good practice. If you are providing improvements and bug fixes to
biomaRt
, you should do so via Pull Request to their repository (https://github.com/grimbough/biomaRt). I would only copy code over after all other options have been exhausted and after you have authorization from the package maintainer. As a maintainer, it is usually better to receive a pull request for an improvement rather than have your code copied over to someone else's package.
Thanks for the comments and suggestions. I did a bit more research on this and
came up upon this by biomaRt
author. So I implemented this solution which seems to work ok and the collated
file external.R
is not needed any more.
- The functions you are writing look quite manual and laborious. Is there a way to interface with bioMaRt using the
xml
package or another package?Are you referring to the functions in
external.R
or generally in the package? If the former, these are borrowed functions frombiomaRt
. See above on the justification for using them like this.See above.
- Using numeric indices to subset can be risky when the data change. We advise using names to subset data e.g.,
ann[, "score"]
rather thanann[, 1]
(fromR/fromtxdb.R
; and avoid repeating this code if possible)Not every time that these indices are used to subset and reorder the columns, the corresponding columns have the same names (usually there is 1-2 names that change). So...
Perhaps add unit tests to keep the code robust in these situations.
Thanks! -Marcel
Finally, after following instructions here I tried to push the package also upstream, and received the following:
git push upstream main
Enter passphrase for key '/home/panos/.ssh/bioc_rsa':
Enumerating objects: 79, done.
Counting objects: 100% (79/79), done.
Delta compression using up to 16 threads
Compressing objects: 100% (56/56), done.
Writing objects: 100% (59/59), 13.05 KiB | 835.00 KiB/s, done.
Total 59 (delta 41), reused 0 (delta 0), pack-reused 0
remote: FATAL: W refs/heads/main packages/sitadela p.moulos DENIED by fallthru
remote: error: hook declined to update refs/heads/main
To git.bioconductor.org:packages/sitadela.git
! [remote rejected] main -> main (hook declined)
error: failed to push some refs to 'git.bioconductor.org:packages/sitadela.git'
Thank you all for your help,
Panagiotis
Please remove any webhooks you have set up on your github repository. Please also remember to push to git.bioconductor.org to trigger build changes. It is important to practice this concept as this will be the way to update your package upon acceptance as well. Be sure to Activate your git credentials account if you have not done so already. And please view this git help page for pushing to git.bioconductor.org as upstream remote.
Hi Lori, @lshep
Thanks for the advice. However, I already removed the previous hook and tried everything regarding the new package workflow. My credentials account are active as I maintain other packages too. Specifically, I tried:
$ git pull
Already up to date.
git remote add upstream git@git.bioconductor.org:packages/sitadela.git
# OK
$ git remote -v
origin https://github.com/pmoulos/sitadela.git (fetch)
origin https://github.com/pmoulos/sitadela.git (push)
upstream git@git.bioconductor.org:packages/sitadela.git (fetch)
upstream git@git.bioconductor.org:packages/sitadela.git (push)
$ git fetch --all
Fetching origin
Fetching upstream
Enter passphrase for key '/home/panos/.ssh/bioc_rsa':
# OK
* [new branch] master -> upstream/master
$ git merge upstream/master
Already up to date.
$ git merge origin/main
Already up to date.
There is nothing new to commit at this point but changed the date in DESCRITPION file so as to fully comply with the instructions. Fun begins...
$ git commit -a
[main 6bd44bc] Date in DESCRIPTION
1 file changed, 1 insertion(+), 1 deletion(-)
$ git push upstream master
error: src refspec master does not match any
error: failed to push some refs to 'git.bioconductor.org:packages/sitadela.git'
$ git push upstream main
Enter passphrase for key '/home/panos/.ssh/bioc_rsa':
Enumerating objects: 82, done.
Counting objects: 100% (82/82), done.
Delta compression using up to 16 threads
Compressing objects: 100% (59/59), done.
Writing objects: 100% (62/62), 13.31 KiB | 852.00 KiB/s, done.
Total 62 (delta 43), reused 0 (delta 0), pack-reused 0
remote: FATAL: W refs/heads/main packages/sitadela p.moulos DENIED by fallthru
remote: error: hook declined to update refs/heads/main
To git.bioconductor.org:packages/sitadela.git
! [remote rejected] main -> main (hook declined)
error: failed to push some refs to 'git.bioconductor.org:packages/sitadela.git'
$ git push origin main
Username for 'https://github.com': pmoulos
Password for 'https://pmoulos@github.com':
Enumerating objects: 5, done.
Counting objects: 100% (5/5), done.
Delta compression using up to 16 threads
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 325 bytes | 325.00 KiB/s, done.
Total 3 (delta 2), reused 0 (delta 0), pack-reused 0
remote: Resolving deltas: 100% (2/2), completed with 2 local objects.
To https://github.com/pmoulos/sitadela.git
59596b2..6bd44bc main -> main
# OK with origin in GitHub
So essentially I am unable to push in upstream. Is it because the branch name in origin is main
while in upstream master
?
$ git show-ref
6bd44bc42912f250d612e4306e2af26f89713034 refs/heads/main
6bd44bc42912f250d612e4306e2af26f89713034 refs/remotes/origin/HEAD
6bd44bc42912f250d612e4306e2af26f89713034 refs/remotes/origin/main
aa2fc53bfa78782c0b88cedb43dea3ef7071a9e9 refs/remotes/upstream/master
I even tried to add a remote (upstream
) with a different name for the main branch (after removing the first one):
git remote add -m main upstream git@git.bioconductor.org:packages/sitadela.git
Still nothing... Any help would be greatly appreciated, thanks in advance.
Panagiotis
@nturaga could you assist with the issue with main/master and pushing to upstream master?
Hi,
There is no branch called main
in your upstream
remote. It is only called master
.
If git push upstream master
isn't going through, it means your SSH keys aren't set up properly.
Please check http://bioconductor.org/developers/how-to/git/faq/
Hi @nturaga, thanks for the advice.
I have tried everything...
How can this be explained though?
$ git push upstream master
error: src refspec master does not match any
error: failed to push some refs to 'git.bioconductor.org:packages/sitadela.git'
Push process not even initiated... While
git push upstream main
Enter passphrase for key '/home/panos/.ssh/bioc_rsa':
Enumerating objects: 82, done.
Counting objects: 100% (82/82), done.
Delta compression using up to 16 threads
Compressing objects: 100% (59/59), done.
Writing objects: 100% (62/62), 13.31 KiB | 801.00 KiB/s, done.
Total 62 (delta 43), reused 0 (delta 0), pack-reused 0
remote: FATAL: W refs/heads/main packages/sitadela p.moulos DENIED by fallthru
remote: error: hook declined to update refs/heads/main
To git.bioconductor.org:packages/sitadela.git
! [remote rejected] main -> main (hook declined)
error: failed to push some refs to 'git.bioconductor.org:packages/sitadela.git'
While this seems to initiate the procedure but fails...
Should I completely change SSH keys? Add more?
Thanks and apologies for inconvenience...
Yes, try adding a new SSH key. This is not an issue on our side of things. It's on your end.
Received a valid push on git.bioconductor.org; starting a build for commit id: 6bd44bc42912f250d612e4306e2af26f89713034
Hi @nturaga , @lshep
Thank you for your help. Solved based on this.
$ git push upstream main:master
Apparently, not everything is clear with GitHub having changed master
to main
. Maybe you could consider adding this hint to the new package workflow.
Thanks again,
Panagiotis
Hi Panagiotis, @pmoulos
That is a quick fix but what you should be doing is
git checkout -b master
and pushing to Bioconductor from there.
We are not supporting pushes from the main
branch yet though this may change in the future and we may require a differently named branch.
Best, Marcel
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
Congratulations! The package built without errors or warnings on all platforms.
Please see the build report for more details. This link will be active for 21 days.
Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
git@git.bioconductor.org:packages/sitadela
to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.
Hi Panagiotis, @pmoulos
Do I have to re-implement method for importing GTF files when using
BiocIO
? I am sure that I am missing something here, apologies...
It looks like the method lives in rtracklayer
so use it from there.
AFAIU BiocIO
exports the generic. Unless you're creating a new method,
you don't have to import BiocIO
.
Regarding Windows, the GTF file import works in R 4.0.3 and Bioconductor 3.12. Using R.utils for any compressed file preprocessing does not fix this.
For your package, the current release is not relevant because it will only exist in 3.13 and above. Are you saying it's not working on Bioconductor 3.13? Perhaps you can create an issue in the package where it fails? I'm also tagging @lawremi in case he'd like to comment on supporting GTF imports on Windows in Bioc devel.
Best, Marcel
Received a valid push on git.bioconductor.org; starting a build for commit id: e9c368f241536e7bce6d61a27a5ca1f57b887d8d
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
Congratulations! The package built without errors or warnings on all platforms.
Please see the build report for more details. This link will be active for 21 days.
Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
git@git.bioconductor.org:packages/sitadela
to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.
Hi @LiNk-NY, Marcel,
Thanks for your comments.
Do I have to re-implement method for importing GTF files when using
BiocIO
? I am sure that I am missing something here, apologies...
OK, returned to rtracklayer
.
It looks like the method lives in
rtracklayer
so use it from there. AFAIUBiocIO
exports the generic. Unless you're creating a new method, you don't have to importBiocIO
.Regarding Windows, the GTF file import works in R 4.0.3 and Bioconductor 3.12. Using R.utils for any compressed file preprocessing does not fix this.
For your package, the current release is not relevant because it will only exist in 3.13 and above. Are you saying it's not working on Bioconductor 3.13? Perhaps you can create an issue in the package where it fails? I'm also tagging @lawremi in case he'd like to comment on supporting GTF imports on Windows in Bioc devel.
I updated rtracklayer
dev version. It works now and all executed examples work on all platforms.
Thanks again.
Panagiotis
Hi Panagiotis, @pmoulos
Thanks for making those changes. Please remove commented code from the vignette.
We don't encourage eval=FALSE
code chunks because the code can become
stale quite quickly.
Please update these and your package is ready for inclusion to Bioconductor.
Thank you!
Best, Marcel
Received a valid push on git.bioconductor.org; starting a build for commit id: 33a8b196e3296548db215ad779190672d72ec41c
Hi Marcel, @LiNk-NY
I removed the commented code. I included these examples in the README
file in the package GitHub page instead. Regarding the eval=FALSE
chunks, I left only two of these just for demonstration purposes:
eval=FALSE
as it will take a long time to run.If these are considered against good practice, I will remove them too.
Thanks for all your help.
Panagiotis
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
Congratulations! The package built without errors or warnings on all platforms.
Please see the build report for more details. This link will be active for 21 days.
Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
git@git.bioconductor.org:packages/sitadela
to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.
Thanks for your submission Panagiotis @pmoulos. Your package has been accepted. Best regards, Marcel
Your package has been accepted. It will be added to the Bioconductor nightly builds.
Thank you for contributing to Bioconductor!
Hi Panagiotis, @pmoulos
It looks like the location of the DB file could be handled in a better way. Perhaps you can have an Environment Variable / R option that sets the permanent location for this DB and have the user accept the default location via a prompt. Perhaps you could use
> tools::R_user_dir("sitadela", "data")
[1] "/home/user/.local/share/R/sitadela"
as the location after prompting and getting confirmation from the user. The user should absolutely be aware that a database is being created in their files.
See here for an example on how to cache a file like this : https://github.com/Bioconductor/GenomicDataCommons/blob/master/R/caching.R and https://github.com/waldronlab/cBioPortalData/blob/f1db835193430accdec1902645366a57017c9231/R/cache.R#L86-L112
Update: One of our core team members pointed out the following: Why isn't loadAnnotation() using the same default as addAnnotation() for the db argument? All functions with the db argument should probably use the same default (e.g. getDbPath()).
Received a valid push on git.bioconductor.org; starting a build for commit id: cd7a4a44cad45ec4052cfceb39ac115f6fecb39c
Hi Marcel, @LiNk-NY
Thanks for pointing out these issues.
I corrected the issue with the db
argument in loadAnnotation
.
Also, I implemented your suggested solutions regarding the location of the database and user confirmation. Briefly, the following logic is implemented through the following functions:
setDbPath <- function(db=NULL) {
if (is.null(db))
db <- .defaultDbPath()
else {
if (!is.character(db)) {
warning("The path to the sitadela database must be a valid path ",
"to a file which\nwill be created if not existing! Assuming ",
"default...",immediate.=TRUE)
db <- .defaultDbPath()
}
}
if (is.null(getOption("BioC"))) {
BioC <- list()
class(BioC) <- "BioCOptions"
options("BioC"=BioC)
}
BioC <- getOption("BioC")
BioC$sitadela <- db
options("BioC"=BioC)
}
getDbPath <- function() {
db <- getOption("BioC")$sitadela
if (!is.null(db) && is.character(db)) {
if (file.exists(db))
return(db)
else
return(.defaultDbPath())
}
else
return(.defaultDbPath())
}
.defaultDbPath <- function() {
# We end up here if option for sitadela database path does not exist or has
# not been provided by the user
defPath <- .getDefaultDbPath()
# User conformation only if database path not provided explicitly by the
# user. If provided explicitly, user knows a database is going to be created
# at the user-specified path.
if (!interactive())
.checkDbPath(defPath)
else {
if (!dir.exists(dirname(defPath))) {
confirm <- menu(c("Yes","No"),
title=sprintf(paste0("The sitadela database is going to be ",
"created at %s. Do you agree?"),defPath))
if (confirm == 1)
.checkDbPath()
else
stop(paste0("The sitadela database cannot be created without user ",
"agreement.\nPlease agree or use the 'db' argument in the ",
"'addAnnotation' function."))
}
}
return(defPath)
}
.checkDbPath <- function(db=NULL) {
if (missing(db) || is.null(db))
db <- .getDefaultDbPath()
dbd <- dirname(db)
if (!dir.exists(dbd)) {
message("sitadela database created at ",dbd," directory")
dir.create(dbd,recursive=TRUE,mode="0755",showWarnings=FALSE)
}
else
message("sitadela database found at ",dbd," directory")
}
.getDefaultDbPath <- function() {
return(file.path(tools::R_user_dir("sitadela","data"),"annotation.sqlite"))
}
Finally, on package load (zzz.R
):
.onLoad <- function(libname,pkgname) {
setDbPath()
}
Please let me know of any further corrective actions.
Best regards,
Panagiotis
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
Congratulations! The package built without errors or warnings on all platforms.
Please see the build report for more details. This link will be active for 21 days.
Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
git@git.bioconductor.org:packages/sitadela
to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.
Hi Marcel, @LiNk-NY,
Is there any progress on reviewing the latest changes on the database path? Should I perform any further corrective actions?
Thanks in advance,
Panagiotis
Hi Panagiotis, @pmoulos Sorry for the delay. Please do not use a vague name for your option ("BioC"). Be more specific to your package, perhaps "sitadela_cache" or similar for the option name. Thanks!
Received a valid push on git.bioconductor.org; starting a build for commit id: 21af9acc062101a031156469741350bdc129cbbc
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
Congratulations! The package built without errors or warnings on all platforms.
Please see the build report for more details. This link will be active for 21 days.
Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
git@git.bioconductor.org:packages/sitadela
to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.
Hi Marcel, @LiNk-NY,
I changed the option name to sitadela_dbpath
. Now, it looks like:
setDbPath <- function(db=NULL) {
if (is.null(db))
db <- .defaultDbPath()
else {
if (!is.character(db)) {
warning("The path to the sitadela database must be a valid path ",
"to a file which\nwill be created if not existing! Assuming ",
"default...",immediate.=TRUE)
db <- .defaultDbPath()
}
}
options("sitadela_dbpath"=db)
}
getDbPath <- function() {
db <- getOption("sitadela_dbpath")
if (!is.null(db) && is.character(db)) {
if (file.exists(db))
return(db)
else
return(.defaultDbPath())
}
else
return(.defaultDbPath())
}
Best,
Panagiotis
Hi Panagiotis, @pmoulos Thanks for making that change. It looks good. Best, Marcel
Your package has been accepted. It will be added to the Bioconductor nightly builds.
Thank you for contributing to Bioconductor!
The master branch of your GitHub repository has been added to Bioconductor's git repository.
To use the git.bioconductor.org repository, we need an 'ssh' key to associate with your github user name. If your GitHub account already has ssh public keys (https://github.com/pmoulos.keys is not empty), then no further steps are required. Otherwise, do the following:
See further instructions at
https://bioconductor.org/developers/how-to/git/
for working with this repository. See especially
https://bioconductor.org/developers/how-to/git/new-package-workflow/ https://bioconductor.org/developers/how-to/git/sync-existing-repositories/
to keep your GitHub and Bioconductor repositories in sync.
Your package will be included in the next nigthly 'devel' build (check-out from git at about 6 pm Eastern; build completion around 2pm Eastern the next day) at
https://bioconductor.org/checkResults/
(Builds sometimes fail, so ensure that the date stamps on the main landing page are consistent with the addition of your package). Once the package builds successfully, you package will be available for download in the 'Devel' version of Bioconductor using BiocManager::install("sitadela")
. The package 'landing page' will be created at
https://bioconductor.org/packages/sitadela
If you have any questions, please contact the bioc-devel mailing list (https://stat.ethz.ch/mailman/listinfo/bioc-devel); this issue will not be monitored further.
Update the following URL to point to the GitHub repository of the package you wish to submit to Bioconductor
Confirm the following by editing each check box to '[x]'
[x] I understand that by submitting my package to Bioconductor, the package source and all review commentary are visible to the general public.
[x] I have read the Bioconductor Package Submission instructions. My package is consistent with the Bioconductor Package Guidelines.
[x] I understand that a minimum requirement for package acceptance is to pass R CMD check and R CMD BiocCheck with no ERROR or WARNINGS. Passing these checks does not result in automatic acceptance. The package will then undergo a formal review and recommendations for acceptance regarding other Bioconductor standards will be addressed.
[x] My package addresses statistical or bioinformatic issues related to the analysis and comprehension of high throughput genomic data.
[x] I am committed to the long-term maintenance of my package. This includes monitoring the support site for issues that users may have, subscribing to the bioc-devel mailing list to stay aware of developments in the Bioconductor community, responding promptly to requests for updates from the Core team in response to changes in R or underlying software.
[x] I am familiar with the Bioconductor code of conduct and agree to abide by it.
I am familiar with the essential aspects of Bioconductor software management, including:
For help with submitting your package, please subscribe and post questions to the bioc-devel mailing list.