Bioconductor / Contributions

Contribute Packages to Bioconductor
134 stars 33 forks source link

sitadela #1900

Closed pmoulos closed 3 years ago

pmoulos commented 3 years ago

Update the following URL to point to the GitHub repository of the package you wish to submit to Bioconductor

Confirm the following by editing each check box to '[x]'

I am familiar with the essential aspects of Bioconductor software management, including:

For help with submitting your package, please subscribe and post questions to the bioc-devel mailing list.

bioc-issue-bot commented 3 years ago

Hi @pmoulos

Thanks for submitting your package. We are taking a quick look at it and you will hear back from us soon.

The DESCRIPTION file for this package is:

Package: sitadela
Type: Package
Title: An R package for the easy provision of simple but
    complete tab-delimited genomic annotation from a
    variety of sources and organisms
Authors@R: c(person(given="Panagiotis", family="Moulos",
    email="moulos@fleming.gr", role=c("aut", "cre")))
Depends:
    R (>= 4.1.0)
Imports:
    Biobase,
    BiocGenerics,
    biomaRt,
    Biostrings,
    GenomeInfoDb,
    GenomicFeatures,
    GenomicRanges,
    httr,
    IRanges,
    methods,
    parallel,
    Rsamtools,
    RSQLite,
    rtracklayer,
    S4Vectors,
    stringr,
    utils
Suggests:
    BiocManager,
    BSgenome,
    knitr,
    RMySQL,
    RUnit
Description: Provides an interface to build a unified database of
    genomic annotations and their coordinates (gene, transcript
    and exon levels). It is aimed to be used when simple 
    tab-delimited annotations (or simple GRanges objects) are 
    required instead of the more complex annotation Bioconductor 
    packages. Also useful when combinatorial annotation elements
    are reuired, such as RefSeq coordinates with Ensembl biotypes.
License: GPL (>= 3)
Encoding: UTF-8
LazyData: false
URL: http://www.fleming.gr
biocViews: Software, WorkflowStep, RNASeq, Transcription, Sequencing,
    Transcriptomics, BiomedicalInformatics, FunctionalGenomics, 
    SystemsBiology, AlternativeSplicing, DataImport, ChIPSeq
VignetteBuilder: knitr
BugReports: https://github.com/pmoulos/sitadela/issues
Version: 0.99.0
Date: 2021-02-19
Collate:
    'argcheck.R'
    'external.R'
    'fromtxdb.R'
    'query-ensembl.R'
    'query-ncbi.R'
    'query-refseq.R'
    'query-ucsc.R'
    'query-utils.R'
    'reduceops.R'
    'sitadela.R'
    'testfuns.R'
    'zzz.R'
bioc-issue-bot commented 3 years ago

Dear @pmoulos ,

You (or someone) has already posted that repository to our tracker.

See https://github.com/Bioconductor/Contributions/issues/1900

You cannot post the same repository more than once.

If you would like this repository to be linked to issue number: 1900, Please contact a Bioconductor Core Member. I am closing this issue.

lshep commented 3 years ago

Please ignore the previous comment concerning the issue being previously posted. I was running updates and inadvertently caused this message

bioc-issue-bot commented 3 years ago

Hi @pmoulos

Thanks for submitting your package. We are taking a quick look at it and you will hear back from us soon.

The DESCRIPTION file for this package is:

Package: sitadela
Type: Package
Title: An R package for the easy provision of simple but
    complete tab-delimited genomic annotation from a
    variety of sources and organisms
Authors@R: c(person(given="Panagiotis", family="Moulos",
    email="moulos@fleming.gr", role=c("aut", "cre")))
Depends:
    R (>= 4.1.0)
Imports:
    Biobase,
    BiocGenerics,
    biomaRt,
    Biostrings,
    GenomeInfoDb,
    GenomicFeatures,
    GenomicRanges,
    httr,
    IRanges,
    methods,
    parallel,
    Rsamtools,
    RSQLite,
    rtracklayer,
    S4Vectors,
    stringr,
    utils
Suggests:
    BiocManager,
    BSgenome,
    knitr,
    RMySQL,
    RUnit
Description: Provides an interface to build a unified database of
    genomic annotations and their coordinates (gene, transcript
    and exon levels). It is aimed to be used when simple 
    tab-delimited annotations (or simple GRanges objects) are 
    required instead of the more complex annotation Bioconductor 
    packages. Also useful when combinatorial annotation elements
    are reuired, such as RefSeq coordinates with Ensembl biotypes.
License: GPL (>= 3)
Encoding: UTF-8
LazyData: false
URL: http://www.fleming.gr
biocViews: Software, WorkflowStep, RNASeq, Transcription, Sequencing,
    Transcriptomics, BiomedicalInformatics, FunctionalGenomics, 
    SystemsBiology, AlternativeSplicing, DataImport, ChIPSeq
VignetteBuilder: knitr
BugReports: https://github.com/pmoulos/sitadela/issues
Version: 0.99.0
Date: 2021-02-19
Collate:
    'argcheck.R'
    'external.R'
    'fromtxdb.R'
    'query-ensembl.R'
    'query-ncbi.R'
    'query-refseq.R'
    'query-ucsc.R'
    'query-utils.R'
    'reduceops.R'
    'sitadela.R'
    'testfuns.R'
    'zzz.R'
bioc-issue-bot commented 3 years ago

A reviewer has been assigned to your package. Learn what to expect during the review process.

IMPORTANT: Please read this documentation for setting up remotes to push to git.bioconductor.org. It is required to push a version bump to git.bioconductor.org to trigger a new build.

Bioconductor utilized your github ssh-keys for git.bioconductor.org access. To manage keys and future access you may want to active your Bioconductor Git Credentials Account

bioc-issue-bot commented 3 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ABNORMAL". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/sitadela to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

LiNk-NY commented 3 years ago

Hi Panagiotis, @pmoulos

Thank you for your submission to Bioconductor. Please see the review below.

Best, Marcel


sitadela #1900

DESCRIPTION

NAMESPACE

vignettes

R

argBounds <- c(5, 7)
gt <- list(op = function(x) x <= argBounds[1], cls = class,
    desc = function(x) "greater than")
lapply(gt, function(f) f(3))
# $op
# [1] TRUE
#
# $cls
# [1] "numeric"
    ann <- as.data.frame(unname(gr))
    ann <- ann[,c(1,2,3,7,9,5,10,11)]
    names(ann)[c(1,4)] <- c("chromosome","exon_id")
    ann$chromosome <- as.character(ann$chromosome)
    ann <- ann[order(ann$chromosome,ann$start),]

Minor:

tests

bioc-issue-bot commented 3 years ago

Received a valid push; starting a build. Commits are:

0a7ad1b Version bump

bioc-issue-bot commented 3 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ABNORMAL". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/sitadela to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

pmoulos commented 3 years ago

Hi Marcel, @LiNk-NY,

Thank you for the extensive review. I addressed most of the issues. The package though while building and checking normally locally, it failed on the build system, it didn't even clone the repo, is there a problem currently?

Below, please find my actions regarding the package review:

Hi Panagiotis, @pmoulos

Thank you for your submission to Bioconductor. Please see the review below.

Best, Marcel

sitadela #1900

DESCRIPTION

  • Update the package URL to be either the packages GitHub repository or a dedicated webpage for introducing new users the package and not an institutional page.

Done.

  • BiocManager should not be a suggested package.

The reason for this, is that the package includes a function for calculating GC-content for the supported genomes (getGcContent), line 502 in sitadela.R. When a BSgenome package is not available, it is installed automatically using BiocManager. Would such a functionality be supported if I remove it from the suggested packages? Will I get check warnings?

NAMESPACE

  • Looks good.
  • stringr may be a dependency you can avoid.

Done.

  • If I'm not mistaken, BiocIO::import will replace rtracklayer::import @Kayla-Morrell

What should I do for now? Should I leave import as is?

vignettes

  • There is quite a bit of platform dependent code. Why wouldn't example-4 and others work on all platforms?

Generally, the examples that are platform dependent make use of automatically downloaded Unix tools for conversion between file formats that are not handled by Bioconductor (to the best of my knowledge), and these are mainly UCSC specific file formats. However, in that particular one, I got the error

Error in h(simpleError(msg, call)) : 
  error in evaluating the argument 'con' in selecting a method for function 'import': invalid class “GZFile” object: undefined class for slot "resource" ("characterORconnection")

in R-devel for Windows, this is why I made it platform dependent for the time being at least.

  • Use R utilities like R.utils for functions such as gunzip rather than using system commands.

The only system command has to do with the file conversion process mentioned above, occuring when retrieving 3' UTR regions from UCSC databases.

  • You don't have to rely on full real-world annotations for the vignette, you can provide your own small dataset to demonstrate the functionality of the package.

I wanted to transform example-5 and example-6 like example-4 but didn't do because of the previous error (GZFile). Using gunzip from R.tools to manually unzip the file before importing, returns a similar error with the GFF2File class (in Windows).

R

  • Remove the BiocManager::install code in loadBsGenome and let the user install the package themselves.

See above for a justification of this. My purpose is to provide some way of automation, especially for building a complete annotation database (see also example-8 from the vignette). If this is not a good behavior, please confirm as instructions on building a complete database should change.

  • Avoid over-using switch and nesting switch calls by creating a lookup table for .checkNumArgs. The only difference in numeric and integer switches is the text "numeric" and "integer" in the error message. The look up table can be a list (roughly):
argBounds <- c(5, 7)
gt <- list(op = function(x) x <= argBounds[1], cls = class,
    desc = function(x) "greater than")
lapply(gt, function(f) f(3))
# $op
# [1] TRUE
#
# $cls
# [1] "numeric"

Done.

  • Avoid nested for loops where possible (R/external.R)
  • Do not use writeLines or print in package code. If you need to show a message, use message.

These are leftovers from some biomaRt functions I had to modify to handle httr timeouts. I corrected these. Generally, the R file external.R contains such functions, borrowed and slighlty modified from the biomaRt package, as they were either not exported, or not offering the parameters I had to change as arguments. Lines where message is meaningless (e.g. when a table must be displayed) have been commented out.

Fixed.

  • The functions you are writing look quite manual and laborious. Is there a way to interface with bioMaRt using the xml package or another package?

Are you referring to the functions in external.R or generally in the package? If the former, these are borrowed functions from biomaRt. See above on the justification for using them like this.

  • Using numeric indices to subset can be risky when the data change. We advise using names to subset data e.g., ann[, "score"] rather than ann[, 1] (from R/fromtxdb.R; and avoid repeating this code if possible)

Not every time that these indices are used to subset and reorder the columns, the corresponding columns have the same names (usually there is 1-2 names that change). So...

  • Minimize the amount of copy-and-paste code by creating a helper function and reusing it. Here is one block of code that gets repeated quite often:
    ann <- as.data.frame(unname(gr))
    ann <- ann[,c(1,2,3,7,9,5,10,11)]
    names(ann)[c(1,4)] <- c("chromosome","exon_id")
    ann$chromosome <- as.character(ann$chromosome)
    ann <- ann[order(ann$chromosome,ann$start),]
  • See also R/query-ensembl.R.

I tried to further modularize the functions in fromtxdb.R by trying to avoid code repetition as much as possible.

  • Find ways to avoid the cyclomatic complexity in the .makeSumTranscriptExonFromTxDb. Perhaps there is a way to abstract and modularize the functionality so that you can apply different modules across related functions.

I tried to modularize further but with respect to that particular one, it is essentially an expansion to the data created from .makeSumTranscriptExonFromTxDb (other *Sum* functions also changed like this to avoid code repetition). This why .makeTranscriptExonFromTxDb is called first, to make all checks and create the data, and then .makeSumTranscriptExonFromTxDb restructures it.

Minor:

  • How is ncol(values == 1) different from ncol(values) when values is a data.frame? (.myGenerateFilterXML)

Leftover from biomaRt function. Fixed.

  • Switching from logical to integer index is not always useful, e.g., the lines 8-11 in R/fromtxdb.R can become map[, !names(map) %in% c("exon_id", "transcript_id"), drop = FALSE].

Fixed.

tests

  • Add more tests in general and avoid creating system specific tests

Done. Regarding the system specific test test_sitadela_gtf, see above about the GZfFile error.

Thanks again.

LiNk-NY commented 3 years ago

Hi Panagiotis, @pmoulos

Thank you for making those changes. Please see my responses below.

  • BiocManager should not be a suggested package.

The reason for this, is that the package includes a function for calculating GC-content for the supported genomes (getGcContent), line 502 in sitadela.R. When a BSgenome package is not available, it is installed automatically using BiocManager. Would such a functionality be supported if I remove it from the suggested packages? Will I get check warnings?

It is bad practice to install a package for the user. You do not need this functionality. You do need a conditional test if (!requireNamespace("pkg", quietly = TRUE)) stop(...) to make sure that the user has the package installed.

  • If I'm not mistaken, BiocIO::import will replace rtracklayer::import @Kayla-Morrell

What should I do for now? Should I leave import as is?

Any thoughts @Kayla-Morrell ?

vignettes

  • There is quite a bit of platform dependent code. Why wouldn't example-4 and others work on all platforms?

Generally, the examples that are platform dependent make use of automatically downloaded Unix tools for conversion between file formats that are not handled by Bioconductor (to the best of my knowledge), and these are mainly UCSC specific file formats. However, in that particular one, I got the error

Error in h(simpleError(msg, call)) : 
  error in evaluating the argument 'con' in selecting a method for function 'import': invalid class “GZFile” object: undefined class for slot "resource" ("characterORconnection")

in R-devel for Windows, this is why I made it platform dependent for the time being at least.

Keep in mind that about half of the Bioconductor user base is on Windows. We strongly recommend that you provide functionality for all platforms. Otherwise, you could mark your package as unsupported on Windows.

  • Use R utilities like R.utils for functions such as gunzip rather than using system commands.

The only system command has to do with the file conversion process mentioned above, occuring when retrieving 3' UTR regions from UCSC databases.

We highly recommend that you use native R functions rather than system commands because these reduce any platform dependencies in your code and make your package more portable. It may also be the case that you do not need the entirety of the workflow but only the data at certain steps which you can save as small datasets in the package to demonstrate the functionality in the vignette.

  • You don't have to rely on full real-world annotations for the vignette, you can provide your own small dataset to demonstrate the functionality of the package.

I wanted to transform example-5 and example-6 like example-4 but didn't do because of the previous error (GZFile). Using gunzip from R.tools to manually unzip the file before importing, returns a similar error with the GFF2File class (in Windows).

I am not familiar with this issue on Windows. My suggestion again would be to host a small dataset in the package so that you don't have to manually unzip files and avoid system commands.

R

  • Remove the BiocManager::install code in loadBsGenome and let the user install the package themselves.

See above for a justification of this. My purpose is to provide some way of automation, especially for building a complete annotation database (see also example-8 from the vignette). If this is not a good behavior, please confirm as instructions on building a complete database should change.

I can see that a step-by-step of this can be helpful. If you'd like to demonstrate a complete annotation database build please submit that as a workflow package instead.

  • Avoid over-using switch and nesting switch calls by creating a lookup table for .checkNumArgs. The only difference in numeric and integer switches is the text "numeric" and "integer" in the error message. The look up table can be a list (roughly):
argBounds <- c(5, 7)
gt <- list(op = function(x) x <= argBounds[1], cls = class,
    desc = function(x) "greater than")
lapply(gt, function(f) f(3))
# $op
# [1] TRUE
#
# $cls
# [1] "numeric"

Done.

  • Avoid nested for loops where possible (R/external.R)
  • Do not use writeLines or print in package code. If you need to show a message, use message.

These are leftovers from some biomaRt functions I had to modify to handle httr timeouts. I corrected these. Generally, the R file external.R contains such functions, borrowed and slighlty modified from the biomaRt package, as they were either not exported, or not offering the parameters I had to change as arguments. Lines where message is meaningless (e.g. when a table must be displayed) have been commented out.

I don't think this is good practice. If you are providing improvements and bug fixes to biomaRt, you should do so via Pull Request to their repository (https://github.com/grimbough/biomaRt). I would only copy code over after all other options have been exhausted and after you have authorization from the package maintainer. As a maintainer, it is usually better to receive a pull request for an improvement rather than have your code copied over to someone else's package.

  • The functions you are writing look quite manual and laborious. Is there a way to interface with bioMaRt using the xml package or another package?

Are you referring to the functions in external.R or generally in the package? If the former, these are borrowed functions from biomaRt. See above on the justification for using them like this.

See above.

  • Using numeric indices to subset can be risky when the data change. We advise using names to subset data e.g., ann[, "score"] rather than ann[, 1] (from R/fromtxdb.R; and avoid repeating this code if possible)

Not every time that these indices are used to subset and reorder the columns, the corresponding columns have the same names (usually there is 1-2 names that change). So...

Perhaps add unit tests to keep the code robust in these situations.

Thanks! -Marcel

lshep commented 3 years ago

It seems like builds are still being triggered through github instead of git which is causing the cloning issue. Do you have a webhook set up on your repository (the old way of triggering builds?) If so this should be removed and you should follow the process of pushing to git.bioconductor.org setting remotes

Kayla-Morrell commented 3 years ago

@pmoulos and @LiNk-NY - Sorry for the late reply,BiocIO::import has replaced rtracklayer::import so you should be using that instead.

Best, Kayla

LiNk-NY commented 3 years ago

Thanks Kayla and Lori!

bioc-issue-bot commented 3 years ago

Please remember to push to git.bioconductor.org to trigger a new build

pmoulos commented 3 years ago

Hi Marcel, @LiNk-NY

Please see some more changes below:

Hi Panagiotis, @pmoulos

Thank you for making those changes. Please see my responses below.

  • BiocManager should not be a suggested package.

The reason for this, is that the package includes a function for calculating GC-content for the supported genomes (getGcContent), line 502 in sitadela.R. When a BSgenome package is not available, it is installed automatically using BiocManager. Would such a functionality be supported if I remove it from the suggested packages? Will I get check warnings?

It is bad practice to install a package for the user. You do not need this functionality. You do need a conditional test if (!requireNamespace("pkg", quietly = TRUE)) stop(...) to make sure that the user has the package installed.

Thank you. Done.

  • If I'm not mistaken, BiocIO::import will replace rtracklayer::import @Kayla-Morrell

What should I do for now? Should I leave import as is?

Any thoughts @Kayla-Morrell ?

Thank you @Kayla-Morrell. It has been replaced. However, I am a bit confused:

In Linux (R-devel):

library(BiocIO)

gtf <- file.path(system.file(package="sitadela","extdata",
    "gadMor1_HE567025.gtf.gz"))

import(gtf)

>Error in FileForFormat(resource(con), fileFormat(con)) :
>  Format 'gtf' unsupported

showMethods("import")

>Function: import (package BiocIO)
>con="character", format="character", text="ANY"
>con="character", format="missing", text="ANY"
>con="character", format="missing", text="missing"
>    (inherited from: con="character", format="missing", text="ANY")
>con="CompressedFile", format="missing", text="ANY"
>con="connection", format="character", text="ANY"
>con="connection", format="missing", text="ANY"
>con="GZFile", format="missing", text="missing"
>    (inherited from: con="CompressedFile", format="missing", >text="ANY")
>con="missing", format="ANY", text="character"

library(rtracklayer)

import(gtf) # Works

showMethods("import")

showMethods("import")
>Function: import (package BiocIO)
>con="BamFile", format="ANY", text="ANY"
>con="BED15File", format="ANY", text="ANY"
>con="BEDFile", format="ANY", text="ANY"
>con="BEDPEFile", format="ANY", text="ANY"
>con="BigBedFile", format="ANY", text="ANY"
>con="BigWigFile", format="ANY", text="ANY"
>con="BroadPeakFile", format="ANY", text="ANY"
>con="ChainFile", format="ANY", text="ANY"
>con="character", format="character", text="ANY"
>con="character", format="missing", text="ANY"
>con="character", format="missing", text="missing"
>    (inherited from: con="character", format="missing", text="ANY")
>con="CompressedFile", format="missing", text="ANY"
>con="connection", format="character", text="ANY"
>con="connection", format="missing", text="ANY"
>con="FastaFile", format="ANY", text="ANY"
>con="GFFFile", format="ANY", text="ANY"
>con="GTFFile", format="missing", text="missing"
>    (inherited from: con="GFFFile", format="ANY", text="ANY")
>con="GZFile", format="missing", text="missing"
>    (inherited from: con="CompressedFile", format="missing", text="ANY")
>con="missing", format="ANY", text="character"
>con="NarrowPeakFile", format="ANY", text="ANY"
>con="TabixFile", format="ANY", text="ANY"
>con="TwoBitFile", format="ANY", text="ANY"
>con="UCSCFile", format="ANY", text="ANY"
>con="WIGFile", format="ANY", text="ANY"

Do I have to re-implement method for importing GTF files when using BiocIO? I am sure that I am missing something here, apologies...

In Windows (R-devel):

library(BiocIO)

gtf <- file.path(system.file(package="sitadela","extdata",
    "gadMor1_HE567025.gtf.gz"))

import(gtf)

>Error in FileForFormat(resource(con), fileFormat(con)) : 
>  Format 'gtf' unsupported

showMethods("import")

>Function: import (package BiocIO)
>con="character", format="character", text="ANY"
>con="character", format="missing", text="ANY"
>con="character", format="missing", text="missing"
>    (inherited from: con="character", format="missing", text="ANY")
>con="CompressedFile", format="missing", text="ANY"
>con="connection", format="character", text="ANY"
>con="connection", format="missing", text="ANY"
>con="GZFile", format="missing", text="missing"
>    (inherited from: con="CompressedFile", format="missing", text="ANY")
>con="missing", format="ANY", text="character"

library(rtracklayer)

import(gtf)

>Found more than one class "CompressedFile" in cache; using the >first, from namespace 'BiocIO'
>Also defined by ‘rtracklayer’
>Error in h(simpleError(msg, call)) : 
>  error in evaluating the argument 'con' in selecting a method for function 'import': invalid class “GZFile” object: undefined class for slot "resource" ("characterORconnection")

showMethods("import")
>Function: import (package rtracklayer)
>con="BamFile", format="ANY", text="ANY"
>con="BED15File", format="ANY", text="ANY"
>con="BEDFile", format="ANY", text="ANY"
>con="BEDPEFile", format="ANY", text="ANY"
>con="BigWigFile", format="ANY", text="ANY"
>con="BroadPeakFile", format="ANY", text="ANY"
>con="ChainFile", format="ANY", text="ANY"
>con="character", format="character", text="ANY"
>con="character", format="missing", text="ANY"
>con="character", format="missing", text="missing"
>    (inherited from: con="character", format="missing", text="ANY")
>con="CompressedFile", format="missing", text="ANY"
>con="connection", format="character", text="ANY"
>con="connection", format="missing", text="ANY"
>con="FastaFile", format="ANY", text="ANY"
>con="GFFFile", format="ANY", text="ANY"
>con="missing", format="ANY", text="character"
>con="NarrowPeakFile", format="ANY", text="ANY"
>con="TabixFile", format="ANY", text="ANY"
>con="TwoBitFile", format="ANY", text="ANY"
>con="UCSCFile", format="ANY", text="ANY"
>con="WIGFile", format="ANY", text="ANY"

Regarding Windows, the GTF file import works in R 4.0.3 and Bioconductor 3.12. Using R.utils for any compressed file preprocessing does not fix this.

I would really appreciate any advice on what I am missing... This is also related to the comments about platform independent examples below. If this is fixed, then platform independency is also fixed.

vignettes

  • There is quite a bit of platform dependent code. Why wouldn't example-4 and others work on all platforms?

Generally, the examples that are platform dependent make use of automatically downloaded Unix tools for conversion between file formats that are not handled by Bioconductor (to the best of my knowledge), and these are mainly UCSC specific file formats. However, in that particular one, I got the error

Error in h(simpleError(msg, call)) : 
  error in evaluating the argument 'con' in selecting a method for function 'import': invalid class “GZFile” object: undefined class for slot "resource" ("characterORconnection")

in R-devel for Windows, this is why I made it platform dependent for the time being at least.

Keep in mind that about half of the Bioconductor user base is on Windows. We strongly recommend that you provide functionality for all platforms. Otherwise, you could mark your package as unsupported on Windows.

See above.

  • Use R utilities like R.utils for functions such as gunzip rather than using system commands.

The only system command has to do with the file conversion process mentioned above, occuring when retrieving 3' UTR regions from UCSC databases.

We highly recommend that you use native R functions rather than system commands because these reduce any platform dependencies in your code and make your package more portable. It may also be the case that you do not need the entirety of the workflow but only the data at certain steps which you can save as small datasets in the package to demonstrate the functionality in the vignette.

  • You don't have to rely on full real-world annotations for the vignette, you can provide your own small dataset to demonstrate the functionality of the package.

I wanted to transform example-5 and example-6 like example-4 but didn't do because of the previous error (GZFile). Using gunzip from R.tools to manually unzip the file before importing, returns a similar error with the GFF2File class (in Windows).

I am not familiar with this issue on Windows. My suggestion again would be to host a small dataset in the package so that you don't have to manually unzip files and avoid system commands.

R

  • Remove the BiocManager::install code in loadBsGenome and let the user install the package themselves.

See above for a justification of this. My purpose is to provide some way of automation, especially for building a complete annotation database (see also example-8 from the vignette). If this is not a good behavior, please confirm as instructions on building a complete database should change.

I can see that a step-by-step of this can be helpful. If you'd like to demonstrate a complete annotation database build please submit that as a workflow package instead.

OK, thanks for looking at this. I removed this functionality and implemented a check to inform the user about required BSgenome packages. I will also consider the workflow case.

  • Avoid over-using switch and nesting switch calls by creating a lookup table for .checkNumArgs. The only difference in numeric and integer switches is the text "numeric" and "integer" in the error message. The look up table can be a list (roughly):
argBounds <- c(5, 7)
gt <- list(op = function(x) x <= argBounds[1], cls = class,
    desc = function(x) "greater than")
lapply(gt, function(f) f(3))
# $op
# [1] TRUE
#
# $cls
# [1] "numeric"

Done.

  • Avoid nested for loops where possible (R/external.R)
  • Do not use writeLines or print in package code. If you need to show a message, use message.

These are leftovers from some biomaRt functions I had to modify to handle httr timeouts. I corrected these. Generally, the R file external.R contains such functions, borrowed and slighlty modified from the biomaRt package, as they were either not exported, or not offering the parameters I had to change as arguments. Lines where message is meaningless (e.g. when a table must be displayed) have been commented out.

I don't think this is good practice. If you are providing improvements and bug fixes to biomaRt, you should do so via Pull Request to their repository (https://github.com/grimbough/biomaRt). I would only copy code over after all other options have been exhausted and after you have authorization from the package maintainer. As a maintainer, it is usually better to receive a pull request for an improvement rather than have your code copied over to someone else's package.

Thanks for the comments and suggestions. I did a bit more research on this and came up upon this by biomaRt author. So I implemented this solution which seems to work ok and the collated file external.R is not needed any more.

  • The functions you are writing look quite manual and laborious. Is there a way to interface with bioMaRt using the xml package or another package?

Are you referring to the functions in external.R or generally in the package? If the former, these are borrowed functions from biomaRt. See above on the justification for using them like this.

See above.

  • Using numeric indices to subset can be risky when the data change. We advise using names to subset data e.g., ann[, "score"] rather than ann[, 1] (from R/fromtxdb.R; and avoid repeating this code if possible)

Not every time that these indices are used to subset and reorder the columns, the corresponding columns have the same names (usually there is 1-2 names that change). So...

Perhaps add unit tests to keep the code robust in these situations.

Thanks! -Marcel

Finally, after following instructions here I tried to push the package also upstream, and received the following:

git push upstream main

Enter passphrase for key '/home/panos/.ssh/bioc_rsa':
Enumerating objects: 79, done.
Counting objects: 100% (79/79), done.
Delta compression using up to 16 threads
Compressing objects: 100% (56/56), done.
Writing objects: 100% (59/59), 13.05 KiB | 835.00 KiB/s, done.
Total 59 (delta 41), reused 0 (delta 0), pack-reused 0
remote: FATAL: W refs/heads/main packages/sitadela p.moulos DENIED by fallthru
remote: error: hook declined to update refs/heads/main
To git.bioconductor.org:packages/sitadela.git
 ! [remote rejected] main -> main (hook declined)
error: failed to push some refs to 'git.bioconductor.org:packages/sitadela.git'

Thank you all for your help,

Panagiotis

lshep commented 3 years ago

Please remove any webhooks you have set up on your github repository. Please also remember to push to git.bioconductor.org to trigger build changes. It is important to practice this concept as this will be the way to update your package upon acceptance as well. Be sure to Activate your git credentials account if you have not done so already. And please view this git help page for pushing to git.bioconductor.org as upstream remote.

pmoulos commented 3 years ago

Hi Lori, @lshep

Thanks for the advice. However, I already removed the previous hook and tried everything regarding the new package workflow. My credentials account are active as I maintain other packages too. Specifically, I tried:

$ git pull

Already up to date.

git remote add upstream git@git.bioconductor.org:packages/sitadela.git
# OK

$ git remote -v

origin  https://github.com/pmoulos/sitadela.git (fetch)
origin  https://github.com/pmoulos/sitadela.git (push)
upstream        git@git.bioconductor.org:packages/sitadela.git (fetch)
upstream        git@git.bioconductor.org:packages/sitadela.git (push)

$ git fetch --all

Fetching origin
Fetching upstream
Enter passphrase for key '/home/panos/.ssh/bioc_rsa':
# OK

 * [new branch]      master     -> upstream/master

$ git merge upstream/master
Already up to date.

$ git merge origin/main
Already up to date.

There is nothing new to commit at this point but changed the date in DESCRITPION file so as to fully comply with the instructions. Fun begins...

$ git commit -a
[main 6bd44bc] Date in DESCRIPTION
 1 file changed, 1 insertion(+), 1 deletion(-)

$ git push upstream master
error: src refspec master does not match any
error: failed to push some refs to 'git.bioconductor.org:packages/sitadela.git'

$ git push upstream main
Enter passphrase for key '/home/panos/.ssh/bioc_rsa':
Enumerating objects: 82, done.
Counting objects: 100% (82/82), done.
Delta compression using up to 16 threads
Compressing objects: 100% (59/59), done.
Writing objects: 100% (62/62), 13.31 KiB | 852.00 KiB/s, done.
Total 62 (delta 43), reused 0 (delta 0), pack-reused 0
remote: FATAL: W refs/heads/main packages/sitadela p.moulos DENIED by fallthru
remote: error: hook declined to update refs/heads/main
To git.bioconductor.org:packages/sitadela.git
 ! [remote rejected] main -> main (hook declined)
error: failed to push some refs to 'git.bioconductor.org:packages/sitadela.git'

$ git push origin main
Username for 'https://github.com': pmoulos
Password for 'https://pmoulos@github.com':
Enumerating objects: 5, done.
Counting objects: 100% (5/5), done.
Delta compression using up to 16 threads
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 325 bytes | 325.00 KiB/s, done.
Total 3 (delta 2), reused 0 (delta 0), pack-reused 0
remote: Resolving deltas: 100% (2/2), completed with 2 local objects.
To https://github.com/pmoulos/sitadela.git
   59596b2..6bd44bc  main -> main
# OK with origin in GitHub

So essentially I am unable to push in upstream. Is it because the branch name in origin is main while in upstream master?

$ git show-ref
6bd44bc42912f250d612e4306e2af26f89713034 refs/heads/main
6bd44bc42912f250d612e4306e2af26f89713034 refs/remotes/origin/HEAD
6bd44bc42912f250d612e4306e2af26f89713034 refs/remotes/origin/main
aa2fc53bfa78782c0b88cedb43dea3ef7071a9e9 refs/remotes/upstream/master

I even tried to add a remote (upstream) with a different name for the main branch (after removing the first one):

git remote add -m main upstream git@git.bioconductor.org:packages/sitadela.git

Still nothing... Any help would be greatly appreciated, thanks in advance.

Panagiotis

lshep commented 3 years ago

@nturaga could you assist with the issue with main/master and pushing to upstream master?

nturaga commented 3 years ago

Hi,

There is no branch called main in your upstream remote. It is only called master.

If git push upstream master isn't going through, it means your SSH keys aren't set up properly.

Please check http://bioconductor.org/developers/how-to/git/faq/

pmoulos commented 3 years ago

Hi @nturaga, thanks for the advice.

I have tried everything...

How can this be explained though?

$ git push upstream master
error: src refspec master does not match any
error: failed to push some refs to 'git.bioconductor.org:packages/sitadela.git'

Push process not even initiated... While

git push upstream main
Enter passphrase for key '/home/panos/.ssh/bioc_rsa':
Enumerating objects: 82, done.
Counting objects: 100% (82/82), done.
Delta compression using up to 16 threads
Compressing objects: 100% (59/59), done.
Writing objects: 100% (62/62), 13.31 KiB | 801.00 KiB/s, done.
Total 62 (delta 43), reused 0 (delta 0), pack-reused 0
remote: FATAL: W refs/heads/main packages/sitadela p.moulos DENIED by fallthru
remote: error: hook declined to update refs/heads/main
To git.bioconductor.org:packages/sitadela.git
 ! [remote rejected] main -> main (hook declined)
error: failed to push some refs to 'git.bioconductor.org:packages/sitadela.git'

While this seems to initiate the procedure but fails...

Should I completely change SSH keys? Add more?

Thanks and apologies for inconvenience...

nturaga commented 3 years ago

Yes, try adding a new SSH key. This is not an issue on our side of things. It's on your end.

bioc-issue-bot commented 3 years ago

Received a valid push on git.bioconductor.org; starting a build for commit id: 6bd44bc42912f250d612e4306e2af26f89713034

pmoulos commented 3 years ago

Hi @nturaga , @lshep

Thank you for your help. Solved based on this.

$ git push upstream main:master

Apparently, not everything is clear with GitHub having changed master to main. Maybe you could consider adding this hint to the new package workflow.

Thanks again,

Panagiotis

LiNk-NY commented 3 years ago

Hi Panagiotis, @pmoulos

That is a quick fix but what you should be doing is

git checkout -b master

and pushing to Bioconductor from there. We are not supporting pushes from the main branch yet though this may change in the future and we may require a differently named branch.

Best, Marcel

bioc-issue-bot commented 3 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/sitadela to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

LiNk-NY commented 3 years ago

Hi Panagiotis, @pmoulos

Do I have to re-implement method for importing GTF files when using BiocIO? I am sure that I am missing something here, apologies...

It looks like the method lives in rtracklayer so use it from there. AFAIU BiocIO exports the generic. Unless you're creating a new method, you don't have to import BiocIO.

Regarding Windows, the GTF file import works in R 4.0.3 and Bioconductor 3.12. Using R.utils for any compressed file preprocessing does not fix this.

For your package, the current release is not relevant because it will only exist in 3.13 and above. Are you saying it's not working on Bioconductor 3.13? Perhaps you can create an issue in the package where it fails? I'm also tagging @lawremi in case he'd like to comment on supporting GTF imports on Windows in Bioc devel.

Best, Marcel

bioc-issue-bot commented 3 years ago

Received a valid push on git.bioconductor.org; starting a build for commit id: e9c368f241536e7bce6d61a27a5ca1f57b887d8d

bioc-issue-bot commented 3 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/sitadela to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

pmoulos commented 3 years ago

Hi @LiNk-NY, Marcel,

Thanks for your comments.

Do I have to re-implement method for importing GTF files when using BiocIO? I am sure that I am missing something here, apologies...

OK, returned to rtracklayer.

It looks like the method lives in rtracklayer so use it from there. AFAIU BiocIO exports the generic. Unless you're creating a new method, you don't have to import BiocIO.

Regarding Windows, the GTF file import works in R 4.0.3 and Bioconductor 3.12. Using R.utils for any compressed file preprocessing does not fix this.

For your package, the current release is not relevant because it will only exist in 3.13 and above. Are you saying it's not working on Bioconductor 3.13? Perhaps you can create an issue in the package where it fails? I'm also tagging @lawremi in case he'd like to comment on supporting GTF imports on Windows in Bioc devel.

I updated rtracklayer dev version. It works now and all executed examples work on all platforms.

Thanks again.

Panagiotis

LiNk-NY commented 3 years ago

Hi Panagiotis, @pmoulos

Thanks for making those changes. Please remove commented code from the vignette. We don't encourage eval=FALSE code chunks because the code can become stale quite quickly. Please update these and your package is ready for inclusion to Bioconductor. Thank you!

Best, Marcel

bioc-issue-bot commented 3 years ago

Received a valid push on git.bioconductor.org; starting a build for commit id: 33a8b196e3296548db215ad779190672d72ec41c

pmoulos commented 3 years ago

Hi Marcel, @LiNk-NY

I removed the commented code. I included these examples in the README file in the package GitHub page instead. Regarding the eval=FALSE chunks, I left only two of these just for demonstration purposes:

If these are considered against good practice, I will remove them too.

Thanks for all your help.

Panagiotis

bioc-issue-bot commented 3 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/sitadela to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

LiNk-NY commented 3 years ago

Thanks for your submission Panagiotis @pmoulos. Your package has been accepted. Best regards, Marcel

bioc-issue-bot commented 3 years ago

Your package has been accepted. It will be added to the Bioconductor nightly builds.

Thank you for contributing to Bioconductor!

LiNk-NY commented 3 years ago

Hi Panagiotis, @pmoulos

It looks like the location of the DB file could be handled in a better way. Perhaps you can have an Environment Variable / R option that sets the permanent location for this DB and have the user accept the default location via a prompt. Perhaps you could use

> tools::R_user_dir("sitadela", "data")
[1] "/home/user/.local/share/R/sitadela"

as the location after prompting and getting confirmation from the user. The user should absolutely be aware that a database is being created in their files.

See here for an example on how to cache a file like this : https://github.com/Bioconductor/GenomicDataCommons/blob/master/R/caching.R and https://github.com/waldronlab/cBioPortalData/blob/f1db835193430accdec1902645366a57017c9231/R/cache.R#L86-L112

Update: One of our core team members pointed out the following: Why isn't loadAnnotation() using the same default as addAnnotation() for the db argument? All functions with the db argument should probably use the same default (e.g. getDbPath()).

bioc-issue-bot commented 3 years ago

Received a valid push on git.bioconductor.org; starting a build for commit id: cd7a4a44cad45ec4052cfceb39ac115f6fecb39c

pmoulos commented 3 years ago

Hi Marcel, @LiNk-NY

Thanks for pointing out these issues.

I corrected the issue with the db argument in loadAnnotation. Also, I implemented your suggested solutions regarding the location of the database and user confirmation. Briefly, the following logic is implemented through the following functions:

setDbPath <- function(db=NULL) {
    if (is.null(db))
        db <- .defaultDbPath()
    else {
        if (!is.character(db)) {
            warning("The path to the sitadela database must be a valid path ",
                "to a file which\nwill be created if not existing! Assuming ",
                "default...",immediate.=TRUE)
            db <- .defaultDbPath()
        }
    }

    if (is.null(getOption("BioC"))) {
        BioC <- list()
        class(BioC) <- "BioCOptions"
        options("BioC"=BioC)
    }

    BioC <- getOption("BioC")
    BioC$sitadela <- db
    options("BioC"=BioC)
}

getDbPath <- function() {
    db <- getOption("BioC")$sitadela
    if (!is.null(db) && is.character(db)) {
        if (file.exists(db))
            return(db)
        else
            return(.defaultDbPath())
    }
    else
        return(.defaultDbPath())
}

.defaultDbPath <- function() {
    # We end up here if option for sitadela database path does not exist or has
    # not been provided by the user
    defPath <- .getDefaultDbPath()

    # User conformation only if database path not provided explicitly by the
    # user. If provided explicitly, user knows a database is going to be created
    # at the user-specified path.
    if (!interactive())
        .checkDbPath(defPath)
    else {
        if (!dir.exists(dirname(defPath))) {
            confirm <- menu(c("Yes","No"),
                title=sprintf(paste0("The sitadela database is going to be ",
                    "created at %s. Do you agree?"),defPath))
            if (confirm == 1)
                .checkDbPath()
            else
                stop(paste0("The sitadela database cannot be created without user ",
                    "agreement.\nPlease agree or use the 'db' argument in the ",
                    "'addAnnotation' function."))
        }        
    }

    return(defPath)
}

.checkDbPath <- function(db=NULL) {
    if (missing(db) || is.null(db))
        db <- .getDefaultDbPath()
    dbd <- dirname(db)
    if (!dir.exists(dbd)) {
        message("sitadela database created at ",dbd," directory")
        dir.create(dbd,recursive=TRUE,mode="0755",showWarnings=FALSE)
    }
    else
        message("sitadela database found at ",dbd," directory")
}

.getDefaultDbPath <- function() {
    return(file.path(tools::R_user_dir("sitadela","data"),"annotation.sqlite"))
}

Finally, on package load (zzz.R):

.onLoad <- function(libname,pkgname) {
    setDbPath()
}

Please let me know of any further corrective actions.

Best regards,

Panagiotis

bioc-issue-bot commented 3 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/sitadela to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

pmoulos commented 3 years ago

Hi Marcel, @LiNk-NY,

Is there any progress on reviewing the latest changes on the database path? Should I perform any further corrective actions?

Thanks in advance,

Panagiotis

LiNk-NY commented 3 years ago

Hi Panagiotis, @pmoulos Sorry for the delay. Please do not use a vague name for your option ("BioC"). Be more specific to your package, perhaps "sitadela_cache" or similar for the option name. Thanks!

bioc-issue-bot commented 3 years ago

Received a valid push on git.bioconductor.org; starting a build for commit id: 21af9acc062101a031156469741350bdc129cbbc

bioc-issue-bot commented 3 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/sitadela to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

pmoulos commented 3 years ago

Hi Marcel, @LiNk-NY,

I changed the option name to sitadela_dbpath. Now, it looks like:

setDbPath <- function(db=NULL) {
    if (is.null(db))
        db <- .defaultDbPath()
    else {
        if (!is.character(db)) {
            warning("The path to the sitadela database must be a valid path ",
                "to a file which\nwill be created if not existing! Assuming ",
                "default...",immediate.=TRUE)
            db <- .defaultDbPath()
        }
    }
    options("sitadela_dbpath"=db)
}

getDbPath <- function() {
    db <- getOption("sitadela_dbpath")
    if (!is.null(db) && is.character(db)) {
        if (file.exists(db))
            return(db)
        else
            return(.defaultDbPath())
    }
    else
        return(.defaultDbPath())
}

Best,

Panagiotis

LiNk-NY commented 3 years ago

Hi Panagiotis, @pmoulos Thanks for making that change. It looks good. Best, Marcel

bioc-issue-bot commented 3 years ago

Your package has been accepted. It will be added to the Bioconductor nightly builds.

Thank you for contributing to Bioconductor!

vjcitn commented 3 years ago

The master branch of your GitHub repository has been added to Bioconductor's git repository.

To use the git.bioconductor.org repository, we need an 'ssh' key to associate with your github user name. If your GitHub account already has ssh public keys (https://github.com/pmoulos.keys is not empty), then no further steps are required. Otherwise, do the following:

  1. Add an SSH key to your github account
  2. Submit your SSH key to Bioconductor

See further instructions at

https://bioconductor.org/developers/how-to/git/

for working with this repository. See especially

https://bioconductor.org/developers/how-to/git/new-package-workflow/ https://bioconductor.org/developers/how-to/git/sync-existing-repositories/

to keep your GitHub and Bioconductor repositories in sync.

Your package will be included in the next nigthly 'devel' build (check-out from git at about 6 pm Eastern; build completion around 2pm Eastern the next day) at

https://bioconductor.org/checkResults/

(Builds sometimes fail, so ensure that the date stamps on the main landing page are consistent with the addition of your package). Once the package builds successfully, you package will be available for download in the 'Devel' version of Bioconductor using BiocManager::install("sitadela"). The package 'landing page' will be created at

https://bioconductor.org/packages/sitadela

If you have any questions, please contact the bioc-devel mailing list (https://stat.ethz.ch/mailman/listinfo/bioc-devel); this issue will not be monitored further.