Bioconductor / Contributions

ORFhunteR #1620

Closed rfctbio-bsu closed 3 years ago

rfctbio-bsu commented 4 years ago

Update the following URL to point to the GitHub repository of the package you wish to submit to Bioconductor

Confirm the following by editing each check box to '[x]'

I am familiar with the essential aspects of Bioconductor software management, including:

For help with submitting your package, please subscribe and post questions to the bioc-devel mailing list.

bioc-issue-bot commented 4 years ago

Hi @rfctbio-bsu

Thanks for submitting your package. We are taking a quick look at it and you will hear back from us soon.

The DESCRIPTION file for this package is:

Package: ORFhunteR
Type: Package
biocViews: Technology, StatisticalMethod, Sequencing, RNASeq, Classification, FeatureExtraction
Title: Predict open reading frames in nucleotide sequences.
Version: 0.99.0
Author: Vasily V. Grinev, Mikalai M. Yatskou, Victor V. Skakun, Maryna Chepeleva
Maintainer: Vasily V. Grinev, <grinev_vv@bsu.by>
Description: Provides identification of open reading frames in RNA molecules based on 
  sequences vectorization and classification.
License: Artistic-2.0
Encoding: UTF-8
LazyData: true
StagedInstall: no
Imports: Rcpp (>= 1.0.3), Biostrings, BSgenome.Hsapiens.UCSC.hg38, 
  Peptides, data.table, stringr, randomForest, rtracklayer, xfun,
  stats, utils
LinkingTo: Rcpp
Suggests: knitr
VignetteBuilder:  knitr
RoxygenNote: 7.1.1.9000

bioc-issue-bot commented 4 years ago

A reviewer has been assigned to your package. Learn what to expect during the review process.

IMPORTANT: Please read this documentation for setting up remotes to push to git.bioconductor.org. It is required to push a version bump to git.bioconductor.org to trigger a new build.

Bioconductor utilized your GitHub SSH keys for git.bioconductor.org access. To manage keys and future access, you may want to activate your Bioconductor Git Credentials Account.

bioc-issue-bot commented 4 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/ORFhunteR to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

lawremi commented 4 years ago

I'm interested in this package, because we were recently playing around with finding cryptic ORFs in the human genome. I looked for the vignette but found only a simple example, without any context. From looking at the code, it seems to only support finding ORFs for a single gene at a time, so I guess it's not applicable to our use case. It also likes to take files instead of objects as input to its functions, so it would be inconvenient to use. It depends on Biostrings, but only internally. Lots of opportunity for improvement here.

bioc-issue-bot commented 4 years ago

Received a valid push on git.bioconductor.org; starting a build for commit id: be7a705d96d9f4e9efa681484c689897d66248e2

bioc-issue-bot commented 4 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

bioc-issue-bot commented 4 years ago

Received a valid push on git.bioconductor.org; starting a build for commit id: 49c46e5511e4dabcc43a37357d20ac5662811627

rfctbio-bsu commented 4 years ago

Hi, @lawremi Thank you for your comment.

I'm interested in this package, because we were recently playing around with finding cryptic ORFs in the human genome.

The ORFhunteR package integrates a novel algorithm for the automatic determination and annotation of open reading frames (ORFs) in a large set of RNA molecules. It outperforms other existing ORF determination algorithms and computational tools in at least three respects: 1) high precision of the estimated ORFs in human nuclear mRNA (around 98.14%, as concluded from the analysis of more than 170 000 molecules loaded from the NCBI RefSeq and Ensembl databases), enabled by an advanced machine learning model based on vectorization of nucleotide sequences and a random forest classification algorithm; 2) very fast automatic determination of ORFs in a large set of up to 200 000 RNA molecules, owing to the use of C++ code; 3) a high level of universality, and thus potential for adoption, since it additionally includes helpful functions for filtering mRNA and annotating ORFs.

I looked for the vignette but found only a simple example, without any context.

The vignette has been updated. It now contains a user’s manual for the package functions. The use of the package functions for automatic determination and annotation of ORFs is shown for an example set of 10 RNA molecules loaded from Ensembl.

From looking at the code, it seems to only support finding ORFs for a single gene at a time, so I guess it's not applicable to our use case.

The package provides very fast automatic determination of ORFs in a large set of up to 200 000 RNA molecules in parallel. An example of ORF identification and annotation for 10 RNA molecules has been added to the updated vignette.

It also likes to take files instead of objects as input to its functions, so it would be inconvenient to use.

At present, the package functions take data files in various formats as input, with the aim of simplifying the user’s work when the ORF determination and annotation steps are run as independent procedures. When objects are used as input to the functions, all steps of the data analysis are obligatory. Object-based inputs to the functions will be added in forthcoming updates.

It depends on Biostrings, but only internally.

The ORFhunteR package depends on the following packages: BSgenome.Hsapiens.UCSC.hg38, Biostrings, Peptides, data.table, randomForest, rtracklayer, stringr, xfun, Rcpp, stats, utils. These packages are installed automatically during the installation of ORFhunteR.

Lots of opportunity for improvement here.

The authors would like to thank the Reviewer for his comments and will be happy to incorporate any further recommendations that would increase the quality of the package!

bioc-issue-bot commented 4 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details. This link will be active for 21 days.

LiNk-NY commented 4 years ago

Hi Vasily, @rfctbio-bsu

Thank you for your submission. Please see the review below. Comment here if you have any questions.

Best, Marcel


DESCRIPTION

NAMESPACE

vignettes

/ package folder

R

Minor:

rfctbio-bsu commented 3 years ago

Dear @LiNk-NY, we are sorry for the late reply.

After careful revision and substantial improvement, we wish to resubmit our R package ORFhunteR for publication in the Bioconductor project.

Should we open a new issue, or is it possible to reopen #1620?

We have fully addressed the review remarks posted on September 25, 2020.

We have considerably improved, extended and updated the ORFhunteR package. Some of the added features are: i) a probability estimate that an RNA molecule is coding; ii) a function for developing a user’s own classification model to be applied for predicting open reading frames; iii) acceleration and parallelization of some functions. The full list of modifications is given below this message.

Finally, we launched the free online service ORFhunteR (http://orfhunter.bsu.by), which is intended for free public use by the scientific, academic and amateur bioinformatics communities. The corresponding paper, “ORFhunteR : An Advanced Approach for the Automatic Identification and Annotation of the Open Reading Frames In Human mRNA Molecules”, is being prepared and will soon be available at http://arxive.org.

Sincerely Yours, Vasily Grinev

ORFhunteR: improvements, extensions and updates (listed in the order they appear in “The ORFhunteR package: User’s manual”).

  1. A detailed instruction manual, “The ORFhunteR package: User’s manual”, on using the package functions.
  2. The function findORFs. It has been revised: short ORF candidates are now omitted, which reduced the computing time nearly twofold.
  3. The function vectorizeORFs. Some functions have been replaced by those available in Bioconductor packages. The time performance is about 20% better than that of the original version.
  4. The function classifyORFsCandidates has been added to the package. It builds a randomForest classifier for the ORF candidates based on the original data used by the authors (these data can be downloaded from the SST Center server at www.sstcenter.com/download/ORFhunteR ). The main purpose of this function is to let a user develop a specific classification model based on their own data.
  5. The function predictORF. It has been modified for parallel computing. On large datasets this reduces the computation time by a factor of roughly the number of processing cores, although it should be applied with care when analyzing small data sets.
  6. The function predictORF. The probability that an RNA molecule is coding has been added. The function selects only one ORF candidate per RNA molecule, namely the one with the maximal value of the prob field. In fact, 91.9% of the ORFs identified in mRNA molecules (Ensembl release 97, GRCh38.p12 human reference genome assembly) demonstrate a significant probability.

LiNk-NY commented 3 years ago

Hi @rfctbio-bsu

The issue is still active and you can make commits to the git repository. If you'd like to continue the review, please respond to the review line-by-line.

Best regards, Marcel

bioc-issue-bot commented 3 years ago

Received a valid push on git.bioconductor.org; starting a build for commit id: 259b5c359b390b30d5f9fa19f060a4ba856cecfd

bioc-issue-bot commented 3 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR, skipped". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

bioc-issue-bot commented 3 years ago

Received a valid push on git.bioconductor.org; starting a build for commit id: 3ee0443a5171cafca12f3ac56df30d864cd2dca2

bioc-issue-bot commented 3 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

bioc-issue-bot commented 3 years ago

Received a valid push on git.bioconductor.org; starting a build for commit id: 025d891fd68b0412472adddda44cc480274d970a

bioc-issue-bot commented 3 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR, WARNINGS, skipped". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

bioc-issue-bot commented 3 years ago

Received a valid push on git.bioconductor.org; starting a build for commit id: 57472fa161c259b2ebdcb6c2045ff1f1dcf38ea7

bioc-issue-bot commented 3 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details. This link will be active for 21 days.

rfctbio-bsu commented 3 years ago

Hi Marcel! @LiNk-NY

Thank you! We made some changes to pass Checks with our new version. The line-by-line answer to your review is below.

Best, Vasily

DESCRIPTION

  • Please include a BugReports field.

Done.

  • (optional) Include a ORCID ID as a comment in Authors@R

Done.

  • Minor: format the code so that it is within the 80 column width limit

Done.

vignettes

  • Use eval = FALSE for chunks of R code that are not meant to run (e.g., installation instructions)

Solved.

  • Try to avoid cryptic argument names such as f.orf. The unwritten convention is to use . in arguments that require logical inputs, otherwise use camel case.

The names of arguments were changed.

/ package folder

  • It looks like there is an invalid file name .onUnload.R

Fixed.

R

  • Avoid using cat function for printing to the screen. Use message with a verbose argument in the function to reduce the noise.

Fixed. We use cat in R/classifyORFsCandidates.r.

  • Due to the size, move the BSgenome package to the Suggests field.

We think that the end user of our package really needs the BSgenome package in the import field.

  • Functions that use the package should check that the package is installed otherwise provide an error for the user.

No functions use Suggested packages; all dependencies must be installed.

  • Although it's best to avoid writing files for the user, d.work should point to a temporary directory by default.

Done.

  • Use a caching mechanism such as BiocFileCache for downloading online resources.

We set up downloading the model file once.

  • Reduce repetitive code using pseudocode: df[] <- vapply(df, as.numeric, numeric(1L))

Solved.

  • Avoid writing the results to a file, allow the user to write their own results (in finderORFs).

Done.

  • Where possible, use named columns instead of column indices (e.g., in R/finderPTCs.r). This ensures that your code is more robust to changes.

In the current version of the software, we use column names instead of column indices in R/finderPTCs.r, R/annotateORFs.r, R/findORFs.r, R/predictORFs.r and R/translateORFs.r.
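For readers following the review point about Suggests, here is a minimal sketch of the kind of guard the reviewer describes. It assumes a scenario in which an optional package sits in Suggests (the authors kept BSgenome in Imports); `requireSuggested` is a hypothetical helper name, not part of ORFhunteR.

```r
# Hypothetical helper: stop with an informative error when an optional
# (Suggests) package is missing, instead of failing later with an obscure one.
requireSuggested <- function(pkg) {
    if (!requireNamespace(pkg, quietly = TRUE)) {
        stop("Package '", pkg, "' is required for this function. ",
             "Install it with BiocManager::install('", pkg, "').",
             call. = FALSE)
    }
    invisible(TRUE)
}

# A function that needs the optional package would call the guard first.
# 'stats' ships with R, so this call succeeds silently.
requireSuggested("stats")
```

Calling the guard at the top of each function that needs the optional package keeps the check close to the code that depends on it.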

Minor:

  • Use function names as verbs (e.g., findORFs instead of finderORFs; though this is a matter of style)

The functions were renamed.

  • Use arrow <- for assignment rather than =.

Fixed.

  • Remove commented code, it is confusing and probably stale anyway

Fixed.

LiNk-NY commented 3 years ago

Hi Vasily, @rfctbio-bsu

Thanks for making those changes and responding to the review. Your package is almost ready for acceptance. Please address the following points:

It is not clear why you are using cat instead of message in R/classifyORFsCandidates.R. Keep in mind that suppressMessages works on message but not on cat. We encourage the use of cat only in show methods for a package-owned class.
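The difference between cat and message can be checked with a small base R sketch. The function names below are hypothetical and not taken from ORFhunteR; the point is only that suppressMessages silences message but leaves cat output untouched, and that a verbose argument lets the caller switch the reporting off.

```r
# Hypothetical progress reporters; neither is taken from ORFhunteR.
reportWithCat <- function() {
    cat("Building the classifier...\n")
    invisible(NULL)
}

reportWithMessage <- function(verbose = TRUE) {
    if (verbose) message("Building the classifier...")
    invisible(NULL)
}

# cat() writes to standard output, so suppressMessages() has no effect on it.
catOutput <- capture.output(suppressMessages(reportWithCat()))

# message() signals a condition; count how many reach an outer handler.
countMessages <- function(expr) {
    n <- 0L
    withCallingHandlers(
        expr,
        message = function(m) {
            n <<- n + 1L
            invokeRestart("muffleMessage")
        }
    )
    n
}

shown    <- countMessages(reportWithMessage())
silenced <- countMessages(suppressMessages(reportWithMessage()))
quiet    <- countMessages(reportWithMessage(verbose = FALSE))
```

Here `catOutput` still contains the text despite suppressMessages, while `silenced` and `quiet` are both zero.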

Also, please update the package so that this NOTE is taken care of:

* checking R code for possible problems ... NOTE
predictORF: no visible binding for global variable ‘prob’
predictORF: no visible binding for global variable ‘transcript_id’
Undefined global functions or variables:
  prob transcript_id

Remove the StagedInstall: no unless you have a valid reason for using it. I was able to install the package just fine without the flag in the DESCRIPTION file.

Address the warning from R CMD check:

* checking whether package 'ORFhunteR' can be installed ... WARNING
Found the following significant warnings:
  Warning: multiple methods tables found for 'export'

This is likely due to the multiple Imports/Depends. Please review the package guidelines and make sure that you are putting the packages in the correct fields. http://bioconductor.org/developers/package-guidelines/#description

Best, Marcel

bioc-issue-bot commented 3 years ago

Received a valid push on git.bioconductor.org; starting a build for commit id: 35204248ac2ad8c903faef8be67a10f5ba1ed2d0

bioc-issue-bot commented 3 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

bioc-issue-bot commented 3 years ago

Received a valid push on git.bioconductor.org; starting a build for commit id: 77accbde7364c84ac0e0a3f6cd24afc1e03c6723

bioc-issue-bot commented 3 years ago

Received a valid push on git.bioconductor.org; starting a build for commit id: 7b7cde82e12094d63b397042cd2b32cfff252206

bioc-issue-bot commented 3 years ago

Received a valid push on git.bioconductor.org; starting a build for commit id: aaf43019e308f284850779c7415a6ee49952ad0c

rfctbio-bsu commented 3 years ago

Dear Marcel @LiNk-NY

We have reviewed all your comments and made corrections (detailed answer below). However, there is a problem: no build report is returned after a valid push. We have checked http://staging.bioconductor.org:8000/ as written here, and the build history is empty. Perhaps you know what the problem is? We also found two more packages (CelliD, fobitools) that have no build history after 2020-01-16.

Detailed answer:

Please address the following points:

It is not clear why you are using cat instead of message in R/classifyORFsCandidates.R. Keep in mind that suppressMessages works on message but not on cat. We encourage the use of cat only in show methods for a package-owned class.

Ok, we use message now.

Also, please update the package so that this NOTE is taken care of:

* checking R code for possible problems ... NOTE
predictORF: no visible binding for global variable ‘prob’
predictORF: no visible binding for global variable ‘transcript_id’
Undefined global functions or variables:
  prob transcript_id

The check identifies the variables prob and transcript_id as global, although they are columns of the object prob_orfs and are used in data manipulation on R objects of type data.table (for conditional filtering; see the data.table package documentation). This is a fast and robust way of processing data, and we guess the checker is not aware of it. Unless you have serious arguments against it, we would prefer to leave this as is.

Remove the StagedInstall: no unless you have a valid reason for using it. I was able to install the package just fine without the flag in the DESCRIPTION file.

Removed.

Address the warning from R CMD check:

* checking whether package 'ORFhunteR' can be installed ... WARNING
Found the following significant warnings:
  Warning: multiple methods tables found for 'export'

We removed redundant import of one of the functions.

This is likely due to the multiple Imports/Depends. Please review the package guidelines and make sure that you are putting the packages in the correct fields. http://bioconductor.org/developers/package-guidelines/#description

We have reviewed the guidelines and believe that we import the necessary packages correctly.

Kind regards, Vasily

lshep commented 3 years ago

There was an issue with the builder. It has been corrected and I manually pushed builds. All packages that failed to receive builds should receive them shortly

bioc-issue-bot commented 3 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details. This link will be active for 21 days.

rfctbio-bsu commented 3 years ago

Thank you @lshep! We got a build with no errors or warnings, but the label was not changed automatically. Should we push a dummy version bump to trigger a rebuild?

lshep commented 3 years ago

I'll investigate more into why the label didn't remove. In the meantime I will remove it manually. Sorry for the inconvenience

rfctbio-bsu commented 3 years ago

Thank you!

LiNk-NY commented 3 years ago

Hi Vasily, @rfctbio-bsu

I am not seeing the changes you made reflected in R CMD check:

* checking whether package 'ORFhunteR' can be installed ... WARNING
Found the following significant warnings:
  Warning: multiple methods tables found for 'export'
* checking R code for possible problems ... NOTE
predictORF: no visible binding for global variable ‘prob’
predictORF: no visible binding for global variable ‘transcript_id’
Undefined global functions or variables:
  prob transcript_id

The NOTE can be handled by using the utils::globalVariables function in the pertinent R file.
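A minimal sketch of that suggestion, assuming the two column names from the NOTE above; in a package the call would sit at the top level of the relevant file under R/, and the variable name `declared` is only used here for illustration.

```r
# Columns referenced through data.table's non-standard evaluation look like
# undefined globals to R CMD check. Declaring them silences the NOTE.
# The call returns the names registered so far for the calling environment.
declared <- utils::globalVariables(c("prob", "transcript_id"))
```

This only informs the code checker; it does not create the variables or change how the data.table expressions are evaluated.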

Update: I'm not sure why I'm getting the warning and why it doesn't appear on the builders. My devel installation of Bioconductor is valid(). :thinking: I guess you can ignore the warning for now...

Thanks!

bioc-issue-bot commented 3 years ago

Received a valid push on git.bioconductor.org; starting a build for commit id: 645707b4dcd2a531f3b75b6ee80917dd79f42de4

bioc-issue-bot commented 3 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details. This link will be active for 21 days.

rfctbio-bsu commented 3 years ago

Hi Marcel @LiNk-NY

We fixed this Note.

Kind regards, Vasily

LiNk-NY commented 3 years ago

Hi Vasily, @rfctbio-bsu

Thank you for making those changes and for your contribution to Bioconductor. I have accepted your package.

Best regards, Marcel

bioc-issue-bot commented 3 years ago

Your package has been accepted. It will be added to the Bioconductor nightly builds.

Thank you for contributing to Bioconductor!

rfctbio-bsu commented 3 years ago

Hi @LiNk-NY Marcel,

Great news! Thank you very much for the review of our package!

Best, Vasily

mtmorgan commented 3 years ago

The master branch of your GitHub repository has been added to Bioconductor's git repository.

To use the git.bioconductor.org repository, we need an 'ssh' key to associate with your github user name. If your GitHub account already has ssh public keys (https://github.com/rfctbio-bsu.keys is not empty), then no further steps are required. Otherwise, do the following:

  1. Add an SSH key to your github account
  2. Submit your SSH key to Bioconductor

See further instructions at

https://bioconductor.org/developers/how-to/git/

for working with this repository. See especially

https://bioconductor.org/developers/how-to/git/new-package-workflow/ https://bioconductor.org/developers/how-to/git/sync-existing-repositories/

to keep your GitHub and Bioconductor repositories in sync.

Your package will be included in the next nightly 'devel' build (check-out from git at about 6 pm Eastern; build completion around 2pm Eastern the next day) at

https://bioconductor.org/checkResults/

(Builds sometimes fail, so ensure that the date stamps on the main landing page are consistent with the addition of your package). Once the package builds successfully, your package will be available for download in the 'Devel' version of Bioconductor using BiocManager::install("ORFhunteR"). The package 'landing page' will be created at

https://bioconductor.org/packages/ORFhunteR

If you have any questions, please contact the bioc-devel mailing list (https://stat.ethz.ch/mailman/listinfo/bioc-devel); this issue will not be monitored further.