Closed klarakaleb closed 6 years ago
Hi @klarakaleb
Thanks for submitting your package. We are taking a quick look at it and you will hear back from us soon.
The DESCRIPTION file for this package is:
Package: roma
Title: R wrapper for the OMA REST API
Version: 0.99.0
Author: Klara Kaleb
Maintainer: Klara Kaleb <klarakaleb@outlook.com>
Description: A package for the orthology prediction data download from OMA database.
Depends: R (>= 3.3.2), jsonlite (>= 1.5), httr (>= 1.2.1),plyr(>= 1.8.4), utils
Imports: jsonlite, httr, plyr
License: GPL-2
LazyData: true
RoxygenNote: 6.0.1.9000
Suggests: knitr,
rmarkdown
VignetteBuilder: knitr
biocViews: Software, Comparative Genomics, Functional Genomics, Genetics, Annotation, GO
Your package has been approved for building. Your package is now submitted to our queue.
IMPORTANT: Please read the instructions for setting up a push hook on your repository, or further changes to your repository will NOT trigger a new build.
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "skipped, ERROR, WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the following build report for more details:
http://bioconductor.org/spb_reports/roma_buildreport_20170922150417.html
Received a valid push; starting a build. Commits are:
8c34c9e third commit
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "skipped, ERROR, WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the following build report for more details:
http://bioconductor.org/spb_reports/roma_buildreport_20170922152348.html
There needs to be a layer that integrates this information somehow with Bioconductor. The vignette just documents the functions; it needs to provide a tutorial/workflow --- how would someone use this package in the broader context? This should not have low-level packages like jsonlite listed in Depends. Just keep them in Imports.
Thank you for your submission please see the following comments:
[ ] Michael's comment above is valid and should be addressed. Until there is a more thorough vignette, it is hard to evaluate your package. There should be some integration with Bioconductor, like the output being converted into a standard Bioconductor object to be used with other existing pipelines.
[ ] also fix all ERROR, WARNING, and NOTES from the build report.
[ ] When I tried to clone your repo locally I also ran into the following ERROR's trying to build:
R CMD build roma/
checking for file 'roma/DESCRIPTION' ... OK
preparing 'roma':
checking DESCRIPTION meta-information ... OK
installing the package to build vignettes
creating vignettes ... ERROR Loading required package: jsonlite Loading required package: httr Loading required package: plyr Quitting from lines 22-27 (my-vignette.Rmd) Error: processing vignette 'my-vignette.Rmd' failed with diagnostics: 'fromJSON' is not an exported object from 'namespace:httr' Execution halted
After I fixed that locally:
R CMD build roma/
checking for file 'roma/DESCRIPTION' ... OK
preparing 'roma':
checking DESCRIPTION meta-information ... OK
installing the package to build vignettes
creating vignettes ... ERROR Loading required package: jsonlite Loading required package: httr Loading required package: plyr No encoding supplied: defaulting to UTF-8. Quitting from lines 22-27 (my-vignette.Rmd) Error: processing vignette 'my-vignette.Rmd' failed with diagnostics: lexical error: invalid char in json text.
[ ] If I skip building locally and just install the later
lexical error: invalid char in json text.
<html> <head><title>500 Intern
(right here) ------^
occurs when trying to run the code in your vignette.
[ ] There are also a number of .Rapp.history
files. Please remove.
R code Based on a quick scan of code
source("R/config.R")
trycatch
when accessing data. Please address the above issues and do a version bump to regenerate a build report. Please also comment back here when the changes have been made and you are ready for a second review. Cheers Lori
Thank you both for your valuable feedback! I will address all the comments and comment back here once it is ready for the second review.
Best, Klara
Hi @klarakaleb, Just wanted to point out that the deadline for packages to be included in the next release is in a week Tue Oct 17th. After that accepted packages will be available in devel only until the next release. Just wanted to make you aware of this deadline for pushing updates.
Hi @lshep , thank you for letting me know. We have been having some problems with our server hence the delay (and vignette building problems). That has been sorted today and I will be making a push update that addresses your comments by tomorrow.
Received a valid push; starting a build. Commits are:
934ffbf changes according to the first review
Hi @lshep and @lawremi , apologies for the delay. I have made the updates for the packages and look forward to your feedback.
1) The package passed R CMD check, R CMD build and R CMD BiocCheck without errors/warnings on my local mac machine. 2) I have included better error handling. 3) I have added additional functionalities due to the expansion of the web resource. 4) I have integrated the data with some existing Bioconductor packages, namely topGO, Biostrings and ggtree.
I look forward to hearing from you,
Klara
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "skipped, ERROR, WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the following build report for more details:
http://bioconductor.org/spb_reports/roma_buildreport_20171013084158.html
Received a valid push; starting a build. Commits are:
89f6703 build report fix
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "WARNINGS, skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the following build report for more details:
http://bioconductor.org/spb_reports/roma_buildreport_20171013113353.html
Thank you for the updates. I think in general this would be a good addition to Bioconductor as I am not aware of other packages that currently access the data from OMA. There are a number of concerns and suggestions that need either clarification or to be addressed:
REPORT:
[ ] There are still a number of ERROR, WARNING and NOTES on build report.
[ ] I'm not sure why there is a security warning on tokay1? Do you have any insight if there are credentials or anything to access the database?
DESCRIPTION:
NAMESPACE:
INST/
UNIT TESTS
VIGNETTES:
MY-VIGNETTE
[ ] Call library(roma) once at the top of the vignette not in each code chunk
[ ] Your first example probably should output real results not NULL
> library(roma)
>
> results <- getXref(pattern="MAL")
>
> results
NULL
[ ] getData - is an S3 object "OMAObject" - access with $ - is this object explained somehwere? how would they know what is available?
[ ] why not S4 class with better defined accessors?
[ ] typo - resolveURL "such entries in split into a" - "is split"
[ ] is resolve url getting direct retrieval? can these be formatted better than a data.frame?
[ ] Your formatData example doesn't work
> formatted_orthologs = formatData(orthologs)
> formatted_orthologs
NULL
EXPLORING_HOGS:
[ ] Your code seems incorrect if length of parent is 1 but parent is actually NULL
[ ] Seems like there should be accessor methods if you think there will be information the user will commonly want - not encouraged to strictly use [[]] and $ by users
> hog$hog_id
[1] "HOG:0261495.1a"
>
> length(hog$parent_hogs)
[1] 1
>
> hog$parent_hogs[[1]]$parent_hogs
NULL
>
> length(hog$children_hogs)
[1] 2
>
> hog$children_hogs[[1]]$hog_id
[1] "HOG:0261495.1a.1b"
>
> hog$children_hogs[[2]]$hog_id
[1] "HOG:0261495.1a.1a"
>
> roma::resolveURL(hog$children_hogs[[1]]$levels_url)$level
[1] "Neognathae"
>
[ ] Again only call library(roma) once in the vignette
[ ] You repeated code from the first code chunk in the second - don't it stays in memory and is accessible from chunk to chunk inside a single vignette
[ ] The second code chunk seems like it should be wrapped up as a function in your package as a convenient wrapper to get this information
[ ] vapply/lapply/sapply/apply is recommended over for loops
SEQUENCE_MAPPING
TREE_VISUALISATION
ALL
MAN:
[ ] have a man page for your package - give brief overview of what package does and point to main functions and vignettes
[ ] There is no documentaiton for your object - although it is recommended not to define a class
[ ] formatTOPGO - value - confusing - its a list either way - wording seems like there should be an object GO2geneID or geneID2GO
[ ] getAnnotation - value is a data.frame - there are many of your functions defined as something like an object containing the JSON keys as attributes - this is not correct - correct all man files with this to be an accurate description of the object returned. - if its a data.frame its a data.frame containing ??what??
R CODE:
formatData.R
getAnnotation.R
getBiocStrings.R
getData.R
getGenomeAlignment.R
getHOG.R
getTaxonomy.R
getTopGO.R formatTopGO:
getTree.R
ape::read.tree(text=taxonomy$newick)
utils.R
simpleRequest
objectFactory
requestFactory
If the GET is moved into the trycatch of the request functions then there is no need for check_response separate function - check the response when you download the file - you don't want to GET the file twice
General: sometimes you do the check_response url and sometimes you don't in your main functions be consistent - althought based on the comment that both the check and the requestFactory call GET - the placement and use of this function should be reconsidered.
Please update the code and response back to the inquiries presented. I think this would be a great addition to Bioconductor. Look forward to working with you further on getting it accepted. ~Lori
Received a valid push; starting a build. Commits are:
283fc02 roma 0.99.5
Thank you for the extensive comments. I have addressed them updated accordingly in the newest push.
REPORT There should be no errors/issues/warnings now (awaiting the build report on linux/windonws). There is no key needed for the current accession to the API so I am very puzzled as to why tokay1 would be throwing an error - I did some research and it revealed that sometimes it happens on Windows 10 due to length of the public key but I am really not sure how this is relevant. Hopefully the error clears now.
DESCRIPTION ggtree added to suggests and httr/plyr removed from imports.
NAMESPACE imports added to namespace. build report suggests adding importFrom(methods,new) but when I do an error is thrown in the build report?
INST NEWS file added with a first entry.
UNIT TESTS added using testthat.
VIGENTTES empty directory removed and main vignette renamed.
My VIGNETTE NULL error fixed (was due to a typo). The main reason for choosing S3 over S4 was its flexibility - there is so much data available over the OMA REST API that we thought specifying different objects for each datatype/endpoint would be cumbersome and so we have opted for automatic S3 object generation from the JSON on the fly. However, I gather that this is not standard to Bioconductor and that S3 objects are less well defined. To facilitate better documentation, in the latest push I have included a function getObjectAttributes() that allows the user to print the attributes available for an object as well as their respective data types.
I am unsure of a better data type to store a list of items than a dataframe. I have mainly chosen it for its ease of accession and good data visualisation.
EXPLORING HOGS I fixed the errors in the code. Furthermore, I have added a set of function that deals with dataframe filled with members information in membersDf.R . Users can now do the analysis I have done much quicker and simpler - they can also easily obtained GRanges for a members df and also their sequences (in AAString format).
TREE VIS fixed.
SEQ MAPPING fixed.
ALL To facilitate the incorporation into the Bioconductor, locus information is now directly getting returned in the GRanges format, cDNA as DNAString and sequence as AAString for single entries.
MAN Have incorporated this as an Rd file linking to the Rd files for functions and vignettes. The formatting is not good which is something I have tried to change but having been having major issues with roxygen2/devtools::document() i.e. it documented the package once and failed to work later on for reasons unknown. This is the main reason for the delay - I will continue working on it to fix it but wanted your feedback on other features in the meantime.
R CODE formatData() is now incorporated in object creation (moved it to utils.R, no longer exported) and incorporated seq_along()
GET is now withing the tryCatch body and so I have deleted check_response
Root can stay null - it only provides an option to root a tree if needed.
BiocStrings issue fixed.
validity checking added to getData and getHOG.
Have removed for loops in favour of apply where possible.
column_names = names(content_list) hence the lack of a check.
Thank you again for your review and I look forward to your feedback on the updates.
Klara
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the following build report for more details:
http://bioconductor.org/spb_reports/roma_buildreport_20171017033304.html
Thank you for addressing many of the issues. A few more comments:
VIGNETTE
exploring_hogs
child_hog_1_members = getHOG(id = child_hog_id_1, members=TRUE)
child_hog_2_members = getHOG(id = child_hog_id_2, members=TRUE)
# We can now directly use the above data to construct a topGO object for further analysis (such as GO enrichment)
annotations_1 = getOntologies(child_hog_1_members$members)
annotations_2 = getOntologies(child_hog_2_members$members)
annotations = c(annotations_1,annotations_2)
topGO = getTopGO(annotations, myInterestingGenes = names(annotations_1), format = "geneID2GO")
[ ] This seems like it would be a good function to give to the user - taking in which child to use and returning the results from topGO - convert to a function if you think it will be used a lot by users
[ ] General - but noticed in exploring_hogs - instead of multiple $ and [[ consider cleaner suggested retrieval of information
eg. parent_hog_id = parent_hogs$hog_id[[1]]
can be written as
parent_hogs[["hog_id"]]
This should be cleaned up in all R code as well
MAN revisit man pages as you have mentioned
R CODE
getHog
getTopGo
[ ] Have validity check for format right away
[ ] protein[["ontology"]] instead of protein['ontology'][[1]] etc
[ ] It is more efficient to preallocate and use apply/sapply/vapply: see efficient coding The top portion can be rewritten like:
geneID2GO = sapply(geneList, FUN = function(protein){
if(startsWith(protein[["ontology"]], "https://")){
unlist(as.list(resolveURL(protein[["ontology"]])["GO_term"]))
}else{
unlist(as.list(protein[["ontology"]]["GO_term"]))
}
})
names(geneID2GO) = sapply(geneList, FUN=function(protein){protein$omaid})
[ ] see if the above concept can be applied in other Rcode
[ ] getTree - take in text or your results from getTaxonomy as well as character text esp since you getTaxomonmy always returns netwick result
[ ] mapSequence - should be able to take in Biostring as query too
[ ] membersDF - good addition!
[ ] romaObject - good addition!
Thanks for all the hard work and updates! Please address the above issues and comment back here when you are ready for another review.
Received a valid push; starting a build. Commits are:
86ff09e roma 0.99.6
Thank you for your additional comments. I have addressed them updated accordingly in the newest push.
Furthermore, getTree now also handles a taxonomy object and mapSequence now also accepts a BioString object.
Thank you for your time and I look forward to your feedback on the updates.
Klara
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the following build report for more details:
http://bioconductor.org/spb_reports/roma_buildreport_20171020191452.html
Dear Lori (@lshep),
I have tried to find the source of the build error on tokay1 but have been quite unsuccessful - the package builds on my windows machine without any warnings/errors.
Would you happen to know what causes the error?
Many thanks,
Klara
Thank you for the heads up. I will look into this ERROR on our side of things to see if I can give you any more detail about where it is coming from. I will review your package again soon and provide any additional feedback on the current updates.
Dear @lshep , I was wondering whether you had any time to review my package? Best regards, Klara
I'm sorry for the delay. I will have another look at the package and respond soon.
@klarakaleb I again apologize for the delay; I got tied up with the Bioc release.
I think this is very close to acceptance and with the following minor adjustments should be okay.
MAN
[ ] getGRanges.Rd/getOntologies.Rd/getSequences.Rd
change example to not use $:
getGRanges(df = getData("group","YEAST58")["members"])
getOntologies(df = getData("group","YEAST58")["members"])
etc...
[ ] getAnnotation.Rd: make mention that query argument can be Biostring object or character.
[ ] getHog.Rd level argument should be more specific - should it be an integer, a character,
R
[ ] getHog.R The class(level) check should be outside the if/else with the other checks - I would check the logic of this if/else - if(!is.null(level)) so the else is only if the level was not specified and wouldn't need a check.
[ ] mapSequence.R should be able to take a Biostring in as argument to query; it doesn't seem like this is the case nor is it in the man file that you could use Biostring or character.
Thank you for your continued effort!
Received a valid push; starting a build. Commits are:
9cdc0e1 roma 0.99.7
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the following build report for more details:
http://bioconductor.org/spb_reports/roma_buildreport_20171115061444.html
Received a valid push; starting a build. Commits are:
cd5291a roma 0.99.8
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
Congratulations! The package built without errors or warnings on all platforms.
Please see the following build report for more details:
http://bioconductor.org/spb_reports/roma_buildreport_20171115065750.html
Hi @lshep ,
Thank you for your comments. I have just fixed the issues brought up in your previous comment and have fixed the minor warnings brought up by my first push. Kindly let me know if anything else is to be done!
Best,
Klara
Thanks for all your hard work!
Your package has been accepted. It will be added to the Bioconductor Git repository and nightly builds. Additional information will be sent to the maintainer email address in the next several days.
Thank you for contributing to Bioconductor!
The master branch of your GitHub repository has been added to Bioconductor's git repository.
To use the git.bioconductor.org repository, we need an 'ssh' key to associate with your github user name. If your GitHub account already has ssh public keys (https://github.com/klarakaleb.keys is not empty), then no further steps are required. Otherwise, do the following:
See further instructions at
https://bioconductor.org/developers/how-to/git/
for working with this repository. See especially
https://bioconductor.org/developers/how-to/git/new-package-workflow/ https://bioconductor.org/developers/how-to/git/sync-existing-repositories/
to keep your GitHub and Bioconductor repositories in sync.
Your package will be included in the next nigthly 'devel' build (check-out from git at about 6 pm Eastern; build completion around 2pm Eastern the next day) at
https://bioconductor.org/checkResults/
(Builds sometimes fail, so ensure that the date stamps on the main landing page are consistent with the addition of your package). Once the package builds successfully, you package will be available for download in the 'Devel' version of Bioconductor using biocLite(\"roma\")
. The package 'landing page' will be created at
https://bioconductor.org/packages/roma
If you have any questions, please contact the bioc-devel mailing list (https://stat.ethz.ch/mailman/listinfo/bioc-devel); this issue will not be monitored further.
Repository: https://github.com/klarakaleb/roma
[x] I understand that by submitting my package to Bioconductor, the package source and all review commentary are visible to the general public.
[x] I have read the Bioconductor [Package Submission][2] instructions. My package is consistent with the Bioconductor [Package Guidelines][1].
[x] I understand that a minimum requirement for package acceptance is to pass R CMD check and R CMD BiocCheck with no ERROR or WARNINGS. Passing these checks does not result in automatic acceptance. The package will then undergo a formal review and recommendations for acceptance regarding other Bioconductor standards will be addressed.
[x] My package addresses statistical or bioinformatic issues related to the analysis and comprehension of high throughput genomic data.
[x] I am committed to the long-term maintenance of my package. This includes monitoring the [support site][3] for issues that users may have, subscribing to the [bioc-devel][4] mailing list to stay aware of developments in the Bioconductor community, responding promptly to requests for updates from the Core team in response to changes in R or underlying software.
I am familiar with the essential aspects of Bioconductor software management, including: