nanotatoR: next generation structural variant annotation and classification

VilainLab commented 5 years ago

Update the following URL to point to the GitHub repository of the package you wish to submit to Bioconductor

Repository: https://github.com/VilainLab/nanotatoR

Confirm the following by editing each check box to '[x]'

[x ] I understand that by submitting my package to Bioconductor, the package source and all review commentary are visible to the general public.
[x] I have read the Bioconductor Package Submission instructions. My package is consistent with the Bioconductor Package Guidelines.
[x] I understand that a minimum requirement for package acceptance is to pass R CMD check and R CMD BiocCheck with no ERROR or WARNINGS. Passing these checks does not result in automatic acceptance. The package will then undergo a formal review and recommendations for acceptance regarding other Bioconductor standards will be addressed.
[x] My package addresses statistical or bioinformatic issues related to the analysis and comprehension of high throughput genomic data.
[x] I am committed to the long-term maintenance of my package. This includes monitoring the support site for issues that users may have, subscribing to the bioc-devel mailing list to stay aware of developments in the Bioconductor community, responding promptly to requests for updates from the Core team in response to changes in R or underlying software.

I am familiar with the essential aspects of Bioconductor software management, including:

[x] The 'devel' branch for new packages and features.
[x] The stable 'release' branch, made available every six months, for bug fixes.
[x] Bioconductor version control using Git (optionally via GitHub).

For help with submitting your package, please subscribe and post questions to the bioc-devel mailing list.

bioc-issue-bot commented 5 years ago

Received a valid push; starting a build. Commits are:

ab9172a Bumped version

bioc-issue-bot commented 5 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

VilainLab commented 5 years ago

Hi,

Currently my package has passed all tests, and there is one warning I am getting. For the functions which use the R package rentrez to extract entrez ids related to a given phenotype from NCBI and convert the gene IDs to gene symbols using biomaRt, is not passing the 5 minute processing time criterion. Is there a way (any other alternative) by which I can reduce this processing time? Thanks a lot for all the help and guidance in advance

Thanks, Surajit

mtmorgan commented 5 years ago

Bioconductor 'org' packages allow mapping between ids and symobls, e.g.,

ids = c("ENSG00000121410", "ENSG00000175899", "ENSG00000256069", "ENSG00000171428",
"ENSG00000156006", "ENSG00000196136")
mapIds(org.Hs.eg.db, ids, "SYMBOL", "ENSEMBL")

I'm not exactly sure what you mean by 'entrez ids related to a given phenotype'

VilainLab commented 5 years ago

Hi Martin,

Thanks a lot for your reply. Actually the function has two parts i) Extract entrez gene ids related to a user defined term (disease/phenotype) from NCBI databases like ClinVar, OMIM, etc, using rentrez. ii) Convert the gene ids to Gene symbol using BioMart.

Both these methods combined are increasing the processing time of the package, as they are fetching data from external database. Most time consuming part of the whole process is to get the ensembl object using useMart from biomaRt. As I am printing both ensembl ID and entrez Gene as an output, do you think I can use any other tool to do the same? I tried using org.Hs.eg.db, fetching gene symbols and ensembl ids seperately, but the time difference is marginal.

Thanks again for all your guidance and help.

Thanks, Surajit

From: Martin Morgan [notifications@github.com] Sent: Friday, April 05, 2019 8:17 AM To: Bioconductor/Contributions Cc: Bhattacharya, Surajit; Mention Subject: [EXT] Re: [Bioconductor/Contributions] nanotatoR: next generation structural variant annotation and classification (#913)

ATTENTION: External Email! Do not click attachments/links unless sender is known.

Bioconductor 'org' packages allow mapping between ids and symobls, e.g.,

ids = c("ENSG00000121410", "ENSG00000175899", "ENSG00000256069", "ENSG00000171428", "ENSG00000156006", "ENSG00000196136") mapIds(org.Hs.eg.db, ids, "SYMBOL", "ENSEMBL")

I'm not exactly sure what you mean by 'entrez ids related to a given phenotype'

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_Bioconductor_Contributions_issues_913-23issuecomment-2D480254349&d=DwMFaQ&c=Zoipt4Nmcnjorr_6TBHi1A&r=A7ikyxJ6NjfZXZkHJSAEEDOxGh2fq1RXLJXrVyw-TI4&m=PH2z3KGO7NmhU_9Z0Umk2nEAu7delPcdOuCEz4HKgXQ&s=Ndhg5A6u2QUmThVGf5SVVzX6M3nI5XWXZgmbns3xS2A&e=, or mute the threadhttps://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_notifications_unsubscribe-2Dauth_AlqeLih8nmIf6jLJoz2ptxfZGLljKueAks5vdz7UgaJpZM4XLHWQ&d=DwMFaQ&c=Zoipt4Nmcnjorr_6TBHi1A&r=A7ikyxJ6NjfZXZkHJSAEEDOxGh2fq1RXLJXrVyw-TI4&m=PH2z3KGO7NmhU_9Z0Umk2nEAu7delPcdOuCEz4HKgXQ&s=upXZYJSyQmXiEoyV5D-o3yHQqI6Sd0il8pzf2s89GWk&e=.

mtmorgan commented 5 years ago

I'm not really sure what you mean by the system time being marginal; mapIds() will generally return in under a second

> system.time(mapIds(org.Hs.eg.db, ids, "SYMBOL", "ENSEMBL"))
'select()' returned 1:1 mapping between keys and columns
   user  system elapsed
  0.110   0.001   0.111

whereas biomaRt is a web call and must be much slower than that?

I think the conversation would be easier if you were to provide some simple reproducible examples that I can copy and paste into my own R session.

bioc-issue-bot commented 5 years ago

Received a valid push; starting a build. Commits are:

f5d2e46 Bumped version

bioc-issue-bot commented 5 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details.

VilainLab commented 5 years ago

Hello,

I could get rid of the error by using mapIds function. Thanks a lot for your advise regarding the same. somehow it was taking some time to run in my system, but when I restarted my computer, it ran properly. Thanks again for your advise.

Thanks, Surajit

VilainLab commented 5 years ago

Hello, Replies in line in bold inline.

Please let me know, if you have any other query.

Thanks again for all the help and advise.

Thanks, Surajit

Thank you for your submission to Bioconductor. Please see the following initial review.

General

This biggest concern is your package should be compatible with other existing Bioconductor methods and objects. You currently don't import or use any existing Bioconductor infastructure. Example: Why for reading bed files did you not use the recommended rtracklayer::import ? You should be able to take in output of standard Bioconductor classes to your functions (like GenomicRanges the output of rtracklayer::import and the recommended reading of BED files). We also would consider GSEABase for GeneSet and GeneSetCollections for gene results.: The bed file input functionality we have currently is reading bed files as text format and converting the X and Y chromosomes to 23 and 24 respectively. This is done in order for this to be compatible to the variant file created by the optical mapping bionano platform. The user has the choice to enter either a bionano compatible bed file or input a normal bed file that can be converted into bionano compatible bed file. From the next version of nanotatoR. we would use rtracklayer::import to import bed files. Other than this functionality we have used bioconductor functions where ever we got the chance, like using AnnotationDbi and org.Hs.eg.db, for converting entrez id to gene symbols Build Report

correct the note about coding practices regarding the use of 1:... Done runnable examples - you should at least show the code that could be run in nanotatoR_main.Rd. run_bionano_filter example doesn't call itself in the example code? We have shown the code, but kept it un-executable as it provides the output as a excel file which has Rtools dependencies. NEWS - see below NEWS

The reason the build report is not picking up your NEWS file is the bad file extension. Your NEWS file is also formatted incorrectly. The news file should either be NEWS or NEWS.md and should be formatted as directed in the ?news helper. Remove the title section and format accordingly. Formatted in the format required DESCRIPTION

remove lazyData: true Done Vignette

Change installation instruction to have Bioconductor installation. Done In your code chunks be consistent with argument settings. you defined smap and then system file with directly accessing for hgpath. Then you define some arguments to use in your function but then don't pass them as arguments to your function.Done why are you providing your own bed file. use existing in other packages or from the bioconductor annotation or experiment hubs. We would implement this in the next iterations Correct the following: 2: In if (returnMethod_bedcomp == "Text") { : the condition has length > 1 and only the first element will be used. Fix this conditional to account for length > 1. The results of running the code - datcomp is NULL Correct the following. You should not be using this argument if it is designated as deprecated. Warning in biomaRt::useMart("ensembl", host = "www.ensembl.org", ensemblRedirect = FALSE, : The argument "ensemblRedirect" has been deprecated and will be removed in the next biomaRt release.
Using org.Hs.eg.db instead run_bionano_filter the code given cannot be run and results in ERROR - fix this run_bionano_filter(SVFile=smappath,fileName,input_fmt_geneList="dataFrame", input_fmt_svMap="Text",RtoolsZIPpath="") Error in run_bionano_filter(SVFile = smappath, fileName, input_fmt_geneList = "dataFrame", : unused argument (RtoolsZIPpath = "") error running nanotatoR_main code chunk as well nanotatoR_main(smap, bed, inputfmtBed = c("BNBED"), n=3,mergedFiles , buildSVInternalDB=TRUE, soloPath, solopattern, input_fmt_INF=c("dataframe"),returnMethod_GeneList=c("dataframe"), returnMethod_bedcomp=c("dataframe"),returnMethod_DGV=c("dataframe"), returnMethod_Internal=c("dataframe"),input_fmt_DGV=c("dataframe"),
hgpath, smapName,method=c("Single"), term, thresh=5, input_fmt_geneList=c("dataframe"),input_fmt_svMap=c("dataframe"), svData,dat_geneList,outpath="",outputFilename="",RZIPpath="") Error in read.table(file = file, header = header, sep = sep, quote = quote, : no lines available in input In addition: Warning message: In file(file, "rt") : file("") only supports open = "w+" and open = "w+b": using the former Took care of this errors Inst

Cannot open file in doc/ remove or update Removed it Document how you created or the source information for the files provided in inst/extdata in inst/scripts The data sets used are sample example data made from real bionano optical mapping datasets, which are downloaded from Genome In a bottle Consortium.Can you please advise us the best way to cite those? Are all thees files standard output from a particular type of experiment or instrumentation? nanotatoR is used to annotate Structural variant maps (smaps) which are produced by the optical mapping platforms from Bionano. This optical mapping tool are effective in better understanding and examining larger Structural variants. Although, the analysis software produced by bionano is effective in calling variants, but they do not provide any annotation. Our tool bridges this gap. Why not use already existing bed files provided in already accepted Bioconductor instead of providing new. This is always encouraged. Would add This functionality in the next iteration R code Have not done an indepth look into the R code until the section General is considered. There should be a re-write making the functions compatible and consistent with existing Bioconductor infastructure. See also: http://bioconductor.org/developers/how-to/commonMethodsAndClasses/

Please comment back here with changes and updates. Let me know when you would like me to re-review your package. Cheers

mtmorgan commented 5 years ago

Can you please edit your response so that it is easy to read? bold, for instance, doesn't appear in several places. thanks

VilainLab commented 5 years ago

Edited and and made all the comments bold.

Thanks, Surajit

mtmorgan commented 5 years ago

Please address this note for all simple uses of the form 1:n.

* NOTE: Avoid 1:...; use seq_len() or seq_along()
  Found in files:
    Bed_SV_Comp.r (line 370, column 41)
    Bed_SV_Comp.r (line 371, column 45)
    Bed_SV_Comp.r (line 372, column 43)
    Bed_SV_Comp.r (line 375, column 23)
    Bed_SV_Comp.r (line 391, column 37)
    Bed_SV_Comp.r (line 392, column 41)
    Bed_SV_Comp.r (line 393, column 39)
    Bed_SV_Comp.r (line 396, column 23)
    Bed_SV_Comp.r (line 416, column 37)
    Bed_SV_Comp.r (line 417, column 41)
    Bed_SV_Comp.r (line 418, column 39)
    Bed_SV_Comp.r (line 421, column 23)
    Bed_SV_Comp.r (line 443, column 33)
    Bed_SV_Comp.r (line 444, column 37)
    Bed_SV_Comp.r (line 445, column 35)
    Bed_SV_Comp.r (line 448, column 19)
    entrez_extract.r (line 53, column 37)
    entrez_extract.r (line 103, column 15)

Please use rtracklayer to import your bed files, rather than writing your own parser.
For this

Document how you created or the source information for the files provided in inst/extdata in inst/scripts The data sets used are sample example data made from real bionano optical mapping datasets, which are downloaded from Genome In a bottle Consortium.Can you please advise us the best way to cite those?"

Write a 'man' page describing the steps required to generate the data; maybe these are 'standard' outputs from some software, or perhaps the output generated when specific options are chosen, or perhaps you have had to process the data generated by the software. All of this should be documented on the appropriate man page.

Please in your response make it clear for me to see what changes you made. For instance, use bullet points or white space to clearly separate each response.

bioc-issue-bot commented 5 years ago

Received a valid push; starting a build. Commits are:

2309527 Bumped Version

bioc-issue-bot commented 5 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 5 years ago

Received a valid push; starting a build. Commits are:

9f1ddbf Version Bumped

bioc-issue-bot commented 5 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 5 years ago

Received a valid push; starting a build. Commits are:

5bcfeec Version Bump

bioc-issue-bot commented 5 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 5 years ago

Received a valid push; starting a build. Commits are:

c3a6c0c Update DESCRIPTION

bioc-issue-bot commented 5 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details.

bioc-issue-bot commented 5 years ago

Received a valid push; starting a build. Commits are:

8fe1036 Version Bump

bioc-issue-bot commented 5 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 5 years ago

Received a valid push; starting a build. Commits are:

e33bb8b Deleted errors and version bump

bioc-issue-bot commented 5 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details.

VilainLab commented 5 years ago

Hi, Thanks for your response. Please find below my reply regarding the review.

All the instances of 1:n has been replace by seq(n) or seq_len(length(n)).
I have used rtracklayer import to read in data.
I have put a Rscript detailing the different files, where they were downloaded from, how they were processed and defining each column. its in the inst/script/ folder.

Please let me know, if you have any further issues/comments.

Thanks again for all your help.

Thanks, Surajit

mtmorgan commented 5 years ago

OK thanks. seq_len(length(x)) is usually just seq_along(x)

bioc-issue-bot commented 5 years ago

Your package has been accepted. It will be added to the Bioconductor Git repository and nightly builds. Additional information will be posed to this issue in the next several days.

Thank you for contributing to Bioconductor!

mtmorgan commented 5 years ago

The master branch of your GitHub repository has been added to Bioconductor's git repository.

To use the git.bioconductor.org repository, we need an 'ssh' key to associate with your github user name. If your GitHub account already has ssh public keys (https://github.com/VilainLab.keys is not empty), then no further steps are required. Otherwise, do the following:

See further instructions at

https://bioconductor.org/developers/how-to/git/

for working with this repository. See especially

https://bioconductor.org/developers/how-to/git/new-package-workflow/ https://bioconductor.org/developers/how-to/git/sync-existing-repositories/

to keep your GitHub and Bioconductor repositories in sync.

Your package will be included in the next nigthly 'devel' build (check-out from git at about 6 pm Eastern; build completion around 2pm Eastern the next day) at

https://bioconductor.org/checkResults/

(Builds sometimes fail, so ensure that the date stamps on the main landing page are consistent with the addition of your package). Once the package builds successfully, you package will be available for download in the 'Devel' version of Bioconductor using BiocManager::install("nanotatoR"). The package 'landing page' will be created at

https://bioconductor.org/packages/nanotatoR

If you have any questions, please contact the bioc-devel mailing list (https://stat.ethz.ch/mailman/listinfo/bioc-devel); this issue will not be monitored further.

Bioconductor / Contributions

nanotatoR: next generation structural variant annotation and classification #913