Closed RMTbioinfo closed 5 years ago
Hi @RMTbioinfo
Thanks for submitting your package. We are taking a quick look at it and you will hear back from us soon.
The DESCRIPTION file for this package is:
Package: ExCluster
Type: Package
Title: ExCluster robustly detects differentially expressed exons between two
conditions of RNA-seq data, requiring at least two independent biological
replicates per condition.
Version: 0.99.0
Author: 2018-06-01
Author: R. Matthew Tanner, William L. Stanford, and Theodore J. Perkins
Maintainer: R. Matthew Tanner <rtann038@uottawa.ca>
Description: ExCluster flattens Ensembl and GENCODE GTF files into GFF files,
which are used to count reads per non-overlapping exon bin from BAM files.
This read counting is done using the function featureCounts from the package
Rsubread. Library sizes are normalized across all biological replicates, and
ExCluster then compares two different conditions to detect signifcantly
differentially spliced genes. This process requires at least two independent
biological repliates per condition, and ExCluster accepts only exactly two
conditions at a time. ExCluster ultimately produces false discovery rates
(FDRs) per gene, which are used to detect significance. Exon log2 fold
change (log2FC) means and variances may be plotted for each significantly
differentially spliced gene, which helps scientists develop hypothesis and
target differential splicing events for RT-qPCR validation in the wet lab.
biocViews: DifferentialSplicing, RNASeq, Software
VignetteBuilder: knitr
Suggests: knitr
Depends: Rsubread
Imports: Rsubread
License: GPL-3
Encoding: UTF-8
LazyData: true
Before adding your package to the review queue, the vignette needs to be modified to have runnable examples --- <<>>=
code chunks. Likely this requires 'real' data, which might be provided in an (very) abbreviated form in your package, from an existing ExperimentData package, from a resource downloaded from the web (using, e.g., BiocFileCache so that the download cost is not paid on each nightly build), or from creation of an ExperimentHub data resource.
Also, please consider whether there are opportunities to re-use standard Bioconductor packages, especially GenomicRanges, SummarizedExperiment, and rtracklayer, to represent and manipulate the inputs and products of your workflow in a robust and interoperable way.
Please also remove files that are not needed to build your package, e.g., the vignettes/ directory should contain only your .Rnw file.
Please make these initial revisions and then post a brief comment here summarizing the changes you have made.
Committed & Pushed new vignette code to Git Hub. The ExCluster 0.99.0 package now contains code chunks in the vignette which can be executed directly from the package.
A reviewer has been assigned to your package Learn what to expect during the review process.
IMPORTANT: Please read the instructions for setting up a push hook on your repository, or further changes to your repository will NOT trigger a new build.
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "skipped, TIMEOUT, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
Committed changes to ExCluster package to remove extraneous files which were causing an error in the build:
ERROR: System Files found that should not be git tracked: ExCluster.Rproj ._.DS_Store .DS_Store ExCluster.log
Please make sure the webhook is established so that a version bump will trigger a new build report. A build will only be triggered than with a version bump in the DESCRIPTION file (i.e 0.99.0 to 0.99.1 )
Received a valid push; starting a build. Commits are:
a463c13 Changed version to 0.99.1 After deleting extraneo...
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "skipped, TIMEOUT, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
Not sure how to proceed with 2 issues:
1) TIMEOUT: R CMD BiocCheck exceeded 0mins on both Mac and Linux. How is this fixed? Does BiocCheck really have to run in < 1 minute?
2) malbec1 BUILD BIN was skipped. Is this a problem that needs addressing? The output section for this was empty.
No BiocCheck has 10 minutes to run - that output is a typo that I will fix. malbec1 BUILD BIN is skipped for this platform and nothing needs to be addressed.
Ok, thanks so much!
And the windows version not building is fine -- this package shouldn't be made available for Windows. Running the processCounts function is an essential part of the package, and it requires the Rsubread package -- which is only available on Mac & Linux operating systems.
The TIMEOUT is occurring because of an issue with your vignette. If you run R CMD BiocCheck ExCluster you will see that a pop up window occurs, it waits for a reply which never happens as the checks are not run interactively. The issue stems from using Sweave specific options but you are trying to process with knitr. There are a few options you could take to remedy this:
We really would recommend finding other solutions than using Rsubread that would be compatible with all platforms. You are limiting the scope of your package and preventing a large group of users access to your package.
Ahh, you're right.
1) How do I process a vignette with Sweave only? I can uninstall knitr and the vignette builds just fine in R Studio. However, when I run BiocCheck, it keeps asking for the knitr package. How do I tell BiocCheck that the package Vignette should be processed with Sweave only.
2) Read counting in general seems problematic in Windows. The other obvious option is BEDtools which I was using previously, however BEDtools will not run natively in Windows. Every solution I have found which correctly counts exonic reads is either 1) extremely slow (1 hour + per sample) or 2) is not compatible with windows.
.BBSoptions
file to the top level directory of your package and include the line UnsupportedPlatforms: win
Once the package builds with a clean build report I will review the package further.
I've committed a new version 0.99.2 and removed 'Suggests: knitr' and changed VignetteBuilder to Sweave as you said. I also added the .BBSoptions.
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "skipped, UNSUPPORTED, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
Received a valid push; starting a build. Commits are:
973eca1 ExCluster 0.99.3 fixing VignetteBuilder issue
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "skipped, UNSUPPORTED, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "skipped, UNSUPPORTED". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
Ishep: I have pushed version 0.99.4 and it has passed the Builds and Checks on linux and mac operating systems.
As a note, changing 'VignetteBuilder: knitr' to 'VignetteBuilder: Sweave' caused an error, as Sweave is not a package. However, referencing Sweave's master package utils did not work either ('VignetteBuilder: utils::Sweave').
I simply deleted the VignetteBuilder: and Suggests: lines from the DESCRIPTION file, and R should default to Sweave. This seemed to have fixed the problem.
As per Windows incompatibility: I have spoken with my research group, and they agree no practical exon read counting solution exists on Windows. However, we may create and host an image for a linux Virtual Machine which users can download to run ExCluster on Windows, if users request this.
Thank you. I will try to have a more thorough review of your package in the next few days.
I just tried to clone and build your package and am getting an ERROR. Please correct this so that I can test and review your package.
$ R CMD build ExCluster/
Bioconductor version 3.8 (BiocInstaller 1.31.1), ?biocLite for help
* checking for file 'ExCluster/DESCRIPTION' ... OK
* preparing 'ExCluster':
* checking DESCRIPTION meta-information ... OK
* installing the package to build vignettes
* creating vignettes ... ERROR
Loading required package: Rsubread
Error: processing vignette 'ExCluster.Rnw' failed with diagnostics:
chunk 2
Error in read.table(file = GTF.File, header = FALSE, stringsAsFactors = FALSE) :
object 'GTF.File' not found
Execution halted
Received a valid push; starting a build. Commits are:
6c5d6bc Bumping to version 0.99.5 to check build after err...
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "skipped, UNSUPPORTED". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
Hi Ishep, I have fixed the error, bumped the version, and the build check passed again.
My apologies, there was an issues where 2 functions were access global variables from within the function, instead of access variables specified in the arguments. I've fixed that, and they are now correctly grabbing the variables assigned in the arguments.
Thank you for your submission to Bioconductor. Please see the following points:
GENERAL
[ ] you are not Depending, Importing, or Suggesting any Bioconducotr packages. There should be some interoperability with existing Bioconductor classes and methods for RNA-seq data. Especially GenomicRanges, SummarizedExperiment, and rtracklayer. See http://bioconductor.org/developers/how-to/commonMethodsAndClasses/
[ ] How does this differ from existing RNA-seq differential expression analysis. Namely DESeq2, DEXSeq, and JunctionSeq
BUILD REPORT Fix the following NOTES:
* checking DESCRIPTION meta-information ... NOTE
Malformed Title field: should not end in a period.
Package listed in more than one of Depends, Imports, Suggests, Enhances:
'Rsubread'
* checking dependencies in R code ... NOTE
'library' or 'require' call to ‘Rsubread’ which was already attached by Depends.
Please remove these calls from your code.
* checking R code for possible problems ... NOTE
for global functions and global variables.
see ?globalVariables
and suggest in NOTE about adding importFrom to NAMESPACE
* Checking unit tests...
* NOTE: Consider adding unit tests. We strongly encourage them. See
http://bioconductor.org/developers/how-to/unitTesting-guidelines/.
* Checking coding practice...
* NOTE: Avoid 1:...; use seq_len() or seq_along() found in files:
* Checking formatting of DESCRIPTION, NAMESPACE, man pages, R source,
and vignette source...
* NOTE: Consider shorter lines; 371 lines (17%) are > 80 characters
long.
* NOTE: Consider multiples of 4 spaces for line indents, 17
lines(1%) are not.
DESCRIPTION
NEWS
ExCluster Version 0.99 submission to bioconductor!
to Changes in Version 0.99.0
inst
vignettes
[ ] Remove hidden files ._ExCluster.Rnw and ._ExCluster.pdf. Also remove ExCluster.pdf, ExCluster.synctex.gz, ExCluster.tex, and ExCluster_examples.R. This directory really should only contain the ExCluster.Rnw
[ ] Add a Table of Contents
[ ] Remove this line when it asks you if you want to update [a/s/n] packages, type ’n’ and hit enter:
Leave it up to the user if they want to update any
packages.
[ ] Is there a reason you can't make the code robust enought to work with chromosomes specified as '1, 2, 3' instead of 'chr1, chr2, chr3' etc?
[ ] In your toy example, you only need one library(ExCluster)
call.
[ ] Reuse your data object. It seems like you wouldn't need all the data -
for example you supposely "flatten the GFT to GFF" in the first section of
code, why not use this object rather than loading another system file
"sub_gen.v23.ExClust.gff". And similarly you just did the "process counts
step" and generated a normCounts so you shouldn't have to load a system file
"normCounts.txt"
The extdata included should be minimal and you should used the object created
along the way rather than "reloading" system files. This doesn't seem
currently possible as trying to use output from previous steps results in
ERRORS - this makes me skeptical of the results that are generated along the
way.
[ ] Do not right to the users system in test, examples, or vignettes. Any output created should right to tempdir(). Always use tempdir() or tempfile() for output of test, examples, or vigenttes.
[ ] Would be great to show an example integrating with existing Bioconductor packages and classes.
man
R Code GFF_convert.R
[ ] Can rtracklayer and rtracklayer results be utilized for reading files? There already exists many import functions for different file types; use an existing rather than creating your own. You should be able to use minimally the results of rtracklayer or other Bicoonductor import methods for both GTF and GFF files in the rest of your code.
[ ] you are assuming a specific ordering of columns without actually checking this ordering is consistent and hard coding values in your code assuming this. (i.e number of columns 9 - that the 9 columns are in the order of some 'default format' ie. V7 is assumed +/- , v3 has 'exon') - perhaps this is ok but please elaborate why you can assume this?
[ ] You repeat the same code for gtf.plus and gtf.neg - take the repeated code and put into a helper function. We don't encourage large chunks of repeated code.
General Code comment:
[ ] Your code needs to be able to work with already existing Bioconductor infastructure for RNA-seq data.
[ ] can for
loops can be more efficent with vapply?
Please make corrections based on the above comments and comment back here with what has been updated and responses to clarification points above.
Thank you.
Received a valid push; starting a build. Commits are:
7927519 deleting unnecessary files... 44b0bb8 made numerous changes as per Bioconductor review d789230 deleting unnecessary vignette files 9fc5837 changed man files to max 80 lines for code which w... d94c7d4 updating git.ignore to prevent extraneous files fr... 5f58b6c added table of contents to vignette 1c9cf38 updated gitignore & cleaned up vignette
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "skipped, UNSUPPORTED". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
Hi Lori,
I have pushed a new version of ExCluster which fixes many of the issues you identified. Important Fixes: 1) Completely re-wrote GFF_convert() to have helper functions and reduce repeated code. This function now also uses GenomicRanges to collapse exon bins, rather than re-implementing its functionality. 2) Added inst/script/ folder which contains the bash commands used to generate toy datasets in linux 3) Added table of contents to vignette & removed suggestion for package updates 4) The new GFF_convert should work with either '1','2','3', or 'chr1','chr2','chr3', etc. for chromosome names 5) Simplified toy dataset example code 6) No long writing files to the package directory, now using tempdir() 7) Removed various hidden files & added them to git.ignore for the future
Outstanding Issues that require clarification 1) We now use GenomicRanges, however SummarizeExperiment and rtracklayer would not be appropriate for our package. What we are doing is so different from other programs. For example, rtracklayer is excellent for manipulating GTF files, which are a complex data structure. However, we immediately simplify the GTF format by subsetting "exon" rows only, eliminating the need for rtracklayer.
2) Our algorithm is different from DESeq2, DEXSeq, and JunctionSeq, in that we are detecting significantly differentially expressed exon clusters, not simply individual instances of exon or exon-exon junction DE. We have simulated data we are combining with our manuscript that shows our ROC curve AUC is much, much better than our competitors.
3) Are unit tests absolutely necessary? I don't mind adding some, however it is difficult to see how I would get them to work for my functions.
4) Do I need to limit my lines to 80 characters in all cases? I have gone through and broken up very long lines, but some of my lines have up to 32 characters of indentations. Limiting to 80 characters in these cases would reduce code readability.
5) Our algorithm is limited to comparing 2 conditions at any one time because it is quite complex already. If "CombineExons" is set to FALSE, the algorithm can take many hours. Users seeking multiple condition comparisons would be better off conducting their own multiple analyses, either in R or through shell scripts and a job submitter. Many other differential splicing algorithms also only offer comparisons between two conditions, as the computational complexity is much higher than differential gene expression (i.e. DESEq2).
6) Could you clarify what you mean by "integrating examples with existing Bioconductor packages/classes"? I'm just not sure what you mean by that. We do use GenomicRanges now in GFF_convert, if that is what you were referring to.
7) As mentioned above, rtracklayer would be unnecessary for loading and manipulating GTF files, as we are immediately simplifying the data structure by sub-setting exon rows only.
8) I have hard coded the column names for parsing GTF files because they have no natural column names. GTF files are a very specific file format, which should always have the 1st column specify the chromosome, 3rd column specify "gene", "exon", etc., 4th and 5th columns start/end, 7th column strand, and so on. If a GTF file does not have these columns correctly placed, then they are not valid GTF files and should not be used.
GFF files are a little more open-ended, but some columns must remain the same. Our algorithm also collapses exon bins to be non-overlapping and adds a 10th optional column for gene names, and thus ExCluster will only take GFF files produced by our package. No other package is capable of producing GFF files with the necessary information. Because of this we can expect an un-altered GFF file.
General Code Comments clarification
1) How do I ensure that my package works with existing Bioconductor infrastructure?
2) I've changed a number of loops to vapply/lapply, however, due to issues with variable scoping and clarity of code, I would prefer to leave some loops intact.
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "skipped, UNSUPPORTED". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
Hi again Lori,
I've made a number of changes to fix small issues some of my Bioinformatics group found when running ExCluster on RNA-seq data we have. I also just added a patch to allow users to set the number of cores & temporary file directory when running processCounts. These arguments are then passed to featureCounts to allow it to run smoother/faster.
I can bump the version if you like, but it's a very minor change.
Overall, my comments from yesterday are still the same. Please refer to them for more details. However, to re-iterate some of the major points:
1) We now use GenomicRanges in GFF_convert, but I do not feel that SummarizedExperiment or rtracklayer would be necessary.
2) Our algorithm is very different from DESeq2, DEXSeq, and JunctionSeq, as we hierarchically cluster exon log2FC values and detect DE exon clusters.
3) Are unit tests 100% necessary? I will add them today/tomorrow if they are. I just don't know how many I need and where.
4) I've limited all manual page lines to 80 characters to avoid the NOTES that result. However, in some cases a hard limit of 80 character code lines would reduce readability of my code.
5) Our algorithm is quite complex, as are other differential splicing tools. I could write a wrapper function for multiple condition comparisons. However, when CombineExons=FALSE the analysis can take hours just to compare 2 conditions. Users would be better off writing their own R script or shell scripts to compare multiple conditions.
6) We now use GenomicRanges. I'm not sure what other Bioconductor packages/classes we would need to work with.
7) I have hard coded access GTF columns because these are very specific data structures. For example, if column 7 does not equal the DNA strand in +/- format, then the GTF file is corrupted or not read into R correctly, and should not be used.
8) I have also hard coded GFF columns, as our algorithm only accepts GFF files produced by ExCluster. The GFF file format is more variable (GFF, GFF3, etc.). However, only ExCluster produces GFF files with the information necessary to conduct our analyses, and thus ExCluster tests to make sure that such a GFF file is given.
Yes please bump the version again so that there is an updated build report.
Received a valid push; starting a build. Commits are:
9460d16 Bumped version to 0.99.8 to reflect changes to the...
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "skipped, UNSUPPORTED". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
Done -- bumped the version to 0.99.8 and the new build passed. I've cleaned up all of NOTES I was getting from the R Studio check, so hopefully the only NOTES left are for Unit Tests & formatting issues related to the number of characters per line.
Thank you. I'll try and have a re-review within in the next day or two.
Thank you for your updates a few additional comments:
Build Report
GFF_convert:
You mentioned:
8. I have also hard coded GFF columns, as our algorithm only accepts GFF files
produced by ExCluster. The GFF file format is more variable (GFF, GFF3,
etc.). However, only ExCluster produces GFF files with the information necessary
to conduct our analyses, and thus ExCluster tests to make sure that such a GFF
file is given.
7. I have hard coded access GTF columns because these are very specific data
structures. For example, if column 7 does not equal the DNA strand in +/-
format, then the GTF file is corrupted or not read into R correctly, and should
not be used.
ExCluster.R
Please update with changes and comment back here when I should take another look. Cheers
Hello. I just wanted to check it to see how the above changes were coming? If you do not currently have time to push further changes I can close the issue temporarily until time permits?
Hi,
I do have changes coming, but it may take another few days. I was hoping to have them done by early next week.
Would closing the issue prevent me from pushing changes at a later date?
On Thu, Jul 19, 2018 at 7:22 AM, lshep notifications@github.com wrote:
Hello. I just wanted to check it to see how the above changes were coming? If you do not currently have time to push further changes I can close the issue temporarily until time permits?
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/Bioconductor/Contributions/issues/762#issuecomment-406243080, or mute the thread https://github.com/notifications/unsubscribe-auth/AlIAKYwpF6V9p1-TVAvjKph7oPUNjDbgks5uIGvhgaJpZM4UbE76 .
No need to close the issue. We can keep the issue open if you expect changes within the next 2 weeks. We just want to ensure packages submitted are active. If the issue does close, requesting it be reopened would allow pushing changes again. We try to contact maintainers if an issue has been idle for 2-3 weeks - just as we try to review the issue in 2-3 weeks. Looking forward to updates.
I will be pushing changes next week.
My apologies -- I was temporarily working on 2 other projects which took up more time than I expected, and we had some trouble implementing some of the changes Lori requested. We were trying to re-code our algorithm to use Genomic Ranges objects throughout, but it broke the code. We reverted to an earlier version, and we are now offering the option to output the results of certain functions as Genomic Ranges objects (for interoperability). However, the main function will not use GR objects -- hopefully this will satisfy the requirements.
-Matt
On Thu, Jul 19, 2018 at 10:38 AM, lshep notifications@github.com wrote:
No need to close the issue. We can keep the issue open if you expect changes within the next 2 weeks. We just want to ensure packages submitted are active. If the issue does close, requesting it be reopened would allow pushing changes again. We try to contact maintainers if an issue has been idle for 2-3 weeks - just as we try to review the issue in 2-3 weeks. Looking forward to updates.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/Bioconductor/Contributions/issues/762#issuecomment-406299365, or mute the thread https://github.com/notifications/unsubscribe-auth/AlIAKaievAsZFNEMMyGEy_zEcIFG3Do0ks5uIJnTgaJpZM4UbE76 .
Received a valid push; starting a build. Commits are:
33411cb Changes in Version 0.99.9 - Added the function rtr...
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "skipped, UNSUPPORTED". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
Hi Lori,
I have finally updated those changes!! This post will be a little long, so bear with me:
1) I removed almost all of my c(1:n) with seq(n) -- I think I found all of them. However, in some cases I use c(2:n) which is a much simpler one-liner than anything I could do with seq(), so I left some of those cases.
2) I have added interoperability for users to produce GFF files from GTF files imported with rtracklayer, in the format of a new function rtracklayerGTFtoGFF (I can rename that function if you want, I wasn't sure what to call it). This function takes a GTF file in GRanges format, produced by the rtracklayer 'import' function. It produces a GTF file and calls GFF_convert within itself.
3) I have also added interoperability for data structures with genomic coordinates to be represented as GRanges objects in the format of two other functions, GRangesFromGFF and GRangesFromExClustResults. These functions do what they say they do, and re-format said data structures into GRanges objects, should users desire to work in this format.
4) However, the main ExCluster function does not use GRanges objects. I attempted to re-code the package to use GRanges objects throughout, but this was a disaster. I ended up reverting a bunch of changes. The GRanges format is simply not nice to work with outside of the GRanges package. In addition, featureCounts from Rsubread does not accept GRanges objects for SAF annotations (simplified annotation format), which processCounts uses. It would not make sense for GFF_convert to output GRanges, only to convert GRanges back to SAF format anyway.
5) Therefore, I have updated the Vignette with an extra section about working with rtracklayer GTF files, as well as reformatting the output of GFF_convert and ExCluster to GRanges format, should users desire to work with those data structures. I feel this is a fair compromise that maintains interoperability, while not requiring heavy re-writing of the entier algorithm.
6) I removed the repeated code in GFF_convert.
7) I have added numerous checks on GTF, GFF, and ExCluster results format, to make sure that data is in the correct format, and any hard coded column calls are accessing the correct data. Hopefully you can take a look at these and give me some feedback.
Overall I feel the package is in pretty good shape now, with the improvements you have suggested. Hopefully this is close to ready for acceptance. Thank you so much for all the time you've taken to improve the package. This is my first time making a Bioconductor package, so it's been a learning experience.
-Matt
just a few quick comments as I drive by...
For 1. use seq_len()
rather than seq()
.
I don't understand why you would write a function GRangesFromGFF()
when this is what rtracklayer::import()
does.
You don't need to re-write your package to internally use GRanges (though many packages do this -- check out the 'Depends on me' / 'Import me' on the GenomicRanges landing page). Instead you MUST change the interface to your package to interoperate with standard Bioconductor objects like GRanges. So if a function visible to the user fun(df)
accepts at data frame that has start and end coordinates, rename it fun_internal
and write
fun <- function(granges) {
df <- data.frame(start = start(granges) <etc>
result <- fun_internal(df)
## make result into a standard Bioconductor object, if possible
}
Your code, e.g., ExCluster() was pretty hard to parse. The things that make it difficult are (a) the internally defined functions (just make the top-level functions that are not exported from your namespace, (b) the very long function body (instead, break it into functions that each accomplish part of the task and can be implemented / debugged / tested / understood separately) (c) the very verbose error messages which is good but the long texts disrupt the programming logic, maybe overcome this by creating a 'dictionary' of error messages
.errors = list(
one_error = "a very long description ...
...
)
stop(.errors[["one_error"]])
Hi Morgan,
Thanks for the comments. I think I understand most of what you are saying. If you wouldn't mind giving me a few extra pointers, it would really help. This is the first package I've developed, and no one in my lab has experience publishing packages in such a comprehensive manner.
1) How does one create a dictionary of error messages? For example, would this be a function that exports error message strings as a list? i.e. error.Messages[[1]], error.Messages[[2]], etc.?? Is there a cleaner way to define such a list of error messages? This would obviously create problems if I hard code error messages to particular numbers, and then add in error messages later (I'd have to change all the numbers in the code).
2) Could I define a function of functions, called internally? For example, the functions section of ExCluster() -- is there a clean way to export all of those functions into the ExCluster environment from a separate function when ExCluster is run? This would greatly reduce the length of the ExCluster code.
Thanks, -Matt
On Wed, Aug 8, 2018 at 1:55 PM, Martin Morgan notifications@github.com wrote:
just a few quick comments as I drive by...
For 1. use seq_len() rather than seq().
I don't understand why you would write a function GRangesFromGFF() when this is what rtracklayer::import() does.
You don't need to re-write your package to internally use GRanges (though many packages do this -- check out the 'Depends on me' / 'Import me' on the GenomicRanges landing page https://bioconductor.org/packages/GenomicRanges). Instead you MUST change the interface to your package to interoperate with standard Bioconductor objects like GRanges. So if a function visible to the user fun(df) accepts at data frame that has start and end coordinates, rename it fun_internal and write
fun <- function(granges) { df <- data.frame(start = start(granges)
result <- fun_internal(df) make result into a standard Bioconductor object, if possible
}
Your code, e.g., ExCluster() was pretty hard to parse. The things that make it difficult are (a) the internally defined functions (just make the top-level functions that are not exported from your namespace, (b) the very long function body (instead, break it into functions that each accomplish part of the task and can be implemented / debugged / tested / understood separately) (c) the very verbose error messages which is good but the long texts disrupt the programming logic, maybe overcome this by creating a 'dictionary' of error messages
.errors = list( one_error = "a very long description ... ... ) stop(.errors[["one_error"]])
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/Bioconductor/Contributions/issues/762#issuecomment-411495870, or mute the thread https://github.com/notifications/unsubscribe-auth/AlIAKXQCPLo2x2bKRs81LCgTXWi4kVfJks5uOyYlgaJpZM4UbE76 .
Update the following URL to point to the GitHub repository of the package you wish to submit to Bioconductor
Confirm the following by editing each check box to '[x]'
[x] I understand that by submitting my package to Bioconductor, the package source and all review commentary are visible to the general public.
[x] I have read the Bioconductor Package Submission instructions. My package is consistent with the Bioconductor Package Guidelines.
[x] I understand that a minimum requirement for package acceptance is to pass R CMD check and R CMD BiocCheck with no ERROR or WARNINGS. Passing these checks does not result in automatic acceptance. The package will then undergo a formal review and recommendations for acceptance regarding other Bioconductor standards will be addressed.
[x] My package addresses statistical or bioinformatic issues related to the analysis and comprehension of high throughput genomic data.
[x] I am committed to the long-term maintenance of my package. This includes monitoring the support site for issues that users may have, subscribing to the bioc-devel mailing list to stay aware of developments in the Bioconductor community, responding promptly to requests for updates from the Core team in response to changes in R or underlying software.
I am familiar with the essential aspects of Bioconductor software management, including:
For help with submitting your package, please subscribe and post questions to the bioc-devel mailing list.