Bioconductor / Contributions

Contribute Packages to Bioconductor
134 stars 33 forks source link

ExCluster #762

Closed RMTbioinfo closed 5 years ago

RMTbioinfo commented 6 years ago

Update the following URL to point to the GitHub repository of the package you wish to submit to Bioconductor

Confirm the following by editing each check box to '[x]'

I am familiar with the essential aspects of Bioconductor software management, including:

For help with submitting your package, please subscribe and post questions to the bioc-devel mailing list.

bioc-issue-bot commented 6 years ago

Hi @RMTbioinfo

Thanks for submitting your package. We are taking a quick look at it and you will hear back from us soon.

The DESCRIPTION file for this package is:

Package: ExCluster
Type: Package
Title: ExCluster robustly detects differentially expressed exons between two
    conditions of RNA-seq data, requiring at least two independent biological
    replicates per condition.
Version: 0.99.0
Author: 2018-06-01
Author: R. Matthew Tanner, William L. Stanford, and Theodore J. Perkins
Maintainer: R. Matthew Tanner <rtann038@uottawa.ca>
Description: ExCluster flattens Ensembl and GENCODE GTF files into GFF files,
    which are used to count reads per non-overlapping exon bin from BAM files.
    This read counting is done using the function featureCounts from the package
    Rsubread. Library sizes are normalized across all biological replicates, and
    ExCluster then compares two different conditions to detect signifcantly 
    differentially spliced genes. This process requires at least two independent
    biological repliates per condition, and ExCluster accepts only exactly two
    conditions at a time. ExCluster ultimately produces false discovery rates
    (FDRs) per gene, which are used to detect significance. Exon log2 fold
    change (log2FC) means and variances may be plotted for each significantly 
    differentially spliced gene, which helps scientists develop hypothesis and
    target differential splicing events for RT-qPCR validation in the wet lab.
biocViews: DifferentialSplicing, RNASeq, Software
VignetteBuilder: knitr
Suggests: knitr
Depends: Rsubread
Imports: Rsubread
License: GPL-3
Encoding: UTF-8
LazyData: true
mtmorgan commented 6 years ago

Before adding your package to the review queue, the vignette needs to be modified to have runnable examples --- <<>>= code chunks. Likely this requires 'real' data, which might be provided in an (very) abbreviated form in your package, from an existing ExperimentData package, from a resource downloaded from the web (using, e.g., BiocFileCache so that the download cost is not paid on each nightly build), or from creation of an ExperimentHub data resource.

Also, please consider whether there are opportunities to re-use standard Bioconductor packages, especially GenomicRanges, SummarizedExperiment, and rtracklayer, to represent and manipulate the inputs and products of your workflow in a robust and interoperable way.

Please also remove files that are not needed to build your package, e.g., the vignettes/ directory should contain only your .Rnw file.

Please make these initial revisions and then post a brief comment here summarizing the changes you have made.

RMTbioinfo commented 6 years ago

Committed & Pushed new vignette code to Git Hub. The ExCluster 0.99.0 package now contains code chunks in the vignette which can be executed directly from the package.

bioc-issue-bot commented 6 years ago

A reviewer has been assigned to your package Learn what to expect during the review process.

IMPORTANT: Please read the instructions for setting up a push hook on your repository, or further changes to your repository will NOT trigger a new build.

bioc-issue-bot commented 6 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, TIMEOUT, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

RMTbioinfo commented 6 years ago

Committed changes to ExCluster package to remove extraneous files which were causing an error in the build:

ERROR: System Files found that should not be git tracked: ExCluster.Rproj ._.DS_Store .DS_Store ExCluster.log

lshep commented 6 years ago

Please make sure the webhook is established so that a version bump will trigger a new build report. A build will only be triggered than with a version bump in the DESCRIPTION file (i.e 0.99.0 to 0.99.1 )

bioc-issue-bot commented 6 years ago

Received a valid push; starting a build. Commits are:

a463c13 Changed version to 0.99.1 After deleting extraneo...

bioc-issue-bot commented 6 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, TIMEOUT, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

RMTbioinfo commented 6 years ago

Not sure how to proceed with 2 issues:

1) TIMEOUT: R CMD BiocCheck exceeded 0mins on both Mac and Linux. How is this fixed? Does BiocCheck really have to run in < 1 minute?

2) malbec1 BUILD BIN was skipped. Is this a problem that needs addressing? The output section for this was empty.

lshep commented 6 years ago

No BiocCheck has 10 minutes to run - that output is a typo that I will fix. malbec1 BUILD BIN is skipped for this platform and nothing needs to be addressed.

RMTbioinfo commented 6 years ago

Ok, thanks so much!

And the windows version not building is fine -- this package shouldn't be made available for Windows. Running the processCounts function is an essential part of the package, and it requires the Rsubread package -- which is only available on Mac & Linux operating systems.

lshep commented 6 years ago

The TIMEOUT is occurring because of an issue with your vignette. If you run R CMD BiocCheck ExCluster you will see that a pop up window occurs, it waits for a reply which never happens as the checks are not run interactively. The issue stems from using Sweave specific options but you are trying to process with knitr. There are a few options you could take to remedy this:

  1. do Sweave2knitr()
  2. process with Sweave and not with knitr
  3. Don't use noweb syntax

We really would recommend finding other solutions than using Rsubread that would be compatible with all platforms. You are limiting the scope of your package and preventing a large group of users access to your package.

RMTbioinfo commented 6 years ago

Ahh, you're right.

1) How do I process a vignette with Sweave only? I can uninstall knitr and the vignette builds just fine in R Studio. However, when I run BiocCheck, it keeps asking for the knitr package. How do I tell BiocCheck that the package Vignette should be processed with Sweave only.

2) Read counting in general seems problematic in Windows. The other obvious option is BEDtools which I was using previously, however BEDtools will not run natively in Windows. Every solution I have found which correctly counts exonic reads is either 1) extremely slow (1 hour + per sample) or 2) is not compatible with windows.

lshep commented 6 years ago
  1. Change the VignetteBuilder in the DESCRIPTION to Sweave and remove the suggests knitr (unless you use the functionality elsewhere)
  2. Very well. If you insist your package will be incompatible with windows by using Rsubread than please add a .BBSoptions file to the top level directory of your package and include the line UnsupportedPlatforms: win

Once the package builds with a clean build report I will review the package further.

RMTbioinfo commented 6 years ago

I've committed a new version 0.99.2 and removed 'Suggests: knitr' and changed VignetteBuilder to Sweave as you said. I also added the .BBSoptions.

bioc-issue-bot commented 6 years ago

Received a valid push; starting a build. Commits are:

e491e99 Removed knitr suggestion and added .BBSoptions 3ad9878 Changed version to 0.99.2 and changed VignetteBuil... 42a8063 Merge branch 'master' of github.com:RMTbioinfo/ExC...

bioc-issue-bot commented 6 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, UNSUPPORTED, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 6 years ago

Received a valid push; starting a build. Commits are:

973eca1 ExCluster 0.99.3 fixing VignetteBuilder issue

bioc-issue-bot commented 6 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, UNSUPPORTED, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 6 years ago

Received a valid push; starting a build. Commits are:

153774d Deleted ExCluster.log causing errors and subscribe... adf32be Subscribed to the Bioc-devel mailing list as requi...

bioc-issue-bot commented 6 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, UNSUPPORTED". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

RMTbioinfo commented 6 years ago

Ishep: I have pushed version 0.99.4 and it has passed the Builds and Checks on linux and mac operating systems.

As a note, changing 'VignetteBuilder: knitr' to 'VignetteBuilder: Sweave' caused an error, as Sweave is not a package. However, referencing Sweave's master package utils did not work either ('VignetteBuilder: utils::Sweave').

I simply deleted the VignetteBuilder: and Suggests: lines from the DESCRIPTION file, and R should default to Sweave. This seemed to have fixed the problem.

As per Windows incompatibility: I have spoken with my research group, and they agree no practical exon read counting solution exists on Windows. However, we may create and host an image for a linux Virtual Machine which users can download to run ExCluster on Windows, if users request this.

lshep commented 6 years ago

Thank you. I will try to have a more thorough review of your package in the next few days.

lshep commented 6 years ago

I just tried to clone and build your package and am getting an ERROR. Please correct this so that I can test and review your package.

$ R CMD build ExCluster/
Bioconductor version 3.8 (BiocInstaller 1.31.1), ?biocLite for help
* checking for file 'ExCluster/DESCRIPTION' ... OK
* preparing 'ExCluster':
* checking DESCRIPTION meta-information ... OK
* installing the package to build vignettes
* creating vignettes ... ERROR
Loading required package: Rsubread

Error: processing vignette 'ExCluster.Rnw' failed with diagnostics:
 chunk 2 
Error in read.table(file = GTF.File, header = FALSE, stringsAsFactors = FALSE) : 
  object 'GTF.File' not found
Execution halted
bioc-issue-bot commented 6 years ago

Received a valid push; starting a build. Commits are:

6c5d6bc Bumping to version 0.99.5 to check build after err...

bioc-issue-bot commented 6 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, UNSUPPORTED". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

RMTbioinfo commented 6 years ago

Hi Ishep, I have fixed the error, bumped the version, and the build check passed again.

My apologies, there was an issues where 2 functions were access global variables from within the function, instead of access variables specified in the arguments. I've fixed that, and they are now correctly grabbing the variables assigned in the arguments.

lshep commented 6 years ago

Thank you for your submission to Bioconductor. Please see the following points:

GENERAL

BUILD REPORT Fix the following NOTES:

* checking DESCRIPTION meta-information ... NOTE
Malformed Title field: should not end in a period.
Package listed in more than one of Depends, Imports, Suggests, Enhances:
  'Rsubread'

* checking dependencies in R code ... NOTE
'library' or 'require' call to ‘Rsubread’ which was already attached by Depends.
  Please remove these calls from your code.

* checking R code for possible problems ... NOTE
for global functions and global variables.  
see  ?globalVariables
and suggest in NOTE about adding importFrom to NAMESPACE

* Checking unit tests...
    * NOTE: Consider adding unit tests. We strongly encourage them. See
      http://bioconductor.org/developers/how-to/unitTesting-guidelines/.

* Checking coding practice...
    * NOTE: Avoid 1:...; use seq_len() or seq_along() found in files:

* Checking formatting of DESCRIPTION, NAMESPACE, man pages, R source,
  and vignette source...
    * NOTE: Consider shorter lines; 371 lines (17%) are > 80 characters
      long.
    * NOTE: Consider multiples of 4 spaces for line indents, 17
      lines(1%) are not.

DESCRIPTION

NEWS

inst

vignettes

man

R Code GFF_convert.R

General Code comment:

Please make corrections based on the above comments and comment back here with what has been updated and responses to clarification points above.

Thank you.

bioc-issue-bot commented 6 years ago

Received a valid push; starting a build. Commits are:

7927519 deleting unnecessary files... 44b0bb8 made numerous changes as per Bioconductor review d789230 deleting unnecessary vignette files 9fc5837 changed man files to max 80 lines for code which w... d94c7d4 updating git.ignore to prevent extraneous files fr... 5f58b6c added table of contents to vignette 1c9cf38 updated gitignore & cleaned up vignette

bioc-issue-bot commented 6 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, UNSUPPORTED". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

RMTbioinfo commented 6 years ago

Hi Lori,

I have pushed a new version of ExCluster which fixes many of the issues you identified. Important Fixes: 1) Completely re-wrote GFF_convert() to have helper functions and reduce repeated code. This function now also uses GenomicRanges to collapse exon bins, rather than re-implementing its functionality. 2) Added inst/script/ folder which contains the bash commands used to generate toy datasets in linux 3) Added table of contents to vignette & removed suggestion for package updates 4) The new GFF_convert should work with either '1','2','3', or 'chr1','chr2','chr3', etc. for chromosome names 5) Simplified toy dataset example code 6) No long writing files to the package directory, now using tempdir() 7) Removed various hidden files & added them to git.ignore for the future

Outstanding Issues that require clarification 1) We now use GenomicRanges, however SummarizeExperiment and rtracklayer would not be appropriate for our package. What we are doing is so different from other programs. For example, rtracklayer is excellent for manipulating GTF files, which are a complex data structure. However, we immediately simplify the GTF format by subsetting "exon" rows only, eliminating the need for rtracklayer.

2) Our algorithm is different from DESeq2, DEXSeq, and JunctionSeq, in that we are detecting significantly differentially expressed exon clusters, not simply individual instances of exon or exon-exon junction DE. We have simulated data we are combining with our manuscript that shows our ROC curve AUC is much, much better than our competitors.

3) Are unit tests absolutely necessary? I don't mind adding some, however it is difficult to see how I would get them to work for my functions.

4) Do I need to limit my lines to 80 characters in all cases? I have gone through and broken up very long lines, but some of my lines have up to 32 characters of indentations. Limiting to 80 characters in these cases would reduce code readability.

5) Our algorithm is limited to comparing 2 conditions at any one time because it is quite complex already. If "CombineExons" is set to FALSE, the algorithm can take many hours. Users seeking multiple condition comparisons would be better off conducting their own multiple analyses, either in R or through shell scripts and a job submitter. Many other differential splicing algorithms also only offer comparisons between two conditions, as the computational complexity is much higher than differential gene expression (i.e. DESEq2).

6) Could you clarify what you mean by "integrating examples with existing Bioconductor packages/classes"? I'm just not sure what you mean by that. We do use GenomicRanges now in GFF_convert, if that is what you were referring to.

7) As mentioned above, rtracklayer would be unnecessary for loading and manipulating GTF files, as we are immediately simplifying the data structure by sub-setting exon rows only.

8) I have hard coded the column names for parsing GTF files because they have no natural column names. GTF files are a very specific file format, which should always have the 1st column specify the chromosome, 3rd column specify "gene", "exon", etc., 4th and 5th columns start/end, 7th column strand, and so on. If a GTF file does not have these columns correctly placed, then they are not valid GTF files and should not be used.

GFF files are a little more open-ended, but some columns must remain the same. Our algorithm also collapses exon bins to be non-overlapping and adds a 10th optional column for gene names, and thus ExCluster will only take GFF files produced by our package. No other package is capable of producing GFF files with the necessary information. Because of this we can expect an un-altered GFF file.

General Code Comments clarification

1) How do I ensure that my package works with existing Bioconductor infrastructure?

2) I've changed a number of loops to vapply/lapply, however, due to issues with variable scoping and clarity of code, I would prefer to leave some loops intact.

bioc-issue-bot commented 6 years ago

Received a valid push; starting a build. Commits are:

3f82731 fixed bug causing errors in ExCluster 0db97ef updated GFF example & fixed bug in ExCluster Remov... aa34120 Fixed bugs in certain cases which would cause ExCl... 4b49ab1 Changed Date & Version for DESCRIPTION for upload ...

bioc-issue-bot commented 6 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, UNSUPPORTED". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

RMTbioinfo commented 6 years ago

Hi again Lori,

I've made a number of changes to fix small issues some of my Bioinformatics group found when running ExCluster on RNA-seq data we have. I also just added a patch to allow users to set the number of cores & temporary file directory when running processCounts. These arguments are then passed to featureCounts to allow it to run smoother/faster.

I can bump the version if you like, but it's a very minor change.

Overall, my comments from yesterday are still the same. Please refer to them for more details. However, to re-iterate some of the major points:

1) We now use GenomicRanges in GFF_convert, but I do not feel that SummarizedExperiment or rtracklayer would be necessary.

2) Our algorithm is very different from DESeq2, DEXSeq, and JunctionSeq, as we hierarchically cluster exon log2FC values and detect DE exon clusters.

3) Are unit tests 100% necessary? I will add them today/tomorrow if they are. I just don't know how many I need and where.

4) I've limited all manual page lines to 80 characters to avoid the NOTES that result. However, in some cases a hard limit of 80 character code lines would reduce readability of my code.

5) Our algorithm is quite complex, as are other differential splicing tools. I could write a wrapper function for multiple condition comparisons. However, when CombineExons=FALSE the analysis can take hours just to compare 2 conditions. Users would be better off writing their own R script or shell scripts to compare multiple conditions.

6) We now use GenomicRanges. I'm not sure what other Bioconductor packages/classes we would need to work with.

7) I have hard coded access GTF columns because these are very specific data structures. For example, if column 7 does not equal the DNA strand in +/- format, then the GTF file is corrupted or not read into R correctly, and should not be used.

8) I have also hard coded GFF columns, as our algorithm only accepts GFF files produced by ExCluster. The GFF file format is more variable (GFF, GFF3, etc.). However, only ExCluster produces GFF files with the information necessary to conduct our analyses, and thus ExCluster tests to make sure that such a GFF file is given.

lshep commented 6 years ago

Yes please bump the version again so that there is an updated build report.

bioc-issue-bot commented 6 years ago

Received a valid push; starting a build. Commits are:

9460d16 Bumped version to 0.99.8 to reflect changes to the...

bioc-issue-bot commented 6 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, UNSUPPORTED". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

RMTbioinfo commented 6 years ago

Done -- bumped the version to 0.99.8 and the new build passed. I've cleaned up all of NOTES I was getting from the R Studio check, so hopefully the only NOTES left are for Unit Tests & formatting issues related to the number of characters per line.

lshep commented 6 years ago

Thank you. I'll try and have a re-review within in the next day or two.

lshep commented 6 years ago

Thank you for your updates a few additional comments:

Build Report

GFF_convert:

You mentioned:

8. I have also hard coded GFF columns, as our algorithm only accepts GFF files
produced by ExCluster. The GFF file format is more variable (GFF, GFF3,
etc.). However, only ExCluster produces GFF files with the information necessary
to conduct our analyses, and thus ExCluster tests to make sure that such a GFF
file is given.
7.  I have hard coded access GTF columns because these are very specific data
structures. For example, if column 7 does not equal the DNA strand in +/-
format, then the GTF file is corrupted or not read into R correctly, and should
not be used.

ExCluster.R

Please update with changes and comment back here when I should take another look. Cheers

lshep commented 6 years ago

Hello. I just wanted to check it to see how the above changes were coming? If you do not currently have time to push further changes I can close the issue temporarily until time permits?

RMTbioinfo commented 6 years ago

Hi,

I do have changes coming, but it may take another few days. I was hoping to have them done by early next week.

Would closing the issue prevent me from pushing changes at a later date?

On Thu, Jul 19, 2018 at 7:22 AM, lshep notifications@github.com wrote:

Hello. I just wanted to check it to see how the above changes were coming? If you do not currently have time to push further changes I can close the issue temporarily until time permits?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/Bioconductor/Contributions/issues/762#issuecomment-406243080, or mute the thread https://github.com/notifications/unsubscribe-auth/AlIAKYwpF6V9p1-TVAvjKph7oPUNjDbgks5uIGvhgaJpZM4UbE76 .

lshep commented 6 years ago

No need to close the issue. We can keep the issue open if you expect changes within the next 2 weeks. We just want to ensure packages submitted are active. If the issue does close, requesting it be reopened would allow pushing changes again. We try to contact maintainers if an issue has been idle for 2-3 weeks - just as we try to review the issue in 2-3 weeks. Looking forward to updates.

RMTbioinfo commented 6 years ago

I will be pushing changes next week.

My apologies -- I was temporarily working on 2 other projects which took up more time than I expected, and we had some trouble implementing some of the changes Lori requested. We were trying to re-code our algorithm to use Genomic Ranges objects throughout, but it broke the code. We reverted to an earlier version, and we are now offering the option to output the results of certain functions as Genomic Ranges objects (for interoperability). However, the main function will not use GR objects -- hopefully this will satisfy the requirements.

-Matt

On Thu, Jul 19, 2018 at 10:38 AM, lshep notifications@github.com wrote:

No need to close the issue. We can keep the issue open if you expect changes within the next 2 weeks. We just want to ensure packages submitted are active. If the issue does close, requesting it be reopened would allow pushing changes again. We try to contact maintainers if an issue has been idle for 2-3 weeks - just as we try to review the issue in 2-3 weeks. Looking forward to updates.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/Bioconductor/Contributions/issues/762#issuecomment-406299365, or mute the thread https://github.com/notifications/unsubscribe-auth/AlIAKaievAsZFNEMMyGEy_zEcIFG3Do0ks5uIJnTgaJpZM4UbE76 .

bioc-issue-bot commented 6 years ago

Received a valid push; starting a build. Commits are:

33411cb Changes in Version 0.99.9 - Added the function rtr...

bioc-issue-bot commented 6 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, UNSUPPORTED". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

RMTbioinfo commented 6 years ago

Hi Lori,

I have finally updated those changes!! This post will be a little long, so bear with me:

1) I removed almost all of my c(1:n) with seq(n) -- I think I found all of them. However, in some cases I use c(2:n) which is a much simpler one-liner than anything I could do with seq(), so I left some of those cases.

2) I have added interoperability for users to produce GFF files from GTF files imported with rtracklayer, in the format of a new function rtracklayerGTFtoGFF (I can rename that function if you want, I wasn't sure what to call it). This function takes a GTF file in GRanges format, produced by the rtracklayer 'import' function. It produces a GTF file and calls GFF_convert within itself.

3) I have also added interoperability for data structures with genomic coordinates to be represented as GRanges objects in the format of two other functions, GRangesFromGFF and GRangesFromExClustResults. These functions do what they say they do, and re-format said data structures into GRanges objects, should users desire to work in this format.

4) However, the main ExCluster function does not use GRanges objects. I attempted to re-code the package to use GRanges objects throughout, but this was a disaster. I ended up reverting a bunch of changes. The GRanges format is simply not nice to work with outside of the GRanges package. In addition, featureCounts from Rsubread does not accept GRanges objects for SAF annotations (simplified annotation format), which processCounts uses. It would not make sense for GFF_convert to output GRanges, only to convert GRanges back to SAF format anyway.

5) Therefore, I have updated the Vignette with an extra section about working with rtracklayer GTF files, as well as reformatting the output of GFF_convert and ExCluster to GRanges format, should users desire to work with those data structures. I feel this is a fair compromise that maintains interoperability, while not requiring heavy re-writing of the entier algorithm.

6) I removed the repeated code in GFF_convert.

7) I have added numerous checks on GTF, GFF, and ExCluster results format, to make sure that data is in the correct format, and any hard coded column calls are accessing the correct data. Hopefully you can take a look at these and give me some feedback.

Overall I feel the package is in pretty good shape now, with the improvements you have suggested. Hopefully this is close to ready for acceptance. Thank you so much for all the time you've taken to improve the package. This is my first time making a Bioconductor package, so it's been a learning experience.

-Matt

mtmorgan commented 6 years ago

just a few quick comments as I drive by...

For 1. use seq_len() rather than seq().

I don't understand why you would write a function GRangesFromGFF() when this is what rtracklayer::import() does.

You don't need to re-write your package to internally use GRanges (though many packages do this -- check out the 'Depends on me' / 'Import me' on the GenomicRanges landing page). Instead you MUST change the interface to your package to interoperate with standard Bioconductor objects like GRanges. So if a function visible to the user fun(df) accepts at data frame that has start and end coordinates, rename it fun_internal and write

fun <- function(granges) {
     df <- data.frame(start = start(granges) <etc>
     result <- fun_internal(df)
     ## make result into a standard Bioconductor object, if possible
}

Your code, e.g., ExCluster() was pretty hard to parse. The things that make it difficult are (a) the internally defined functions (just make the top-level functions that are not exported from your namespace, (b) the very long function body (instead, break it into functions that each accomplish part of the task and can be implemented / debugged / tested / understood separately) (c) the very verbose error messages which is good but the long texts disrupt the programming logic, maybe overcome this by creating a 'dictionary' of error messages

.errors = list(
     one_error = "a very long description ...
    ...
)
stop(.errors[["one_error"]])
RMTbioinfo commented 6 years ago

Hi Morgan,

Thanks for the comments. I think I understand most of what you are saying. If you wouldn't mind giving me a few extra pointers, it would really help. This is the first package I've developed, and no one in my lab has experience publishing packages in such a comprehensive manner.

1) How does one create a dictionary of error messages? For example, would this be a function that exports error message strings as a list? i.e. error.Messages[[1]], error.Messages[[2]], etc.?? Is there a cleaner way to define such a list of error messages? This would obviously create problems if I hard code error messages to particular numbers, and then add in error messages later (I'd have to change all the numbers in the code).

2) Could I define a function of functions, called internally? For example, the functions section of ExCluster() -- is there a clean way to export all of those functions into the ExCluster environment from a separate function when ExCluster is run? This would greatly reduce the length of the ExCluster code.

Thanks, -Matt

On Wed, Aug 8, 2018 at 1:55 PM, Martin Morgan notifications@github.com wrote:

just a few quick comments as I drive by...

For 1. use seq_len() rather than seq().

I don't understand why you would write a function GRangesFromGFF() when this is what rtracklayer::import() does.

You don't need to re-write your package to internally use GRanges (though many packages do this -- check out the 'Depends on me' / 'Import me' on the GenomicRanges landing page https://bioconductor.org/packages/GenomicRanges). Instead you MUST change the interface to your package to interoperate with standard Bioconductor objects like GRanges. So if a function visible to the user fun(df) accepts at data frame that has start and end coordinates, rename it fun_internal and write

fun <- function(granges) { df <- data.frame(start = start(granges) result <- fun_internal(df)

make result into a standard Bioconductor object, if possible

}

Your code, e.g., ExCluster() was pretty hard to parse. The things that make it difficult are (a) the internally defined functions (just make the top-level functions that are not exported from your namespace, (b) the very long function body (instead, break it into functions that each accomplish part of the task and can be implemented / debugged / tested / understood separately) (c) the very verbose error messages which is good but the long texts disrupt the programming logic, maybe overcome this by creating a 'dictionary' of error messages

.errors = list( one_error = "a very long description ... ... ) stop(.errors[["one_error"]])

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/Bioconductor/Contributions/issues/762#issuecomment-411495870, or mute the thread https://github.com/notifications/unsubscribe-auth/AlIAKXQCPLo2x2bKRs81LCgTXWi4kVfJks5uOyYlgaJpZM4UbE76 .