Bioconductor / Contributions

Contribute Packages to Bioconductor
134 stars 33 forks source link

multiMiR #483

Closed mmulvahill closed 7 years ago

mmulvahill commented 7 years ago

Update the following URL to point to the GitHub repository of the package you wish to submit to Bioconductor

Confirm the following by editing each check box to '[x]'

I am familiar with the essential aspects of Bioconductor software management, including:

For help with submitting your package, please subscribe and post questions to the bioc-devel mailing list.

bioc-issue-bot commented 7 years ago

Dear @mmulvahill ,

You (or someone) has already posted that repository to our tracker.

See https://github.com/Bioconductor/Contributions/issues/359

You cannot post the same repository more than once. I am closing this issue.

mmulvahill commented 7 years ago

Hi @vobencha -- can you provide any clarification on resubmitting our package? Back in July, you suggested we would be able to resubmit. How do I go about getting around this auto-response?

Thanks -

vobencha commented 7 years ago

If you are ready with a new version I can re-open this issue. Valerie

mmulvahill commented 7 years ago

We are just about ready. I'm waiting on (internal, i.e. quick) approval to use a specific dataset in the examples to push my most recent changes.

In the meantime, I was hoping to get your input on an issue from your review of our previous submission. You requested that we add an interface to AnnotationDbi select(), keys(), keytypes(), and columns() functions for our return objects. The package has been available on our website since 2014 -- therefore we have a sizable number of existing users both with existing code and used to interfacing with the package in a certain way. My impression is that the previous S3 approach and the AnnotationDbi S4 approach are incompatible.

Do you have any suggestions for incorporating these S4 methods while not breaking the code of our existing users (who've used S3 techniques)? Or suggestions for what our options are?

mmulvahill commented 7 years ago

For quick reference, here are the reviewer comments from the previous submission and their status.

R1: Obviously needs better integration with Bioconductor.

Me: A little more explanation or an example of how would be greatly appreciated.

R1: We typically expect packages to plug into upstream and downstream workflows. The vignette shows many ways of retrieving miRNA annotations, etc, but does not give an example of using them in an actual analysis.

  1. DONE Add an example using other bioconductor packages -- in-progress, waiting on dataset approval.

R2: Hi,

I've done a first review. A few suggestions and comments below.

(1) BiocCheck output:

Checking for bioc-devel mailing list subscription... ERROR: Maintainer must subscribe to the bioc-devel mailing list. Subscribe here: https://stat.ethz.ch/mailman/listinfo/bioc-devel

  1. DONE Subscribe to mailing list

(2) Packages in 'Imports' should be fully or partially imported in NAMESPACE.

  1. DONE Look into 'imports' and NAMESPACE

(3) Internal functions

If you don't expect the user to call a function I would not export it. If the function is exported you need a full man page with examples. For example, add.multimir.links.Rd does not have an example.

  1. DONE Reduce number of exported functions

(4) Function names:

Some have underscores, some have dots ("."). It looks like you're using dot for internal functions. If a user can call a function it technically isn't internal. I'd recommend being consistent and using underscores for all exported functions.

  1. DONE Replace dots with underscores in exported functions

(5) Since this package is serving up annotations, they should support the same methods as the other annotations in Bioconductor. Please implement the select() family interface from AnnotationDbi on the objects you return from get.multimir(). The family includes select(), keys(), keytypes() and cols().

  1. DONE Look into adding AnnotationDbi::select(), keys(), keytypes(), and cols() for get.mulitmir() objects -- implementation question asked

(6) Consider using the AnnotationFilter package for filtering of the results. If you feel you can't implement the filter interface in this package please explain why. The package is fairly new and if there are suggestions to expand compatibility then we'd like to hear them.

  1. Look at AnnotationFilter for filtering results -- implementation questions asked Not incorporating on first release, but open to for future versions if it would be helpful to users.

Let me know if you have questions.

vobencha commented 7 years ago

I would recommend submitting your package to CRAN instead of Bioconductor. Your question about what the infrastructure is and how one would incorporate indicates (suggests?) you don't currently use Bioconductor packages or classes for your analysis.

Asking about how/why you should use existing infrastructure at point of package submission seems a little late in the game. Can you explain why you want the package in Bioconductor and not CRAN?

Valerie

mmulvahill commented 7 years ago

So more than anything, I think it indicates my focused (narrow) role on a larger team of bioinformaticians and biostatistician. I'm the biostatistician handling the programming -- modularizing code, reducing dependencies and repetitions, optimization, and maintenance. My colleagues who did the initial work on this package and database are more familiar with, and active users of, Bioconductor.

They typically use Bioconductor packages for upstream analysis then query the multiMiR database, so the annotation classes were not on my radar. I'm out of town for the rest of the week, but I know the team strongly feels Bioconductor is the right place for the package and I expect its due to the reasons listed on our webpage (novel features) -- broadly-speaking, that a single entry-point into 14 large databases of miRNA-target and -disease interactions and associations is a significant contribution to the bioinformatics toolkit and would be provide benefits to researchers, the field, and disseminating their work equally.

I did bring up the AnnoDbi classes with the team, and they agreed that we should implement them. We would include an option for the legacy output (defaulted to FALSE), so existing users could quickly get old code working with the updated Bioconductor version. Would that suffice? -- Thanks!

vobencha commented 7 years ago

Thanks for the explanation.

Implementing select(), keys(), keytypes() etc. is important. Implementing AnnotationFilter is not as important and was more of an fyi.

You asked a question about S4 compatibility with previous S3 techniques. Can you give an example? The data was in database form before, right? How did the users interact with it then? You should not have to squash those methods, just add the new select() family. Maybe I don't understand the question. If my response isn't helpful, please clarify how the users previously interacted with the data.

Valerie

mmulvahill commented 7 years ago

Hi Valerie --

The database is on the order of several gigabytes, so it is hosted on our university servers. The package's primary user-facing function get_multimir() builds and submits SQL queries based on function arguments.

Was your initial suggestion to add our own S4 class and select(), keys(), ketypes(), etc. functions for that class? Rather than implementing the entire AnnotationDb class for our package and database (of databases)? That would clear things up :)

In terms of compatibility, I just mean that our users treated the returned object as a list, using the $, [], [[]] operators to access the information within the returned object. Whereas if we return an S4 object existing code would have to replace those with the @ operator. This is a minor change, but in the refactoring we initially tried to maintain the existing user interface.

Adding the legacy.output option, would allow them to make one change to the function call that would maintain compatibility for the rest of the code. As far as formal S3 methods, there is only one for printing.

Thanks -- Matt

mmulvahill commented 7 years ago

Hi @vobencha -- I'm jready to submit an updated version incorporating the changes you've requested. Can you reopen the issue? Thanks!

vobencha commented 7 years ago

Hi,

Was your original object truly a list or was it a database that acted like a list? I don't understand, and am not sure I see the need for, the legacy.output option if both list-like accessors and the select() family of function work on the object.

Valerie

mmulvahill commented 7 years ago

Hi @vobencha --

I've taken care of the first 3 bullet points in your last post (not an AnnoDbi object, importing the generic methods, and returning the same types).

On the legacy.output option -- this was the compromise our group came to so as to not break existing code. Since the previous return object was truly a list, the $ operator worked and was used extensively in our examples and vignette. We know current users used the $ operator to access the datasets returned by get_multimir(), and the $ operator will not work with the new S4 object that the function returns. If necessary for acceptance, I suspect the group will be willing to remove the compatibility option for old code and I will run it by them tomorrow. If not necessary, our preference is to leave it in. I completely agree that we should encourage/guide users to the AnnotationDbi select() and related methods.

Also, it seems like the issue bot is not tracking this issue, nor my recent commits (I did increment the version # and my webhook shows recent pushes to this repo). Any suggestions on fixing that?

Thanks!

vobencha commented 7 years ago

Hi, It's ok to leave the legacy.output option in for this iteration. I would encourage you to change that in the future by allowing list-like operations on the mmquery_bioc object. Specifically, you can implement '[[', '[', '$' methods on the mmquery_bioc object. Other than the name and a show method, the switch should not be visible to the user at all. Their old operations will still work but the documentation will encourage them to start using the new select(), keys() etc.

I noticed your README discusses installing from Bioconductor, github, both devel and release versions. Once a package is accepted to Bioconductor it appears only in the devel repo http://www.bioconductor.org/checkResults/devel/bioc-LATEST/ until the next release at which time it's available in both release and devel. Here is the release link: http://www.bioconductor.org/checkResults/release/bioc-LATEST/ We encourage installation with biocLite() only because this helper installs all dependencies listed in the NAMESPACE. Instructing the user to several different versions, I think, is confusing. The package will be versioned with an even 'y' in release and odd 'y' in devel. We control this by bumping the 'y' portion of the version at each release. For example, current version is 0.99.7. It will stay 0.99. until the release on Oct 31. After the release the version in release will be 1.0. and in devel it will be 1.1.*. My point being is that the release/devel issue is handled through Bioconductor versioning. Users will know which is which by the landing pages. Here is an example of a release and devel landing page for the a4 package. Note at the top of the devel landing page there is a sentence under the "DOI" that points the user to the release version: http://www.bioconductor.org/packages/3.5/bioc/html/a4.html http://www.bioconductor.org/packages/3.6/bioc/html/a4.html

Please tidy the README to instruct loading with just biocLite(), i.e., remove all this:

# Now install devtools (for installing from GitHub repositories)
install.packages("devtools")
library(devtools)

# Now install the development version of the multiMiR package
#devtools::install_github("kechrislab/multimir", build_vignettes = TRUE)

# To install a public release, use the `ref=` option to select the tag of the
# release version 
# The current public release is v0.99.5
devtools::install_github("kechrislab/multimir", ref = "v0.99.5")

Not sure what's going on with the bot. @lshep can you please kick off a new build for this issue?

Valerie

mmulvahill commented 7 years ago

That sounds like a good plan to me. Since they're not typical functions/methods, I hadn't thought of implementing the list/df accessor methods you mention for our mmquery_bioc object. Definitely a good idea.

I've removed from our readme any mention of installing from github. I expect it would be okay to continue to do incremental development (partial bug fixes, development of feature additions) on the github repo and push to the Bioconductor SVN devel repo with complete added features and fixed bugs?

Thanks!

bioc-issue-bot commented 7 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the following build report for more details:

http://bioconductor.org/spb_reports/multiMiR_buildreport_20171004102628.html

vobencha commented 7 years ago

Yes, we expect that development will continue in your github repo. There are some good workflow / FAQ docs in this repo you may want to look at: https://github.com/Bioconductor/bioc_git_transition/tree/master/doc

Package is approved. Thanks for your work on this. Valerie

bioc-issue-bot commented 7 years ago

Your package has been accepted. It will be added to the Bioconductor Git repository and nightly builds. Additional information will be sent to the maintainer email address in the next several days.

Thank you for contributing to Bioconductor!

mmulvahill commented 7 years ago

Thanks for your prompt reviews and replies!

mtmorgan commented 7 years ago

The master branch of your GitHub repository has been added to Bioconductor's git repository.

To use the git.bioconductor.org repository, we need an 'ssh' key to associate with your github user name. If your GitHub account already has ssh public keys (https://github.com/mmulvahill.keys is not empty), then no further steps are required. Otherwise, do the following:

  1. Add an SSH key to your github account
  2. Submit your SSH key to Bioconductor

See further instructions at

https://bioconductor.org/developers/how-to/git/

for working with this repository. See especially

https://bioconductor.org/developers/how-to/git/new-package-workflow/ https://bioconductor.org/developers/how-to/git/sync-existing-repositories/

to keep your GitHub and Bioconductor repositories in sync.

Your package will be included in the next nigthly 'devel' build (check-out from git at about 6 pm Eastern; build completion around 2pm Eastern the next day) at

https://bioconductor.org/checkResults/

(Builds sometimes fail, so ensure that the date stamps on the main landing page are consistent with the addition of your package). Once the package builds successfully, you package will be available for download in the 'Devel' version of Bioconductor using biocLite(\"multiMiR\"). The package 'landing page' will be created at

https://bioconductor.org/packages/multiMiR

If you have any questions, please contact the bioc-devel mailing list (https://stat.ethz.ch/mailman/listinfo/bioc-devel); this issue will not be monitored further.