KechrisLab / multiMiR

Development repository for the multiMiR database's R API
Other
19 stars 3 forks source link

Investigate options for passing URLs used throughout package functions #2

Closed mmulvahill closed 7 years ago

mmulvahill commented 7 years ago

Currently uses a variable local to the package environment. Using getOption to access and options("" = ...) to set may be advantageous in that it:

  1. formalizes the URLs as key package options
  2. allows the user to update the URL (universally across the R session) without requiring a new package release
  3. prevents confusion over local vs package environment variable
# 2. 
options("multimir_URL" = "https://new-mm-url.edu")
multimir_dbInfo()  # Still works without passing the new URL each time
# 3.
multimir_URL  # Variable found in package environment
# [1] "http://multimir.ucdenver.edu/cgi-bin/multimir.pl"
multimir_URL <- "test"  # User attempt to update value places it in local environment
multimir_URL      # User/global environment
# [1] "test"
multiMiR::multimir_URL # But actual value used in functions (package env) is unchanged
# [1] "http://multimir.ucdenver.edu/cgi-bin/multimir.pl"
>
mmulvahill commented 7 years ago

@smahaffey What are your thoughts on this? I've implemented it, but can revert back to the variable approach if you prefer it.

mmulvahill commented 7 years ago

Reopen for discussion w/ @smahaffey

smahaffey commented 7 years ago

Good idea. I think that makes sense. Although I will point out for versioning the database we are changing the URL for 2.1 while keeping the URL for v1.x and v2.0 so that if people want to continue with work they started on that version they won't get different results now with the next update.

We could fix this by adding a parameter to the CGI script that takes the database name that we can version. It could default to v1.x/2.0 so that it wouldn't break old versions.

mmulvahill commented 7 years ago

So that brings up a tricky issue... Bioconductor requires (I believe strictly so) new packages be versioned 0.99.0, and then increments to 1.0.0 upon acceptance.

So we wil likely need to maintain separate DB and R Package version #s.

How does versioning work on the DB and URL?

smahaffey commented 7 years ago

Here's what I know about the versions since I was using it but not really involved with the development until now.

For the Package: Ver 1.0 and 1.0.1 were released initially with human/mouse. Kevin also created v2.0 to add Rat data which we needed for our site. The database wasn’t updated other than adding Rat. He wanted to do more testing and then never released 2.0 publicly. Instead of updating 1.0.1 without Rat support I updated 2.0 to fix the problem with submitting larger lists. For that I left the URL the same because the database(multimir) reflects v2.0. Maybe the next release should be 3.0 and version 1.0 for bioconductor instead of using 2.1.

For the URL: To make it simple I was going to just add a version number on the script file name in the URL to change the database since the database is hard coded in the script.

http://multimir.ucdenver.edu/cgi-bin/multimir2.1.pl vs http://multimir.ucdenver.edu/cgi-bin/multimir.pl

However I think we should just add a parameter so we could keep the URL the same between versions and specify the database version in the request. I think we could make it default to the old version for backwards compatibility with v2.0.1 and earlier.

For the database: The database versions will just be a different database name for example right now the database is multimir but the new version will be multimir_2_1 although we can change the number to match the version we decide to go with.

The versioning requirement for Bioconductor is unfortunate, but we can note the change in version numbers on the website for people that download it directly.

Given that what do you think about:

  1. Version the database and R package separately as you suggested
  2. Add a database version parameter to the CGI script, so we can keep the URL the same across versions
  3. Build in a function to retrieve database versions from the database so users can view the options from the R package. There already is a function to summarize counts from each source database and provide the version information for each of them. So users can connect to each version to see when different database updates where included if they need something specific like a recent request for updating with mirBase v21 IDs.
mmulvahill commented 7 years ago

This sounds good. Making the R package as agnostic to DB version as possible should reduce ongoing maintenance requirements.

  1. Yes, for sure
  2. and 3. Sounds like a reasonable approach. Basically we'd add a multimir_dbVersion() function that returns DB version numbers and associated details, meta-info one level higher than the existing multimir_dbInfo(). We'd then add a package option and a function argument (to multimir_dbInfo) for version #:
multimir_dbVersion(url = getOption("multimir_url"))
multimir_dbInfo(url = getOption("mulitmir_url"), 
                version = getOption("multimir_dbversion"))

Does this match what you were thinking?

I want to keep usage as intuitive as possible and avoid explicitly requiring the user to consider layers of metainformation or other similar hierarchies, so let me know if you think of any other situations like this.

smahaffey commented 7 years ago

Yes that's exactly what I was thinking. I can work on implementing the server side, while I wait for some of the scripts to ru. I think to make it as easy as possible to maintain the server side we probably should have 2 URLs the current one that will support the versions prior to bioconductor and then a new one to support future versions. There are two options so we can avoid maintaining the perl script in the future.

  1. I'm thinking it would be good to make the new script default to the current version(something like multimir_current) if no versions are specified to make it faster and easier without changing code and a database with a table listing previous versions. Updates to the database will just copy the database to a new name and replace the current version.
  2. The other option is with each request without a specified version to query the database to determine the current version. This wouldn't add much time but it would add a little.

Then add a database (multimir_versions) with a table(versions) which lists the name, version, creation date and if it is public yet. I think this should effectively decouple the database, perl script, and R package.

I think at this point we can just eliminate the 2.1 branch it only changed the URL which will change anyway.

mmulvahill commented 7 years ago

My thought was to have the _dbVersion() function execute when loading the package in R -- the load/welcome message would then display the current version, show the default values of other package options, and give provide brief instruction on how to change the version.

For reproducibility sake, we want the user to be able to specify the version at the beginning of their code and have that code execute successfully no matter if the current version has changed -- would this be affected by having two separate URLs?

I'm thinking we should meet in person and talk through these issues. That sound like a good idea? If so, when is a good time for you?

smahaffey commented 7 years ago

Yeah let's meet and talk about it it. I think we just need to add a URL to support the new version and leave the current one alone for older versions that people have downloaded and installed from source. I will email you about times.

smahaffey commented 7 years ago

Also since I'm testing the database now I realized we need to have updated .rda files with the cutoff values for each DB in it. So however we set the version needs to get the file name from the DB and set it as an option that's used in the method to get the cuttoff.rda file. Just a note to remind me to account for that.

Let me know when you can meet to discuss it. I'm just about finished with testing the DB v2.1 so I can then modify the server side stuff and help with the R side so it will work with the modified server.

mmulvahill commented 7 years ago

To summarise our discussions today:

smahaffey commented 7 years ago

I have completed the changes needed to allow using previous or future database versions from one version of the package:

  1. I have added multimir_dbInfoVersions() which gets the version list along with pertinent info.

  2. I have added a method to switch the database version being used. It will display an error message if the switch is unsuccessful. Likely due to wrong version string or server availability.

  3. I've changed the initial setup to request a list of versions and setup to use the most current version. Welcome message displays info about version/updated date or error message if server is unavailable at the moment.

  4. Options for file names for the db schema and the rda file will be loaded from the DB since they are tied to the DB version.

@mmulvahill - I don't have permission to push the changes to GitHub. Also I can't actually get it to build as a package now so I can't fully test it as a package, but testing the methods seems to work. I do not understand the issue I get an error about doc/index.html not existing if I build or try to tar/gzip and install it directly. Any ideas on fixing this error so I can finish testing my changes.

smahaffey commented 7 years ago

I added a parameter for the list of tables. MySQL seems really slow listing tables and this gets called repeatedly. This should speed everything up.

mmulvahill commented 7 years ago

@smahaffey Thanks for implementing the db version change feature -- It looks great.

Quick FYI -- I'm currently about halfway through refactoring the package to reduce duplicate code, maintainence overhead, and make the pkg structure more consistent with 'best practices'. As a part of this, I'm deprecating all function arguments that are set by our new global options (mostly url). The idea being that the user should be able to change structural options for accessing the webserver in a consistent way, but it should be hidden for typical usage.