FredHutch / Oncoscape

a web application to apply/develop analysis tools for Molecular and Clinical data
https://oncoscape.sttrcancer.org
MIT License

Refactoring the Oncoscape Core #164

Open pshannon-bioc opened 8 years ago

pshannon-bioc commented 8 years ago

Refactoring the oncoscape core (14 dec 2015)

These notes propose some simple changes to Oncoscape in order to create a more flexible server. The principal change is the use of the Factory pattern, so that data and analyses can be made available to the server in many ways, ranging from directly in-process (as we do now) to remote, shared, and secured within a service-oriented architecture (broadly understood), which is clearly needed. The creation and provision of more sophisticated data and analysis services is not specified here. Instead, a simple refactoring of the Oncoscape server is described, one that will support open-ended forms of distributed and secure computation in the future.

current constructor:

onco <- OncoDev14(port=port, scriptDir=scriptDir, userID=userID, datasetNames=current.datasets)

new form:

app <- Oncoscape(port, analysisPackages, datasets, browserFile, userCredentials)

 analysisPackages: a list of R package names, each of which is derived from
                   the SttrAnalysisPackage base class
 datasets: a list of R package names, each of which is derived from the
           SttrDataPackage base class

 browserFile: name of a file combining HTML, CSS and Javascript
 userCredentials: an instance of the UserCredentials class (or a subclass)

Three abstract base classes are needed:

   SttrDataPackage: needs to add open-ended support for indirect data (local database, remote
      database, cloud, etc.)
   SttrAnalysisPackage: provides a template and some shared methods for, e.g., PCA, PLSR, and
      future additions
   UserCredentials:  open-ended design, from a simple userID with no password to LDAP, AD, etc.
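
A minimal sketch of how the three base classes might look, assuming S4 classes from the `methods` package. The generics `getTable` and `isAuthorized`, the slot names, and the `InMemoryDataPackage` subclass are hypothetical illustrations, not part of the proposal.

```r
library(methods)

# Virtual (abstract) base classes: they cannot be instantiated directly.
setClass("SttrDataPackage", representation("VIRTUAL", name = "character"))
setClass("SttrAnalysisPackage", representation("VIRTUAL", name = "character"))

# Simplest form of credentials: a userID and no password.
setClass("UserCredentials", representation(userID = "character"))

# Generics that derived classes would implement (names are assumptions).
setGeneric("getTable", function(obj, tableName) standardGeneric("getTable"))
setGeneric("isAuthorized", function(obj) standardGeneric("isAuthorized"))

# A concrete, in-process data package that keeps its tables in memory.
setClass("InMemoryDataPackage", contains = "SttrDataPackage",
         representation(tables = "list"))
setMethod("getTable", "InMemoryDataPackage",
          function(obj, tableName) obj@tables[[tableName]])

# The base credentials class authorizes everyone; subclasses (LDAP, AD, ...)
# would override this method.
setMethod("isAuthorized", "UserCredentials", function(obj) TRUE)

demo <- new("InMemoryDataPackage", name = "DEMOdz",
            tables = list(mrna = matrix(1:4, nrow = 2)))
cred <- new("UserCredentials", userID = "demo@nowhere.org")
```

A remote or cloud-backed data package would subclass `SttrDataPackage` in the same way and implement `getTable` with a database or network call, leaving server code that calls `getTable` unchanged.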

Both SttrDataPackage and SttrAnalysisPackage follow the loose definition of SOA, service oriented architecture (https://en.wikipedia.org/wiki/Service-oriented_architecture):

   "a component that is encapsulated behind an interface"

The PCA analysis package behaves like this (these calls are made by the Oncoscape server)

   package.name <- "PCA"   # or "PCA.SOA.AmazonS3" or ...
   library(package.name, character.only=TRUE)
                           # load the code, which may be self-contained, or a facade to an
                           # adaptive distributed system deployed in the cloud, or ...
                           # crucial: the server has no idea how the PCA calculations are actually
                           # performed, nor where the data actually is

   eval(parse(text=sprintf("pkg <- %s(server)", package.name)))
   register(pkg)           # the pkg tells the server the websocket messages it wants to receive

   the server provides data and message passing services to the pkg.
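
The registration step can be sketched roughly as follows. This is an illustration only: `newServer`, `registerHandler`, `dispatch`, the message fields `cmd` and `payload`, and the message name `calculatePCA` are all hypothetical, and the `PCA` function here is a stand-in for a real package constructor.

```r
# Hypothetical server: an environment holding a table of message handlers.
newServer <- function() {
  server <- new.env()
  server$handlers <- list()
  server
}

# A package tells the server which message names (cmd) it wants to receive.
registerHandler <- function(server, cmd, fn) {
  server$handlers[[cmd]] <- fn
  invisible(server)
}

# The server routes an incoming message to its handler by the cmd field.
dispatch <- function(server, msg) {
  handler <- server$handlers[[msg$cmd]]
  if (is.null(handler)) stop("no handler registered for: ", msg$cmd)
  handler(msg$payload)
}

# Stand-in for an analysis package constructor plus its register() step.
PCA <- function(server) {
  registerHandler(server, "calculatePCA",
                  function(payload) prcomp(payload)$sdev)
  list(name = "PCA", server = server)
}

server <- newServer()
package.name <- "PCA"
# do.call looks the constructor up by name, an alternative to eval(parse(...)).
pkg <- do.call(package.name, list(server))

m <- matrix(c(1, 2, 3, 4, 2, 1, 4, 3, 5, 6, 7, 9), nrow = 4)
result <- dispatch(server, list(cmd = "calculatePCA", payload = m))
```

The server only ever sees a message name and a handler, so the same `dispatch` call would work whether the handler computes locally or forwards the request to a remote service.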

Two app constructor examples to demonstrate the spectrum of uses:

1) reproduce current style of use:

    app <- Oncoscape(7001, c("PCA", "PLSR"), c("DEMOdz", "TCGAbrain"),
                      "index.html", "demo@nowhere.org");

2) demonstrate distributed shared data, analysis, high security.

    app <- Oncoscape(7001, c("PCA.SOA.AmazonS3", "PLSR.immediate"),
                     c("DEMOdz.immediate", "TCGAbrain.Amazon"), 
                     "index.html",
                     "HutchPHI")

    note that the actual values of the user's credentials are deferred to
    an as-yet unspecified but arbitrarily complex, arbitrarily secure class.

Data and analysis packages, and credentials, all depend upon the Factory design
pattern, in which character strings are passed to the appropriate factory,
which returns a (possibly intricate, possibly simple) object of the appropriate
derived class.  Each of these concrete objects (an SttrDataPackage, an SttrAnalysisPackage,
a UserCredentials instance) supports the methods of its base class, so each
can be used in Oncoscape interchangeably.
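
A minimal sketch of such a factory for data packages, using S3 classes for brevity. The registry, `createDataPackage`, the `describe` method, and the `location` field are hypothetical names chosen for illustration; a real factory would presumably load the named R package rather than consult a hard-coded list.

```r
# Hypothetical registry mapping a name string to a constructor function.
dataPackageRegistry <- list(
  "DEMOdz.immediate" = function() list(name = "DEMOdz", location = "in-process"),
  "TCGAbrain.Amazon" = function() list(name = "TCGAbrain", location = "remote")
)

# The factory: the caller passes a string and gets back a ready-to-use object
# of a derived class, without knowing how it was built.
createDataPackage <- function(name) {
  constructor <- dataPackageRegistry[[name]]
  if (is.null(constructor)) stop("unknown data package: ", name)
  obj <- constructor()
  class(obj) <- c(name, "SttrDataPackage")
  obj
}

# A method defined on the base class works for every derived object.
describe <- function(pkg) UseMethod("describe")
describe.SttrDataPackage <- function(pkg) {
  sprintf("%s (%s)", pkg$name, pkg$location)
}

datasets <- lapply(c("DEMOdz.immediate", "TCGAbrain.Amazon"), createDataPackage)
descriptions <- vapply(datasets, describe, character(1))
```

Because both objects inherit from `SttrDataPackage`, the caller can treat them identically even though one is local and the other remote.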

For example, imagine the use of a private BRCA data set stored with many layers of security
on the Amazon cloud.

  app <- Oncoscape(7001, 
                   c("PCA.immediate", "HOBO.hutchCluster"),
                   c("TCGAbrca.SOA.AmazonS3", "BRCA4013.SOA.AmazonS3.PHIlevel.10"),
                   "index.html",
                   "HutchCredentials.PHI.level.10")

 As the app starts up:
    1) the specified credentials object is created, and the user must establish
           a) she has a secure connection
           b) she is authorized 
    2) TCGAbrca.SOA.AmazonS3 is created, needs no credentials (or maybe just enough
       for billing purposes)
    3) BRCA4013.SOA.AmazonS3.PHIlevel.10 is created.  the high security credentials
       from step 1 must be supplied
    4) a PCA package is loaded and initialized; it runs in-process with Oncoscape
    5) a HOBO similarity calculator, perhaps already running on the Hutch cluster,
       is contacted.  maybe credentials are needed, if only to track which lab
       is using the cluster.
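
The credential checks in steps 1 to 3 might be sketched as below. Everything here is an assumption made for illustration: the numeric `phiLevel`, the helper functions, and the user names are hypothetical, and real credentials would be far more involved.

```r
# Step 1: hypothetical credentials. Creation fails without a secure connection.
makeCredentials <- function(userID, secureConnection, phiLevel) {
  if (!secureConnection) stop("a secure connection is required")
  list(userID = userID, phiLevel = phiLevel)
}

# Steps 2 and 3: each dataset declares the clearance level it requires.
openDataset <- function(name, requiredLevel, credentials) {
  if (credentials$phiLevel < requiredLevel)
    stop("credentials insufficient for ", name)
  list(name = name, requiredLevel = requiredLevel)
}

# The public dataset needs no clearance; the private one needs level 10.
startup <- function(credentials) {
  list(
    public  = openDataset("TCGAbrca.SOA.AmazonS3", 0, credentials),
    private = openDataset("BRCA4013.SOA.AmazonS3.PHIlevel.10", 10, credentials)
  )
}

highCred <- makeCredentials("researcher", secureConnection = TRUE, phiLevel = 10)
opened <- startup(highCred)

# A user with no clearance is stopped at the private dataset.
lowCred <- makeCredentials("guest", secureConnection = TRUE, phiLevel = 0)
denied <- tryCatch(startup(lowCred), error = function(e) conditionMessage(e))
```

Creating the credentials first means every later package can be handed the same object and decide for itself how much of it to check.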

canaantt commented 8 years ago

@grettygoose @canaantt need to learn from Paul's design and work through all the remaining datasets.