Minerva analysis refactor

kotfic commented 8 years ago

This PR implements a new approach to analyses in minerva.

Overview

Analyses are implemented as a new Girder resource at the minerva_analysis endpoint and are stored in a MongoDB collection. Each item in the collection models the name, type (e.g. python, R) and path of an analysis file. In this PR only the 'python' type is implemented.

A python analysis is a Python file that contains a function run(...). The run function is the interface to the analysis; its arguments and certain aspects of its documentation are exposed at the minerva_analysis/{name}/meta endpoint.

Example

Consider the following analysis:

{ 
   "name":  "sum",
   "type": "python",
   "path": "/tmp/sum.py"
}

Where /tmp/sum.py contains the following code:

def run(a, b):
    """Sum a and b arguments
    :type a: int
    :param a: the first number to sum
    :type b: int
    :param b: the second number to sum
    """
    return sum([a, b])

Calling GET on /minerva_analysis/sum/meta will return:

[{'kwarg': False,
  'name': 'a',
  'optional': False,
  'type': 'int',
  'description': 'the first number to sum',
  'vararg': False},
 {'kwarg': False,
  'name': 'b',
  'optional': False,
  'type': 'int',
  'description': 'the second number to sum',
  'vararg': False}]

This listing can be used to prompt the user for input values. The client then POSTs to /minerva_analysis/sum/ with a JSON body containing:

{ 
  "a": 2,
  "b": 2
}

This returns the JSON value 4.
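As a rough client-side illustration of this exchange, the sketch below checks a set of user-supplied values against a meta listing and builds (without sending) the POST request. The API root, the helper names missing_arguments and build_run_request, and the absence of authentication are all assumptions made for the example; none of them come from this PR.

```python
import json
import urllib.request

# Assumption: the Girder REST API is served from this root; adjust as needed.
API_ROOT = "http://localhost:8080/api/v1"


def missing_arguments(meta, values):
    """Return the names of non-optional inputs in meta that values lacks."""
    return [
        entry["name"]
        for entry in meta
        if not entry["optional"] and entry["name"] not in values
    ]


def build_run_request(name, meta, values):
    """Build, but do not send, the POST request that runs analysis `name`."""
    missing = missing_arguments(meta, values)
    if missing:
        raise ValueError("missing required arguments: " + ", ".join(missing))
    return urllib.request.Request(
        "{}/minerva_analysis/{}/".format(API_ROOT, name),
        data=json.dumps(values).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# The meta listing for the "sum" analysis, copied from the example above.
meta = [
    {"kwarg": False, "name": "a", "optional": False, "type": "int",
     "description": "the first number to sum", "vararg": False},
    {"kwarg": False, "name": "b", "optional": False, "type": "int",
     "description": "the second number to sum", "vararg": False},
]

request = build_run_request("sum", meta, {"a": 2, "b": 2})
print(request.get_method(), request.full_url)
```

Only the optional flag is checked here. The type field is deliberately not validated, since it is free text taken from the docstring.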

Technical details

This is achieved using a Python abstract syntax tree parser to pull information out of the interface function (e.g. run) and a docutils SparseNodeVisitor to pull param and type descriptions out of its documentation. This allows the script to be the single source of truth for argument information and documentation, but it means that type values should not be relied on to come from a particular set of enumerated values.

Python analyses are implemented as objects of type PythonAnalysis, defined in server/utility/analysis.py. They are created with the dictionary returned from loading the girder python analysis, or through the convenience function get_analysis_obj, e.g.:

name = 'sum'
model = self.model('analysis', 'minerva').get_by_name(name)
analysis = get_analysis_obj(model)

The analysis variable now exposes its inputs through the analysis.inputs property and can be run with analysis.run_analysis(args, kwargs, opts), where args are the positional arguments to the analysis run function, kwargs are the keyword arguments, and opts is a dictionary of options for the style in which the run should be executed (note: opts is only stubbed out in this PR, not implemented). PythonAnalysis also implements a call method as a convenience, so that it behaves the same way the run function would if it had been imported, e.g.:

name = 'sum'
model = self.model('analysis', 'minerva').get_by_name(name)
analysis = get_analysis_obj(model)

c = analysis(2, 2)  # c will now equal 4
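To make the mechanics in this section more concrete, here is a minimal, self-contained sketch of a class that behaves along these lines. It is not the code in server/utility/analysis.py. SketchPythonAnalysis is a hypothetical name, a regular expression stands in for the docutils SparseNodeVisitor, opts is accepted but ignored, and the kwarg and vararg flags are left out because their exact semantics are not spelled out here.

```python
import ast
import importlib.util
import os
import re
import tempfile

# Matches Sphinx-style field lines such as ":type a: int" or ":param a: text".
FIELD_RE = re.compile(
    r"^[ \t]*:(type|param)[ \t]+(\w+):[ \t]*(.*)$", re.MULTILINE)


class SketchPythonAnalysis:
    """Hypothetical stand-in for PythonAnalysis, for illustration only."""

    def __init__(self, name, path, entry_point="run"):
        self.name = name
        self.path = path
        self.entry_point = entry_point

    @property
    def inputs(self):
        """Describe the entry point's arguments without importing the script."""
        with open(self.path) as fh:
            tree = ast.parse(fh.read(), filename=self.path)
        func = next(
            node for node in tree.body
            if isinstance(node, ast.FunctionDef)
            and node.name == self.entry_point
        )
        # Collect :type: and :param: fields from the docstring, keyed by name.
        fields = {}
        docstring = ast.get_docstring(func) or ""
        for kind, arg_name, text in FIELD_RE.findall(docstring):
            fields.setdefault(arg_name, {})[kind] = text.strip()

        positional = func.args.args
        # Defaults belong to the last len(defaults) positional arguments.
        first_optional = len(positional) - len(func.args.defaults)
        return [
            {
                "name": arg.arg,
                "type": fields.get(arg.arg, {}).get("type"),
                "description": fields.get(arg.arg, {}).get("param"),
                "optional": index >= first_optional,
            }
            for index, arg in enumerate(positional)
        ]

    def run_analysis(self, args, kwargs, opts=None):
        """Load the script and call its entry point; opts is ignored here."""
        spec = importlib.util.spec_from_file_location(self.name, self.path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        return getattr(module, self.entry_point)(*args, **kwargs)

    def __call__(self, *args, **kwargs):
        return self.run_analysis(args, kwargs)


SUM_SOURCE = '''
def run(a, b):
    """Sum a and b arguments
    :type a: int
    :param a: the first number to sum
    :type b: int
    :param b: the second number to sum
    """
    return sum([a, b])
'''

# Usage: write the example script to a temporary directory and exercise it.
with tempfile.TemporaryDirectory() as tmp_dir:
    script_path = os.path.join(tmp_dir, "sum.py")
    with open(script_path, "w") as fh:
        fh.write(SUM_SOURCE)
    analysis = SketchPythonAnalysis("sum", script_path)
    inputs = analysis.inputs
    result = analysis(2, 2)

print(result)
```

Reading the arguments from the syntax tree means the metadata can be served without executing the script, while run_analysis imports it only when a run is actually requested.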

Notes

All analyses are currently run synchronously on the server side. We will need to implement distributed task queuing in a separate PR.
The previous analysis_test.py has been moved to bsve_analysis_test.py. Tests for the PythonAnalysis class are now in analysis_test.py, and tests for the REST interface are in analysis_rest_test.py.
To ensure the flexibility of the system I have implemented a girder worker proof of concept in server/utility/girder_worker_analysis.py. This implements the class GirderWorkerPythonAnalysis and exposes a spec property, e.g.:

    sum_code = """
def run(a, b):
    \"""Sum a list of numbers passed to run

    :type a: number
    :format a: number
    :type b: number
    :format b: number
    :return type c: number
    :return format c: number

    \"""
    return sum([a, b])
"""
    # Write the script to disk (text mode, since sum_code is a str)
    with open("/tmp/sum.py", "w") as fh:
        fh.write(sum_code)

    # Create an analysis from the script on disk
    p = GirderWorkerPythonAnalysis(name='sum', path='/tmp/sum.py')
    print(p.spec)

Printing the spec produces:

    {'inputs': [{'format': 'number', 'id': 'a', 'type': 'number'},
                {'format': 'number', 'id': 'b', 'type': 'number'}],
     'mode': 'python',
     'output': [{'format': 'number', 'id': 'c', 'type': 'number'}],
     'script': "\nimport imp\nfp, pathname, desc = imp.find_module('sum', ['/tmp'])\nmodule = imp.load_module('sum', fp, pathname, desc)\nc = module.run(a, b)\n"}

GirderWorkerPythonAnalysis does not currently implement the run_analysis function but could be easily extended to provide this functionality.
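For illustration, a spec with the shape printed above could be assembled roughly as follows. This is a sketch only, not the code in girder_worker_analysis.py. build_spec is a hypothetical helper, the input and output descriptions are passed in already parsed, and deriving the module name and directory from the script path is an assumption.

```python
import os

# Template for the wrapper script, mirroring the 'script' value printed above.
# The generated text relies on the imp module, which was removed from the
# standard library in Python 3.12; this sketch only builds the string and
# never executes it.
SCRIPT_TEMPLATE = (
    "\nimport imp\n"
    "fp, pathname, desc = imp.find_module({module!r}, [{directory!r}])\n"
    "module = imp.load_module({module!r}, fp, pathname, desc)\n"
    "{outputs} = module.{entry_point}({arguments})\n"
)


def build_spec(path, inputs, outputs, entry_point="run"):
    """Assemble a spec dict for the script at path (hypothetical helper)."""
    directory, filename = os.path.split(path)
    module = os.path.splitext(filename)[0]
    script = SCRIPT_TEMPLATE.format(
        module=module,
        directory=directory,
        outputs=", ".join(item["id"] for item in outputs),
        entry_point=entry_point,
        arguments=", ".join(item["id"] for item in inputs),
    )
    # Key names ('inputs', 'mode', 'output', 'script') follow the listing above.
    return {
        "inputs": inputs,
        "mode": "python",
        "output": outputs,
        "script": script,
    }


spec = build_spec(
    "/tmp/sum.py",
    inputs=[{"format": "number", "id": "a", "type": "number"},
            {"format": "number", "id": "b", "type": "number"}],
    outputs=[{"format": "number", "id": "c", "type": "number"}],
)
print(spec["script"])
```

The wrapper script is what binds the named inputs to the run function's arguments and assigns the return value to the named output.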

kotfic commented 8 years ago

@mgrauer @aashish24 PTAL

Tests appear to be failing because http://demo.geonode.org/geoserver/wms is throwing a proxy error?

aashish24 commented 8 years ago

This looks great, @kotfic. Will look into the details.

kotfic commented 8 years ago

See also this gist: https://gist.github.com/kotfic/6e6e34d297479fa8ea62790c1c9ce36a for an interactive example of the PythonAnalysis and GirderWorkerPythonAnalysis classes at work.

aashish24 commented 8 years ago

@kotfic thanks. We will review the branch and get back to you if we have any more questions.

aashish24 commented 8 years ago

I am closing this one for now.

Kitware / minerva