Kitware / tangelo

A simple, quick, powerful web framework
http:/tangelohub.org/tangelo/
Apache License 2.0
185 stars 35 forks source link

Allow services to specify extra paths to place in sys.path #106

Closed waxlamp closed 11 years ago

waxlamp commented 11 years ago

The problem arises when a service writer wishes to place common utilities in a separate, non-service python module for use by multiple services. If the services wishing to use such a utility module lie in different directories than the utility, it becomes ill-defined how the user should modify sys.path to account for the utility's location. Here's an excerpt from an email I wrote earlier about solving this problem:

'Here are the solution strategies I think might work for the long term:

  1. At the time the Tangelo engine loads your python module, it should also temporarily prepend the directory it lives in to the system path. This would solve your problem permanently, since the service directory containing materializedPaths.py would transparently be part of sys.path when phylomap.py is loaded. This is a relatively simple change to tangelo.py.
  2. I can imagine a generalized version of this issue where you want to keep a cache of widely useful python modules somewhere, and import them into various services, not all of which are in the same place. The solution above wouldn't work for this situation, and it's difficult (or maybe impossible) to ask service writers to simply amend sys.path themselves, since they don't know in general how to form absolute system paths to the files, nor do they know the base for forming relative paths. In this case, I can imagine an optional function you would write (alongside the run() function), called "paths()", that would return a list of paths relative to the file's service directory that should be temporarily prepended to the system path. Here's an example of what I mean:

~roni/lib/python/ contains a few python files, like util.py, graph.py, etc. These are NOT services, but just utility files.

~roni/app/foo/service/ contains a python service bar.py, which needs to use graph.py. bar.py looks like this:

def paths(): return ["~roni/lib/python"]

def run(...): . . .

Then, the Tangelo engine can first run paths() to get a list of extra paths to stick in front of sys.path, then run run() to carry out the service, and finally remove the paths from sys.path. The removal step is to ensure that sys.path doesn't get "polluted" with paths that might affect how future services import their dependent modules, and also so that sys.path doesn't continuously get longer and longer, causing possible performance issues for a long-running server.

If for some reason you want to suppress #1 above from automatically including the service directory in sys.path, the protocol could be to have the paths() function return None.'

Strategy 1 has been implemented in commit d1679882fd13004495a9cc51591f4f0358519d62

jeffbaumes commented 11 years ago

This seems very nice and dynamic. At the same time this can be potentially slow, since this would all be happening on each request. I can't think of any better solution for now though so I'm good with going with this one.

waxlamp commented 11 years ago

HTTP authentication has a similar on-each-request overhead (as well as possible future features, like not requiring services to live in a "service" directory - suggested by Chris H.). It's been on my mind that we need to do some profiling to measure the impact of each of these features. For that reason I've been adding command line switches to optionally disable these features one by one (so we can carry out the profiling).

waxlamp commented 11 years ago

Oops, this approach doesn't work. It requires importing the module before calling its paths() function, but the import will fail if you have paths you need to augment sys.path with... and of course those paths are tucked away inside the paths() function :). We'll have to come up with a different mechanism to allow this.

waxlamp commented 11 years ago

Closed in commit 7c14138f7ef4cc7bdd4e24bb19d3464b08f8743c.

Instead of the flawed approach I originally suggested, this does something slightly hacky, but maybe good enough. I have added a tangelo.paths() function which takes a single list of module-relative paths with which to augment sys.path. Now the service writer can do something like:

import tangelo
tangelo.paths(["../libs", "extra"])

import lib1
import lib2
import extra1
.
.
.

where lib1 and lib2 are modules in ../libs, and extra is a module in extra (all paths relative to wherever the service module is).

The hackiness comes because of the way the engine has to prepare tangelo.paths(), using a global variable and a special function that places the module's directory in it (tangeo.py, line 34, and tangelo.in, line 18). This global variable is a bit of a liability, but there are ways to protect it from unauthorized tampering. Specifically, here is a crazy Python hack (sanctioned by Guido himself): http://stackoverflow.com/questions/2447353/getattr-on-a-module (the answer given by Ethan Furman, which references this email from Guido on the subject: http://mail.python.org/pipermail/python-ideas/2012-May/014969.html). I'm not too worried about this particular thing, but if we needed to, we could implement the hack (which I tested - it does work as advertised).

In any case, now there's a way to affect sys.path from inside a service module.

jeffbaumes commented 11 years ago

The second solution looks more explicit to me anyway. I like how it feels more like an import statement rather than a magical callback that will be called sometime.