Refactor resource location

giomasce commented 11 years ago

Sometimes CMS has to locate some external resource (usually an external file, like the configuration, isolate, *WS templates and similar things). Search paths for these things may vary and may depend on whether CMS is running from the repository or installed.

Right now there are various location algorithms in different parts of the code, more or less ad hoc for one resource or another. At least some of them are buggy: for example, it is known that if you run an installed CMS from the repository directory, it will try to use isolate from the repository. This usually leads to bugs because, if you use the canonical setup.py script for compilation, the isolate binary in the repository does not have the SUID bit set.

I want to make a survey of the status quo at the moment and understand whether there is a way to make everything a bit more rational and less buggy.

lw commented 11 years ago

Here are my thoughts on this issue.

isolate is, in my opinion, a separate and independent piece of software. The only "integration" is CMS calling it via the command line, formatting the arguments, and parsing the result. There's no API; it's not a Python module, package or extension. isolate is useful on its own, and users may want to use it independently of CMS, but at the moment they have to install both. This is also reflected in our Debian packaging: isolate is in its own package, and not in cms.

I think we should consider and discuss the option of splitting isolate out of CMS and hosting it in a different repo (still belonging to cms-dev). CMS would then assume that isolate is in the PATH, and would also allow setting an explicit path to the executable in the configuration (I think this is the most common approach when a piece of software needs to use an external binary executable...).
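A minimal sketch of that lookup, assuming a hypothetical configuration value passed in as `configured_path` (this is not an existing CMS setting) and falling back to the PATH via `shutil.which`:

```python
import os
import shutil


def find_isolate(configured_path=None):
    """Return the path to the isolate executable, or None if not found.

    `configured_path` stands in for a hypothetical configuration option;
    it is not an existing CMS setting.
    """
    if configured_path:
        # An explicit setting wins, but only if it points at an executable file.
        if os.path.isfile(configured_path) and os.access(configured_path, os.X_OK):
            return configured_path
        return None
    # Otherwise fall back to the usual PATH lookup.
    return shutil.which("isolate")
```

Returning None for a bad explicit path, rather than silently falling back to the PATH, is a design choice of this sketch: it avoids running a different binary from the one the administrator asked for.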

As for the other files, we should check if there's some standard way to handle them. distutils and setuptools provide one [1] for specifying these files, i.e. the package_data dict of setup.py, and we're already using that. Yet I don't think they provide a way to discover, later, where these files are installed, or even guidelines on how to access them (e.g. assume they're always in a certain location relative to the source file). We should investigate in that direction. One exception is translations: at the moment we're manually installing them to /usr/local/share/locale.
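For files that live inside a package (as package_data files do), one option worth checking is the standard library's `pkgutil.get_data`, which resolves a resource relative to the package itself. The sketch below is only an illustration: `demo_pkg` and `templates/hello.txt` are made-up names, and the package is created in a temporary directory so that the example runs on its own. It does not cover files installed outside the package, such as the translations.

```python
import importlib
import os
import pkgutil
import sys
import tempfile

# Build a throwaway package on disk so the example is self-contained.
# "demo_pkg" and "templates/hello.txt" are made-up names.
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "demo_pkg")
os.makedirs(os.path.join(pkg_dir, "templates"))
with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write("")
with open(os.path.join(pkg_dir, "templates", "hello.txt"), "w") as f:
    f.write("hello")

# Make the new package importable.
sys.path.insert(0, root)
importlib.invalidate_caches()

# pkgutil.get_data resolves the resource relative to the package itself,
# so the caller never needs to know the installation prefix.
# It returns the file contents as bytes.
data = pkgutil.get_data("demo_pkg", "templates/hello.txt")
```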

Another thing I don't like is always forcing the use of the "local" locations (i.e. /usr/local, /var/local, etc.). I understand that on Debian and Ubuntu it's common to put manually-installed packages there, and that therefore the default "prefix" is /usr/local rather than /usr. But on other distributions this doesn't happen and, in fact, their default prefix is /usr. Since distutils somehow manages to read these settings and uses them for the files it installs, this results in files being split between the two locations.

I don't know how Python manages it, but for C/C++ software these prefix values are set during pre-build configuration and saved in a header that's then built together with the source files. I don't know if we can do something similar with Python (i.e. set them at installation, and retrieve them later). This also holds for the locations of temporary files, logs, cache, configuration, etc.
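As a starting point for that investigation, the interpreter does expose its own prefix and default installation directories at runtime through `sys.prefix` and the `sysconfig` module. Note the caveat: these describe the running interpreter, not necessarily the prefix that was passed to setup.py when CMS was installed, so this is a sketch of what can be queried rather than a full answer.

```python
import sys
import sysconfig

# The prefix the running interpreter was installed under
# (e.g. /usr or /usr/local, or the environment root inside a virtualenv).
prefix = sys.prefix

# Installation directories for the default scheme of this interpreter.
paths = sysconfig.get_paths()
data_dir = paths["data"]        # base directory for data files
purelib_dir = paths["purelib"]  # where pure-Python packages are installed

print("prefix: ", prefix)
print("data:   ", data_dir)
print("purelib:", purelib_dir)
```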

If distutils and setuptools don't satisfy us we can look at other packaging tools. GNOME uses autotools for its Python packages. It may be worth a look.

[1] http://docs.python.org/2/distutils/sourcedist.html#manifest

lw commented 11 years ago

Another thought: do we really need to keep the non-installed mode? Who do we do it for? The lazy developer who doesn't want to reinstall every time he makes a change, or the user who doesn't have root access to the server he wants to run CMS on? Because for the latter case, virtualenv [1] seems the proper solution (provided that we're flexible enough when installing and retrieving our external file resources).

[1] http://www.virtualenv.org/en/latest/

stefano-maggiolo commented 11 years ago

Also, for automated testing it is probably more complicated to install first and run the tests later (not to mention the fact that we don't install the tests).

giomasce commented 11 years ago

If you want to split isolate to a new repository, then I think that Sandbox.py must follow it and be developed as a general Python module, that is then used by CMS. So in CMS we just have to worry about importing Sandbox, which is something that Python already handles. Of course this new repository would have to face the same problem.

OTOH, this splitting is something that I don't really care much about, and at the moment I would prefer to put my energy into more interesting and useful things.

I agree that we should make the installation prefix configurable. And I would like to retain the current non-installed running mode, because I like not to pollute my system with files that are not managed by my package manager. Moreover, it's a common pattern, found in many other pieces of software, and it's useful for those who want to try CMS without having root access on their machine.

lw commented 11 years ago

I don't agree. As our current repository filesystem structure suggests, isolate is the small independent command-line program written in C, whereas Sandbox.py is the part of CMS used to interface with that utility. If we want to split them, this is how it should be done in my opinion. Yet I think there's an even better solution: isolate should become (or at least provide) a Python extension, that is, a Python module written in C [1]. Advantages: no need for subprocess, easier communication and configuration, exception handling, no formatting and parsing (just pass the values as Python objects) and, notably, the possibility to use distutils' built-in capabilities for compilation and installation! Disadvantages: no independent command-line tool (unless we split it into a library with two frontends: the Python extension and the binary executable) and, unfortunately, the impossibility to require the SUID flag.

What "other pieces of software" are you referring to? Can you give some examples? Are they C/C++ code or Python? In the latter case, how do they handle configuration (and compilation) and installation? Do they use distutils? Do they have locale data to install? Where do they get the paths for config/log/cache/tmp/run/lib files?

[1] http://docs.python.org/extending/ and http://docs.python.org/c-api/

lw commented 11 years ago

(BTW, those who want to run CMS without having root access to their machine will probably not have all the required dependencies and, unless they have a collaborative sysadmin, they'll have to install them in their personal directories... in that case I think that using virtualenv makes it easier to manage the dependencies and CMS itself than fiddling with PYTHONPATH)

giomasce commented 11 years ago

Ok, then my proposal is to consider Sandbox.py as a separate piece of software that CMS uses as a dependency. What is in use now is not necessarily meaningful, since we're discussing how things could change.

Making isolate a Python extension isn't really what I would call a "better solution", since it would require a lot of work for (what I consider) no real advantage.

OTOH, I don't actually have a strong opinion about how to handle all these things. So far, my main worry is that CMS runs correctly when one launches it. If someone wants to work out a more idiomatic and Pythonic way to do things, I'm not against it, but I don't think I'll spend time on it at the moment.

About running from the repository: yes, many things can be done. Still, in my opinion, being able to just clone a repository, install dependencies and run a command to fire up CMS is better than having to worry about virtualenv, installation of Python packages and so on.

lw commented 11 years ago

I repeat the questions:

What "other pieces of software" are you referring to? Can you give some examples? Are they C/C++ code or Python? In the latter case, how do they handle configuration (and compilation) and installation? Do they use distutils? Do they have locale data to install? Where do they get the paths for config/log/cache/tmp/run/lib files?

giomasce commented 11 years ago

I don't have any specific example in mind; it's just the situation I have always found when trying to compile a project from scratch. I don't remember ever having had to install one in order to use it. Probably they were mostly C projects based on autotools. Surely there were also some Java projects, with some funny build system from the Java ecosystem.

Runtime resources were handled in many different ways: either letting the build system write the prefix path somewhere and looking it up at runtime, or detecting the in-repository run and behaving accordingly.
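The second approach could look roughly like the sketch below. The heuristic (a setup.py next to the package directory), the function names and the `share/cms` subdirectory are all assumptions made for illustration, not what CMS currently does. The check is keyed on where the package lives rather than on the current working directory, which is the distinction that matters for the isolate bug described at the top of this issue.

```python
import os


def running_from_repository(package_dir, marker="setup.py"):
    """Guess whether the code runs from a source checkout.

    Heuristic (an assumption, not CMS's actual logic): a checkout has a
    marker file such as setup.py in the directory containing the package.
    """
    package_dir = os.path.abspath(package_dir)
    return os.path.isfile(os.path.join(os.path.dirname(package_dir), marker))


def resource_search_path(package_dir, installed_prefix):
    """Return the directories to search for a resource, in order."""
    package_dir = os.path.abspath(package_dir)
    if running_from_repository(package_dir):
        # Search only the checkout, so an installed copy is never mixed in.
        return [os.path.dirname(package_dir)]
    # "share/cms" is a hypothetical location under the installation prefix.
    return [os.path.join(installed_prefix, "share", "cms")]
```

A module would typically call this with `os.path.dirname(__file__)` as `package_dir`, so the result does not depend on the directory the user happens to launch CMS from.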

cms-dev / cms

Refactor resource location #161