= The Elixir Cross Referencer :doctype: book :pp: {plus}{plus} :toc: :toc-placement!:
Elixir is a source code cross-referencer inspired by https://en.wikipedia.org/wiki/LXR_Cross_Referencer[LXR]. It's written in Python and its main purpose is to index every release of a C or C{pp} project (like the Linux kernel) while keeping a minimal footprint.
It uses Git as a source-code file store and Berkeley DB for cross-reference data. Internally, it indexes Git blobs rather than trees of files to avoid duplicating work and data. It has a straightforward data structure (reminiscent of older LXR releases) to keep queries simple and fast.
You can see it in action on https://elixir.bootlin.com/
link:CHANGELOG.adoc[Changelog]
toc::[]
= Requirements
mod_wsgi
(for the REST API)= Architecture
The shell script (script.sh
) is the lower layer and provides commands
to interact with Git and other Unix utilities. The Python commands use
the shell script's services to provide access to the annotated source
code and identifier lists (query.py
) or to create and update the
databases (update.py
). Finally, the web interface (web.py
) and
uses the query interface to generate HTML pages and to answer REST
queries, respectively.
When installing the system, you should test each layer manually and make sure it works correctly before moving on to the next one.
= Manual Installation
== Install Dependencies
For Debian
== Download Elixir Project
== Create a virtualenv for Elixir
== Create directories for project data
== Set environment variables
Two environment variables are used to tell Elixir where to find the project's local git repository and its databases:
LXR_REPO_DIR
(the git repository directory for your project)LXR_DATA_DIR
(the database directory for your project)Now open /etc/profile
and append the following content.
And then run source /etc/profile
.
== Clone Kernel source code
First clone the master tree released by Linus Torvalds:
Then, you should also declare a stable
remote branch corresponding to the stable
tree, to get all release updates:
Then, you can also declare an history
remote branch corresponding to the old Linux versions not present in the other repos, to get all the old version still available:
Feel free to add more remote branches in this way, as Elixir will consider tags from all remote branches.
== First Test
== Create Database
Generating the full database can take a long time: it takes about 15 hours on a Xeon E3-1245 v5 to index 1800 tags in the Linux kernel. For that reason, you may want to tweak the script (for example, by limiting the number of tags with a "head") in order to test the update and query commands. You can even create a new Git repository and just create one tag instead of using the official kernel repository which is very large.
== Second Test
Verify that the queries work:
$ ./elixir/query.py v4.10 ident raw_spin_unlock_irq C $ ./elixir/query.py v4.10 file /kernel/sched/clock.c
NOTE: v4.10
can be replaced with any other tag.
NOTE: Don't forget to activate the virtual environment!
== Configure httpd
The CGI interface (web.py
) is meant to be called from your web
server. Since it includes support for indexing multiple projects,
it expects a different variable (LXR_PROJ_DIR
) which points to a
directory with a specific structure:
<LXR_PROJ_DIR>
<project 1>
** data
repo
<project 2>
** data
repo
<project 3>
** data
repo
It will then generate the other two variables upon calling the query command.
Now replace /etc/apache2/sites-enabled/000-default.conf
with docker/000-default.conf
.
Note: If using httpd (RedHat/Centos) instead of apache2 (Ubuntu/Debian),
the default config file to edit is /etc/httpd/conf.d/elixir.conf
.
Finally, start the httpd server.
== Configure SELinux policy
When running systemd with SELinux enabled, httpd server can only visit limited directories. If your /path/elixir-data/ is not one of these allowed directories, you will be responded with 500 status code.
== Configure systemd log directory
== Configuration for other servers
Other HTTP servers (like nginx or lighthttpd) may not support WSGI and may require a separate WSGI server, like uWSGI.
Information about how to configure uWSGI with Lighthttpd can be found here: https://redmine.lighttpd.net/projects/lighttpd/wiki/HowToPythonWSGI#Python-WSGI-apps-via-uwsgi-SCGI-FastCGI-or-HTTP-using-the-uWSGI-server
Pull requests with example uWSGI configuration for Elixir are welcome.
= REST API usage
After configuring httpd, you can test the API usage:
== ident query
Send a get request to /api/ident/<Project>/<Ident>?version=<version>&family=<family>
.
For example:
curl http://127.0.0.1/api/ident/barebox/cdev?version=latest&family=C
The response body is of the following structure:
= Maintenance and enhancements
== Using a cache to improve performance
At Bootlin, we're using the https://varnish-cache.org/[Varnish http cache] as a front-end to reduce the load on the server running the Elixir code.
.-------------. .---------------. .-----------------------. | Http client | --------> | Varnish cache | --------> | Apache running Elixir | '-------------' '---------------' '-----------------------'
== Keeping Elixir databases up to date
To keep your Elixir databases up to date and index new versions that are released,
we're proposing to use a script like utils/update-elixir-data
which is called
through a daily cron job.
You can set $ELIXIR_THREADS
if you want to change the number of threads used by
update.py for indexing (by default the number of CPUs on your system).
== Keeping git repository disk usage under control
As you keep updating your git repositories, you may notice that some can become
considerably bigger than they originally were. This seems to happen when a gc.log
file appears in a big repository, apparently causing git's garbage collector (git gc
)
to fail, and therefore causing the repository to consume disk space at a fast
pace every time new objects are fetched.
When this happens, you can save disk space by packing git directories as follows:
Actually, a second pass with the above commands will save even more space.
To process multiple git repositories in a loop, you may use the
utils/pack-repositories
that we are providing, run from the directory
where all repositories are found.
= Building Docker images
Dockerfiles are provided in the docker/
directory.
To build the image, run the following commands:
git rev-parse --short HEAD
./elixir/Dockerfile ./elixirELIXIR_VER build argument is optional. Since .git directory is not copied into Docker image by default, the option is used to pass a version string to Elixir.
You can then run the image using docker run
.
Here we mount a host directory as Elixir data:
The Docker image does not contain any repositories.
To index a repository, you can use the index-repository
script.
For example, to add the https://musl.libc.org/[musl] repository, run:
/bin/bash -c 'export "PATH=/usr/local/elixir/venv/bin:$PATH" ; \
/usr/local/elixir/utils/index-repository \
musl https://git.musl-libc.org/git/musl'
Without PYTHONUNBUFFERED environment variable, update logs may show up with a delay.
Or, to run indexing in a separate container:
--entrypoint /bin/bash elixir -c \
'export "PATH=/usr/local/elixir/venv/bin:$PATH" ; \
/usr/local/elixir/utils/index-repository \
musl https://git.musl-libc.org/git/musl'
You can also use utils/index-all-repositories to start indexing all officially supported repositories.
After indexing is done, Elixir should be available under the following URL on your host: http://172.17.0.2/musl/latest/source
If 172.17.0.2 does not answer, you can check the IP address of the container by running:
== Automatic repository updates
The Docker image does not automatically update repositories by itself.
You can, for example, start utils/update-elixir-data
in the container (or in a separate container, with Elixir data volume/directory mounted)
from cron on the host to periodically update repositories.
== Using Docker image as a development server
You can easily use the Docker image as a development server by following the steps above, but mounting Elixir source directory from the host
into /usr/local/elixir/
in the container when running docker run elixir
.
Changes in the code made on the host should be automatically reflected in the container.
You can use apache2ctl
to restart Apache.
Error logs are available in /var/log/apache2/error.log
within the container.
= Hardware requirements
Performance requirements depend mostly on the amount of traffic that you get on your Elixir service. However, a fast server also helps for the initial indexing of the projects.
SSD storage is strongly recommended because of the frequent access to git repositories.
At Bootlin, here are a few details about the server we're using:
= Contributing to Elixir
== Supporting a new project
Elixir has a very simple modular architecture that allows to support new source code projects by just adding a new file to the Elixir sources.
Elixir's assumptions:
First make an installation of Elixir by following the above instructions.
See the projects
subdirectory for projects that are already supported.
Once Elixir works for at least one project, it's time to clone the git repository for the project you want to support:
cd /srv/git git clone --bare https://github.com/zephyrproject-rtos/zephyr
After doing this, you may also reference and fetch remote branches for this project,
for example corresponding to the stable
tree for the Linux kernel (see the
instructions for Linux earlier in this document).
Now, in your LXR_PROJ_DIR
directory, create a new directory for the
new project:
cd $LXR_PROJ_DIR mkdir -p zephyr/data ln -s /srv/git/zephyr.git repo export LXR_DATA_DIR=$LXR_PROJ_DIR/data export LXR_REPO_DIR=$LXR_PROJ_DIR/repo
Now, go back to the Elixir sources and test that tags are correctly extracted:
./script.sh list-tags
Depending on how you want to show the available versions on the Elixir pages,
you may have to apply substitutions to each tag string, for example to add
a v
prefix if missing, for consistency with how other project versions are
shown. You may also decide to ignore specific tags. All this can be done
by redefining the default list_tags()
function in a new projects/<projectname>.sh
file. Here's an example (projects/zephyr.sh
file):
list_tags() { echo "$tags" | grep -v '^zephyr-v' }
Note that <project_name>
must match the name of the directory that
you created under LXR_PROJ_DIR
.
The next step is to make sure that versions are classified as you wish
in the version menu. This classification work is done through the
list_tags_h()
function which generates the output of the ./scripts.sh list-tags -h
command. Here's what you get for the Linux project:
v4 v4.16 v4.16 v4 v4.16 v4.16-rc7 v4 v4.16 v4.16-rc6 v4 v4.16 v4.16-rc5 v4 v4.16 v4.16-rc4 v4 v4.16 v4.16-rc3 v4 v4.16 v4.16-rc2 v4 v4.16 v4.16-rc1 ...
The first column is the top level menu entry for versions. The second one is the next level menu entry, and the third one is the actual version that can be selected by the menu. Note that this third entry must correspond to the exact name of the tag in git.
If the default behavior is not what you want, you will have
to customize the list_tags_h
function.
You should also make sure that Elixir properly identifies the most recent versions:
./script.sh get-latest
If needed, customize the get_latest()
function.
If you want to enable support for compatible
properties in Devicetree files,
add dts_comp_support=1
at the beginning of projects/<projectname>.sh
.
You are now ready to generate Elixir's database for your new project:
./update.py
You can then check that Elixir works through your http server.
== Coding style
If you wish to contribute to Elixir's Python code, please follow the https://www.python.org/dev/peps/pep-0008/[official coding style for Python].
== How to send patches
The best way to share your contributions with us is to https://github.com/bootlin/elixir/pulls[file a pull request on GitHub].
= Automated testing
Elixir includes a simple test suite in t/
. To run it,
from the top-level Elixir directory, run:
prove
The test suite uses code extracted from Linux v5.4 in t/tree
.
== Licensing of code in t/tree
The copied code is licensed as described in the https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/plain/COPYING[COPYING] file included with
Linux. All the files copied carry SPDX license identifiers of GPL-2.0+
or
GPL-2.0-or-later
. Per https://www.gnu.org/licenses/gpl-faq.en.html#AllCompatibility[GNU's compatibility table], GPL 2.0+ code can be used
under GPLv3 provided the combination is under GPLv3. Moreover, https://www.gnu.org/licenses/license-list.en.html#AGPLv3.0[GNU's overview
of AGPLv3] indicates that its terms "effectively consist of the terms of GPLv3"
plus the network-use paragraph. Therefore, the developers have a good-faith
belief that licensing these files under AGPLv3 is authorized. (See also https://github.com/Freemius/wordpress-sdk/issues/166#issuecomment-310561976[this
issue comment] for another example of a similar situation.)
= License
Elixir is copyright (c) 2017--2020 its contributors. It is licensed AGPLv3.
See the COPYING
file included with Elixir for details.