BOINC / boinc

Open-source software for volunteer computing and grid computing.
https://boinc.berkeley.edu
GNU Lesser General Public License v3.0

Improve the out-of-box experience for scientists #4169

Closed davidpanderson closed 2 years ago

davidpanderson commented 3 years ago

Suppose a scientist (let’s call her Mary) needs lots of high-throughput computing and can’t afford the usual sources. Let’s assume that

Mary hears about volunteer computing and BOINC, and decides to investigate it. Mary will use BOINC only if this initial “out-of-box experience” (OOBE) is positive; i.e. she quickly tries out BOINC and is convinced that it works, that it’s useful to her, and that she wants to use it going forward. The ideal scenario is something like:

The current BOINC OOBE doesn’t achieve this. The main BOINC server documentation (https://boinc.berkeley.edu/trac/wiki/ProjectMain) is a sprawling mess. Marius’ Docker work (https://github.com/marius311/boinc-server-docker/blob/master/docs/cookbook.md) is a big step in the right direction, but more is needed to complete the above scenario.

BOINC competes with systems like HTCondor and AWS. We should study the OOBEs of these systems, borrow their good ideas, and make sure that we’re competitive. See, for example, https://www.youtube.com/channel/UCd1UBXmZIgB4p85t2tu-gLw

The goal

The following is a sketch of what I think the OOBE should be like. The target configuration involves:

Setting up the server host

This involves downloading a .gz file containing the BOINC server software and some VM and Docker images. Then you run a script that asks one or two questions, then creates and runs a server (as Docker processes). It creates a read-me file saying:

Admin functions (start/stop server, create accounts for job submitters) are done through a web interface. After the initial setup there should be no need to log in.

Setting up a job submission host

This involves installing a package that contains job submission scripts (see below) but not the BOINC server.

Running jobs

We should handle at least two cases:

In each case, let’s assume that all files for an app are stored in a directory.

To submit a job:

boinc_run --app app_dir_path

Run this in a directory containing input files. It makes a job with those input files, running the given app. The file “cmdline”, if present, contains command-line args.
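As a rough illustration of the behaviour described above, here is a minimal Python sketch of how boinc_run might turn a directory into a job description. The function name build_job and the dictionary layout are hypothetical, not part of BOINC; the only details taken from the proposal are that every file in the directory is an input file and that an optional file named "cmdline" holds the command-line args.

```python
import shlex
from pathlib import Path


def build_job(job_dir, app_dir):
    """Describe one job: the app to run, its input files, and its arguments.

    Hypothetical helper; the returned dict layout is an assumption.
    """
    job_dir = Path(job_dir)
    cmdline_file = job_dir / "cmdline"
    # The optional "cmdline" file holds command-line args, not input data.
    if cmdline_file.is_file():
        args = shlex.split(cmdline_file.read_text())
    else:
        args = []
    # Every other regular file in the directory is treated as an input file.
    inputs = sorted(
        p.name for p in job_dir.iterdir()
        if p.is_file() and p.name != "cmdline"
    )
    return {"app": str(Path(app_dir)), "inputs": inputs, "args": args}
```

Using shlex.split means quoted arguments in "cmdline" survive as single args, which seems the least surprising choice for a user who writes the file the way they would type a shell command.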

To run multiple jobs, create a directory for each job, and put input files there. Then do

boinc_run_jobs --app app_dir_path dir1 dir2 ...
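A sketch of how the command line above could be parsed in Python, using the standard argparse module. Only the --app option and the list of job directories come from the proposal; the function name parse_args, the destination name job_dirs, and the up-front check that each job directory exists are assumptions.

```python
import argparse
from pathlib import Path


def parse_args(argv):
    """Parse 'boinc_run_jobs --app app_dir_path dir1 dir2 ...'."""
    parser = argparse.ArgumentParser(prog="boinc_run_jobs")
    parser.add_argument("--app", required=True,
                        help="directory holding all files for the app")
    parser.add_argument("job_dirs", nargs="+",
                        help="one directory of input files per job")
    ns = parser.parse_args(argv)
    # Assumption: fail before submitting anything if a job directory is missing.
    missing = [d for d in ns.job_dirs if not Path(d).is_dir()]
    if missing:
        parser.error("not a directory: " + ", ".join(missing))
    return ns
```

Rejecting bad directories before any job is created keeps a batch all-or-nothing, so a typo in one path does not leave a partially submitted batch behind.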

To see the status of the job(s) started in the current directory:

boinc_status

If the job failed, show info like stderr output.

To abort jobs started in the current directory:

boinc_abort

To fetch the output files of completed jobs started in the current directory:

boinc_fetch

Note: fancier features can be added to this, but the basic features are ultra-simple. No XML editing, estimating job sizes, etc.

Implementation

The implementation shouldn’t be that hard. It’s based on technology we already have: boinc-server-docker and boinc2docker, and the remote job and file management mechanisms.

The server host setup script creates a BOINC project running in Docker containers, equipped with the VBox-based universal app, and some standard Docker containers, e.g. for Python apps.

On the submission host, each user has a directory ~/.boinc to contain various configuration and status files. A file ~/.boinc/apps contains a list of applications that have been used. Each one is identified by a directory path. We keep track of the mod time of the directory and the files in it; we maintain a Docker layer corresponding to the application.
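The mod-time bookkeeping could look something like the following Python sketch. It assumes the application list is available as a dictionary mapping each app directory path to the newest mod time seen at the last build; the on-disk format of ~/.boinc/apps is not specified in this proposal, and the names app_fingerprint and needs_rebuild are hypothetical.

```python
import os
from pathlib import Path


def app_fingerprint(app_dir):
    """Return the newest mod time (ns) of app_dir and everything inside it."""
    newest = os.lstat(app_dir).st_mtime_ns
    for root, dirs, files in os.walk(app_dir):
        for name in dirs + files:
            mtime = os.lstat(os.path.join(root, name)).st_mtime_ns
            newest = max(newest, mtime)
    return newest


def needs_rebuild(app_dir, recorded):
    """True if the app's Docker layer is missing or out of date.

    recorded maps resolved app directory paths to the fingerprint stored
    when the layer was last built (an assumed in-memory form of ~/.boinc/apps).
    """
    key = str(Path(app_dir).resolve())
    return recorded.get(key) != app_fingerprint(app_dir)
```

Including the directories themselves in the scan matters: adding or deleting a file updates the mod time of its parent directory, so those changes are usually caught even though the deleted file can no longer be inspected.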

The boinc_run command (a Python script) does the following:

boinc_status etc. use the remote job submission mechanism.

Computing resources

The scientist starts by running the BOINC client on one or more of their own computers (possibly Windows or Mac), and attaching to the project.

When things are working and they’re ready to scale up, they register with Science United, supplying their keywords. The vetting process may take a day or two. This will typically provide them with several hundred hosts.

Another possibility is to allow Science United users to register as “testers”, and to add a mechanism where projects can register as “test projects” on SU, with no vetting. Such projects would be allowed only to use VM apps with no network access (we’d need to add a mechanism for this). They’d get some number of hosts (50-100) for a few days.

Restructuring server documentation

Once we have this working, we need to reorganize the server docs in such a way that scientists are initially steered toward the OOBE described here, but can still access lower-level info.

Ageless93 commented 3 years ago

When things are working and they’re ready to scale up, they register with Science United, supplying their keywords.

Is the plan that new BOINC projects will in future work only (automatically) together with Science United? I ask also because Mary doesn't know anything about web servers.

cminnoy commented 3 years ago

I totally agree there should be simple, easy ways to submit a single job and multiple jobs, but first of all Mary will need an easy way to set up a project server (locally or, if she chooses, remotely in the cloud). Mary wants decent documentation, and preferably a high-quality book that guides her through all the steps, provides lots of examples and recipes for cooking up her server and project, and has chapters on how to create her own tasks for different platforms. Mary is used to seeing how easily her colleague can spawn new tasks on AWS or similar, so she would like to do the same with BOINC, only much better, more easily, and with better performance. Only then will Mary have time to take care of her little lamb.

I like Docker, but I wonder whether running Docker images inside VirtualBox is very efficient. Simply running a 'Hello World' takes a vast amount of resources on the client side: the client must download many megabytes that don't contribute to the task at hand, many CPU cycles are wasted on emulating and booting the ISO, and disk space is wasted. And VirtualBox is not OK for GPU computing (which should also be made easy for Mary). Yes, it's kind of easy, but efficiency should matter. Why not also look into running tasks under WSL, or simply spawning the Docker images directly on the client machine (if it is Linux-based)?

smoe commented 2 years ago

There was a time (about 8 years ago) when the Debian package that created BOINC project servers was not completely useless. I would very much love to see this revived. But I would also love for someone else to address this :)

AenBleidd commented 2 years ago

Converting this to Conversation since it's a big topic to discuss before creating any particular tasks to be implemented.