abhilekhsingh / gc3pie

Automatically exported from code.google.com/p/gc3pie

GC3Pie only submitting to one resource, even when many are defined and enabled #485

Closed. GoogleCodeExporter closed this issue 9 years ago.

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?

1. Define and enable multiple resources
2. Run a session-based script that submits more jobs than a single
resource can (or should) handle

This was first observed when submitting a large set of jobs across
three different SLURM clusters.

What is the expected output? What do you see instead?

Job submission should be balanced across resources: the default
resource selection rules rank first the resource with the fewest jobs
queued by the submitting user.

What happens instead is that a single resource is hit every time --
even after it has exceeded its maximum running capacity.
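
For reference, the default ranking rule amounts to something like the
following toy code (illustration only, not GC3Pie's actual matchmaker;
the `queued_by_user` attribute is invented for the example)::

    # toy illustration of the default ranking rule -- not GC3Pie code;
    # `queued_by_user` stands in for the per-user queued-job count that
    # `get_resource_status()` would report
    from collections import namedtuple

    Resource = namedtuple('Resource', ['name', 'queued_by_user'])

    def rank(resources):
        # prefer the resource with the fewest jobs queued by the user
        return sorted(resources, key=lambda r: r.queued_by_user)

    clusters = [Resource('slurm1', 40), Resource('slurm2', 3),
                Resource('slurm3', 12)]
    print([r.name for r in rank(clusters)])
    # -> ['slurm2', 'slurm3', 'slurm1']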

The problem lies with the "new" scheduler design: here is the old
scheduling code (abridged) from `Core.__submit_application`::

            # decide which resource to use
            compatible_resources = self.matchmaker.filter(app, enabled_resources)

            if len(compatible_resources) <= 1:
                # shortcut: no brokering to do, just use what we've got
                targets = compatible_resources
            else:
                # update status of selected resources
                updated_resources = []
                for r in compatible_resources:
                    # ...
                    r.get_resource_status()
                    updated_resources.append(r)
                # sort resources according to Application's preferences
                targets = self.matchmaker.rank(app, updated_resources)

            # after brokering we have a sorted list of valid resource
            for resource in targets:
                try:
                    resource.submit_job(app)
                    break
                except:
                    continue

There are two key points here:

1) Resource status is updated in between resource selection
   (`matchmaker.filter()`) and resource ranking (`matchmaker.rank()`):
   the data about running/queued jobs that ranking relies on is
   refreshed by `get_resource_status()`.

2) No resource ranking or status update is done when there is only one
   compatible resource.

What happens in `Engine.progress`, instead, is the following::

            with self.scheduler(self._new,
                                self._core.resources.values()) as sched:
                for task_index, resource_name in sched:
                    task = self._new[task_index]
                    resource = self._core.resources[resource_name]
                    # try to submit; go to SUBMITTED if successful, FAILED if
                    # not
                    try:
                        self._core.submit(task, targets=[resource])
                    ...

Therefore: (1) resource status is never updated by the Engine, and
(2) `Core.submit()` does not update it either, since it receives a
single target resource and the shortcut above is taken.
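
To make the failure mode concrete, here is a self-contained toy model
of the interaction described above (illustration only; the class and
attribute names are invented, not actual GC3Pie code): the status
refresh happens only when more than one target is passed, so the
one-resource-at-a-time submissions issued by the Engine always operate
on stale data and pile every job onto the same resource::

    # toy model of the Engine/Core interaction -- *not* GC3Pie source;
    # all names below are invented for illustration
    class FakeResource(object):
        def __init__(self, name, live_queue):
            self.name = name
            self.live_queue = live_queue   # what the batch system would report
            self.queued = 0                # last value seen by the scheduler

        def get_resource_status(self):
            # refresh the cached job count from the "batch system"
            self.queued = self.live_queue

        def submit_job(self):
            self.live_queue += 1

    def core_submit(targets):
        # mimics the abridged `Core.__submit_application` quoted above:
        # the shortcut skips the status update for a single target
        if len(targets) <= 1:
            chosen = targets
        else:
            for r in targets:
                r.get_resource_status()
            chosen = sorted(targets, key=lambda r: r.queued)
        chosen[0].submit_job()
        return chosen[0].name

    resources = [FakeResource('slurm1', 5), FakeResource('slurm2', 0)]
    # mimics `Engine.progress`: one target per task, status never refreshed
    picks = [core_submit([min(resources, key=lambda r: r.queued)])
             for _ in range(4)]
    print(picks)   # -> ['slurm1', 'slurm1', 'slurm1', 'slurm1']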

I can see three ways of fixing the problem:

A) Let `Engine.progress()` update all resources before starting a
   submission cycle.

   This is simple and quick, but could still lead to resource
   oversubscription when many jobs are submitted in a single cycle.
   This can be countered if *any* resource backend does correct
   bookkeeping.  (A rough sketch follows this list of options.)

B) Alter the `Scheduler` interface/contract to require that resources
   are updated.

   This would however be a somewhat unnatural contract -- it is not a
   scheduler's job to update resource status, and the requirement
   would be easy to forget.
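
For concreteness, option A would amount to something like the
following change at the top of the submission cycle in
`Engine.progress()` (a rough sketch written against the loop quoted
above; it is not claimed to match the fix that was eventually
committed)::

    # sketch of option A -- refresh every resource once per cycle
    # before handing them to the scheduler
    for resource in self._core.resources.values():
        try:
            resource.get_resource_status()
        except Exception:
            # an unreachable resource should not abort the whole cycle
            continue

    with self.scheduler(self._new,
                        self._core.resources.values()) as sched:
        for task_index, resource_name in sched:
            # ... submission loop unchanged from the quote above ...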

Original issue reported on code.google.com by riccardo.murri@gmail.com on 12 May 2015 at 8:08

GoogleCodeExporter commented 9 years ago
Fixed in SVN r4232; a test introduced in SVN r4246 should prevent us
from regressing on this.

Original comment by riccardo.murri@gmail.com on 16 Jun 2015 at 10:52