Not enough memory, better way to determine Xmx value

biocyberman commented 9 years ago

I encountered a not-enough-memory error when running IndelRealigner. Increasing the memoryLimit of this OneCoreJob to 12GB solved the problem, but that requires recompiling and reinstalling Piper. So I wonder if there is a better way to determine this memory limit.

https://github.com/NationalGenomicsInfrastructure/piper/blob/master/src/main/scala/molmed/utils/UppmaxJob.scala#L23

johandahlberg commented 9 years ago

Unfortunately that is the way to do it at the moment. Setting up a better mechanism for resource allocations is one of the things that have been on the todo-list for a very long time. PRs fixing this would be highly welcome. :)

biocyberman commented 9 years ago

I will try to look at this further. A quick question: how many jobs would Piper run in parallel when this OneCoreJob is started?

johandahlberg commented 9 years ago

Depends on which jobRunner you use. If you use any of the cluster ones, e.g. the Drmaa one, it will run as many as the dependency graph allows. If you use the Shell jobRunner it will run all jobs sequentially, one at a time.

biocyberman commented 9 years ago

I think there is a good way to do this, though I still haven't found time to test it: it can be done with the Akka module for Scala: http://doc.akka.io/docs/akka/2.0/intro/getting-started-first-scala.html It may seem like overkill, but it may also be the most robust solution. It may also be better to do it upstream, in the Queue implementation.

The idea is to have a master job/process distribute jobs to workers. The master will try again with different parameters if a worker fails. This would also solve the problem of workers failing for other reasons that I have experienced with Queue/Piper, such as not enough processors.

I will experiment with this when I can find the time.
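
The retry idea above can be sketched without Akka. The following is only a minimal illustration, assuming a job can be modelled as a function from a memory limit (in GB) to either an error or a result. `runWithRetry` and `fakeIndelRealigner` are hypothetical names and are not part of Piper or Queue.

```scala
// Hypothetical sketch: retry a job with increasing memory limits.
// `runJob` stands in for whatever actually launches the job; it is not a Piper or Queue API.
object RetryWithMoreMemory {

  /** Outcome of one attempt: either an error message or the job's result. */
  type Attempt[A] = Either[String, A]

  /**
   * Try `runJob` with each memory limit in turn (in GB) and stop at the first success.
   * Returns the last failure if every attempt fails.
   */
  def runWithRetry[A](memoryLimitsGb: Seq[Int])(runJob: Int => Attempt[A]): Attempt[A] =
    memoryLimitsGb.foldLeft[Attempt[A]](Left("no memory limits given")) {
      case (success @ Right(_), _) => success         // already succeeded, skip remaining limits
      case (Left(_), limitGb)      => runJob(limitGb) // previous attempt failed, try the next limit
    }

  // Stand-in job that only succeeds with at least 12 GB.
  def fakeIndelRealigner(limitGb: Int): Attempt[String] =
    if (limitGb >= 12) Right(s"finished with -Xmx${limitGb}g")
    else Left(s"OutOfMemoryError with -Xmx${limitGb}g")

  def main(args: Array[String]): Unit =
    println(runWithRetry(Seq(3, 6, 12, 24))(fakeIndelRealigner))
}
```

In this sketch the escalation schedule (3, 6, 12, 24 GB) is arbitrary. A real implementation would also need to decide which failures are worth retrying, since not every failed job is a memory problem.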

johandahlberg commented 9 years ago

Having some experience working with Akka, I do think that it would be overkill.

If I were to speculate on the simplest solution to this, what needs to be done is:

Implement a way to dynamically set how many resources a certain job needs (this could e.g. be done using the Typesafe Config library, https://github.com/typesafehub/config).
Make sure that all parameters (cpu-usage, mem-usage, etc.) are picked up by the jobRunner you are using (the overly complex solution of adding traits to the CommandLineFunction classes is there because the DrmaaJobRunner does not pick up the number of cpus to use correctly).

What I envision is being able to write code looking something like this:

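case class Sort(@Input in: File, @Output out: File) extends CommandLineFunction with ConfigLoader {
    // Look up the resource settings for this tool by its config key.
    val config = Config.load("samtools.sort")
    // Resource requests come from configuration instead of being hard-coded.
    this.nbrOfCores = config.get("nbrOfCores")
    this.memUsage = config.get("memUsage")

    def commandLine = "...."
}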

The above is of course pseudocode, but I think that it would be by far the cleanest solution.
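
For illustration, the configuration lookup could be sketched as standalone Scala along the following lines. To stay dependency-free this uses java.util.Properties rather than the Typesafe Config library mentioned above, and the names `ResourceConfig`, `JobResources` and the "samtools.sort.*" keys, as well as the default values, are assumptions made for the example rather than anything that exists in Piper.

```scala
import java.io.StringReader
import java.util.Properties

// Hypothetical sketch: per-job resource settings read from configuration at run time.
// Names such as JobResources and the "samtools.sort.*" keys are illustrative only.
object ResourceConfig {

  /** Resources to request for one job type. */
  final case class JobResources(nbrOfCores: Int, memUsageGb: Int)

  // Arbitrary fallback used when a job has no entry in the configuration.
  val defaults: JobResources = JobResources(nbrOfCores = 1, memUsageGb = 3)

  /** Parse "key=value" configuration text (would normally be read from a file). */
  def parse(text: String): Properties = {
    val props = new Properties()
    props.load(new StringReader(text))
    props
  }

  /** Look up "<job>.nbrOfCores" and "<job>.memUsage", falling back to the defaults. */
  def resourcesFor(props: Properties, job: String): JobResources = {
    def intOrElse(key: String, fallback: Int): Int =
      Option(props.getProperty(s"$job.$key")).map(_.trim.toInt).getOrElse(fallback)
    JobResources(intOrElse("nbrOfCores", defaults.nbrOfCores), intOrElse("memUsage", defaults.memUsageGb))
  }

  def main(args: Array[String]): Unit = {
    val props = parse(
      """samtools.sort.nbrOfCores=2
        |samtools.sort.memUsage=12
        |""".stripMargin)
    println(resourcesFor(props, "samtools.sort"))
    println(resourcesFor(props, "gatk.indelrealigner")) // not configured, uses defaults
  }
}
```

With something like this, changing a memory limit would mean editing a configuration file rather than recompiling and reinstalling. The values would still have to be passed on to the jobRunner, which is the second point in the list above.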

Not enough memory, better way to determine Xmx value #63