[x] TODO: Rebase on master once #78 is pulled. I iterated on this branch because it had some improvements and I wanted to avoid merge conflicts.
While installing a few reference genomes on my Galaxy instance I got annoyed by the indexing steps. These take quite a long time, and run-data-managers only runs one data manager at a time. I feel that job scheduling should be handled by Galaxy and not by run-data-managers, so I changed the way that run-data-managers submits jobs.
Now run-data-managers first picks all the data managers that populate source tables (DEFAULT: ["all_fasta"]), since other data managers depend on these tables, and runs them. After that it submits all the other data managers and lets Galaxy figure out how to schedule these jobs.
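The two-phase ordering described above can be sketched roughly as follows. This is a minimal illustration, not the code in this branch: the `data_tables` key, the function names, and the `submit` / `wait_for_all` callables are all hypothetical stand-ins for however the real script talks to Galaxy.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical constant mirroring the default named in the description.
DEFAULT_SOURCE_TABLES = ["all_fasta"]


def split_by_source_tables(
    data_managers: Sequence[Dict],
    source_tables: Sequence[str] = DEFAULT_SOURCE_TABLES,
) -> Tuple[List[Dict], List[Dict]]:
    """Split entries into (those that populate a source table, all others)."""
    wanted = set(source_tables)
    fetchers, indexers = [], []
    for dm in data_managers:
        # "data_tables" is an assumed key listing the tables an entry fills.
        if wanted & set(dm.get("data_tables", [])):
            fetchers.append(dm)
        else:
            indexers.append(dm)
    return fetchers, indexers


def run_in_two_phases(
    data_managers: Sequence[Dict],
    submit: Callable[[Dict], object],
    wait_for_all: Callable[[List[object]], None],
    source_tables: Sequence[str] = DEFAULT_SOURCE_TABLES,
) -> None:
    """Submit the source-table jobs, wait for them, then submit the rest."""
    fetchers, indexers = split_by_source_tables(data_managers, source_tables)
    for phase in (fetchers, indexers):
        # Every job in a phase is submitted before waiting, so the order in
        # which they actually run is left to the scheduler behind `submit`.
        jobs = [submit(dm) for dm in phase]
        wait_for_all(jobs)
```

The only ordering enforced here is between the two phases. Within a phase nothing is serialized, which is what allows the bowtie and bwa indexes to be built at the same time.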
This provides a significant speedup when you're adding a vertebrate genome to the list. Instead of watching your bowtie and bwa indexes be created one after another, they are now created simultaneously.
Internally I had to completely overhaul run-data-managers. It is now a DataManagers object that has a run method. This made a lot of the communication between functions much easier, and the code is a bit cleaner now. The DataManagers object can now also be used in other scripts.
Since I had to do some testing I overhauled the test scripts as well. These are now split into 3 parts. The shed-tools testing was quite slow, and I did not want to wait on it all the time. There is now a separate script for testing run-data-managers, which made testing a bit easier.