bcbio / bcbio-nextgen

Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis
https://bcbio-nextgen.readthedocs.io
MIT License

iPython cluster submission problem #1920

Closed leiendeckerlu closed 7 years ago

leiendeckerlu commented 7 years ago

Hi there,

I'm currently working on getting our bcbio installation running on our cluster to speed up the analysis. So far, I'm able to submit jobs to our SGE using the IPython framework; however, it then suddenly crashes, and I'm not sure what the reason is.

I submit the job with qsub to the queue, and the requested slots (96 cores in total across 4 nodes) are also correctly allocated:

#$ -q fancy.q     
#$ -l nodes=4,rpn=24      
#$ -N IPython_test    
#$ -j y             
#$ -cwd             

bcbio_nextgen.py ../config/RNAseq_config.yaml -t ipython -n 24 -s sge

and then get the following error message from bcbio:

[2017-05-03T15:23Z] compute-6-11: System YAML configuration: /bcbio/galaxy/bcbio_system.yaml
[2017-05-03T15:23Z] compute-6-11: Resource requests: ; memory: 1.00; cores: 1
[2017-05-03T15:23Z] compute-6-11: Configuring 1 jobs to run, using 1 cores each with 1.00g of memory reserved for each job
Traceback (most recent call last):
  File "/tooldir/bin/bcbio_nextgen.py", line 4, in <module>
    __import__('pkg_resources').run_script('bcbio-nextgen==1.0.2', 'bcbio_nextgen.py')
  File "/bcbio/anaconda/lib/python2.7/site-packages/setuptools-20.3-py2.7.egg/pkg_resources/__init__.py", line 726, in run_script
  File "/bcbio/anaconda/lib/python2.7/site-packages/setuptools-20.3-py2.7.egg/pkg_resources/__init__.py", line 1484, in run_script
  File "/bcbio/anaconda/lib/python2.7/site-packages/bcbio_nextgen-1.0.2-py2.7.egg-info/scripts/bcbio_nextgen.py", line 234, in <module>
    main(**kwargs)
  File "/bcbio/anaconda/lib/python2.7/site-packages/bcbio_nextgen-1.0.2-py2.7.egg-info/scripts/bcbio_nextgen.py", line 43, in main
    run_main(**kwargs)
  File "/bcbio/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 50, in run_main
    fc_dir, run_info_yaml)
  File "/bcbio/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 82, in _run_toplevel
    system.write_info(dirs, parallel, config)
  File "/bcbio/anaconda/lib/python2.7/site-packages/bcbio/provenance/system.py", line 32, in write_info
    minfos = _get_machine_info(parallel, sys_config, dirs, config)
  File "/bcbio/anaconda/lib/python2.7/site-packages/bcbio/provenance/system.py", line 58, in _get_machine_info
    with prun.start(parallel, [[sys_config]], config, dirs) as run_parallel:
  File "/bcbio/anaconda/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/bcbio/anaconda/lib/python2.7/site-packages/bcbio/distributed/prun.py", line 55, in start
    with ipython.create(parallel, dirs, config) as view:
  File "/bcbio/anaconda/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/bcbio/anaconda/lib/python2.7/site-packages/cluster_helper/cluster.py", line 1069, in cluster_view
    wait_for_all_engines=wait_for_all_engines)
  File "/bcbio/anaconda/lib/python2.7/site-packages/cluster_helper/cluster.py", line 956, in __init__
    _start(scheduler, self.profile, queue, num_jobs, cores_per_job, self.cluster_id, extra_params)
  File "/bcbio/anaconda/lib/python2.7/site-packages/cluster_helper/cluster.py", line 829, in _start
    resources, specials = _scheduler_resources(scheduler, extra_params, queue)
  File "/bcbio/anaconda/lib/python2.7/site-packages/cluster_helper/cluster.py", line 809, in _scheduler_resources
    specials["pename"] = _find_parallel_environment(queue)
  File "/bcbio/anaconda/lib/python2.7/site-packages/cluster_helper/cluster.py", line 339, in _find_parallel_environment
    for name in subprocess.check_output(["qconf", "-spl"]).strip().split():
  File "/bcbio/anaconda/lib/python2.7/subprocess.py", line 212, in check_output
    process = Popen(stdout=PIPE, *popenargs, **kwargs)
  File "/bcbio/anaconda/lib/python2.7/subprocess.py", line 390, in __init__
    errread, errwrite)
  File "/bcbio/anaconda/lib/python2.7/subprocess.py", line 1024, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory

Any help is highly appreciated!

roryk commented 7 years ago

Hi @leiendeckerlu,

Two guesses as to what might be going wrong. The problem is that this step is trying to grab the SGE configuration by calling qconf, but it can't find qconf, so it fails.

First guess: is qconf something you could put on your PATH?

Second guess: is this actually SGE? Other schedulers such as Torque also use qsub but aren't SGE and don't have the qconf command.
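
If qconf does turn out to be installed but just not visible on the compute nodes, sourcing the SGE settings file at the top of your submission script is often enough to expose it. This is only a sketch; the SGE_ROOT location below is a placeholder, so check the real path with your admins:

# Hypothetical SGE install location -- replace with your cluster's actual SGE_ROOT
export SGE_ROOT=/opt/sge
# settings.sh ships with SGE and puts qconf/qhost/qsub on the PATH
source "$SGE_ROOT/default/common/settings.sh"
qconf -spl    # should now list the available parallel environments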

leiendeckerlu commented 7 years ago

Hi @roryk ,

Wow, that was fast! The answer to your second guess is yes, it is SGE; I double-checked this against our cluster documentation.

Regarding the first guess: qconf is indeed not in my PATH. I can use qconf in the login environment but not on the compute nodes. I will have to talk to the admins to see whether that is possible and will come back to you.

roryk commented 7 years ago

Great -- yes, we need access to qconf so we can figure out how to submit the SGE jobs properly. We inspect the output of qconf to select a parallel environment that has access to the queue.

If you can figure out which parallel environment to use, you can skip that step by passing the pename to your bcbio-nextgen call (http://bcbio-nextgen.readthedocs.io/en/latest/contents/parallel.html?highlight=SGE).
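
For example, if your admins tell you the right parallel environment is called smp (an illustrative name -- yours will differ), the call in your submission script would look something like:

bcbio_nextgen.py ../config/RNAseq_config.yaml -t ipython -n 24 -s sge -q fancy.q -r pename=smp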

leiendeckerlu commented 7 years ago

Well, I got qconf running, but now it seems like it is not able to detect which parallel environment to use:

[2017-05-03T17:31Z] compute-6-27: System YAML configuration: /bcbio/galaxy/bcbio_system.yaml
[2017-05-03T17:31Z] compute-6-27: Couldn't get machine information from resource query function for queue 'openmpi.q' on scheduler "sge"; submitting job to queue
Traceback (most recent call last):
  File "/bcbio/anaconda/lib/python2.7/site-packages/bcbio/provenance/system.py", line 48, in _get_machine_info
    return sched_info_dict[parallel["scheduler"].lower()](parallel.get("queue", ""))
  File "/bcbio/anaconda/lib/python2.7/site-packages/bcbio/provenance/system.py", line 135, in _sge_info
    mem_info = _sge_get_mem(qhost_out, queue)
  File "/bcbio/anaconda/lib/python2.7/site-packages/bcbio/provenance/system.py", line 182, in _sge_get_mem
    raise Exception("Unrecognized suffix in mem_tot from SGE")
Exception: Unrecognized suffix in mem_tot from SGE
[2017-05-03T17:31Z] compute-6-27: Resource requests: ; memory: 1.00; cores: 1
[2017-05-03T17:31Z] compute-6-27: Configuring 1 jobs to run, using 1 cores each with 1.00g of memory reserved for each job
No cluster queue or queue instance matches the phrase "openmpi"
No cluster queue or queue instance matches the phrase "openmpi"
No cluster queue or queue instance matches the phrase "openmpi"
Traceback (most recent call last):
  File "/tooldir/bin/bcbio_nextgen.py", line 4, in <module>
    __import__('pkg_resources').run_script('bcbio-nextgen==1.0.2', 'bcbio_nextgen.py')
  File "/bcbio/anaconda/lib/python2.7/site-packages/setuptools-20.3-py2.7.egg/pkg_resources/__init__.py", line 726, in run_script
  File "/bcbio/anaconda/lib/python2.7/site-packages/setuptools-20.3-py2.7.egg/pkg_resources/__init__.py", line 1484, in run_script
  File "/bcbio/anaconda/lib/python2.7/site-packages/bcbio_nextgen-1.0.2-py2.7.egg-info/scripts/bcbio_nextgen.py", line 234, in <module>
    main(**kwargs)
  File "/bcbio/anaconda/lib/python2.7/site-packages/bcbio_nextgen-1.0.2-py2.7.egg-info/scripts/bcbio_nextgen.py", line 43, in main
    run_main(**kwargs)
  File "/bcbio/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 50, in run_main
    fc_dir, run_info_yaml)
  File "/bcbio/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 82, in _run_toplevel
    system.write_info(dirs, parallel, config)
  File "/bcbio/anaconda/lib/python2.7/site-packages/bcbio/provenance/system.py", line 32, in write_info
    minfos = _get_machine_info(parallel, sys_config, dirs, config)
  File "/bcbio/anaconda/lib/python2.7/site-packages/bcbio/provenance/system.py", line 58, in _get_machine_info
    with prun.start(parallel, [[sys_config]], config, dirs) as run_parallel:
  File "/bcbio/anaconda/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/bcbio/anaconda/lib/python2.7/site-packages/bcbio/distributed/prun.py", line 55, in start
    with ipython.create(parallel, dirs, config) as view:
  File "/bcbio/anaconda/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/bcbio/anaconda/lib/python2.7/site-packages/cluster_helper/cluster.py", line 1069, in cluster_view
    wait_for_all_engines=wait_for_all_engines)
  File "/bcbio/anaconda/lib/python2.7/site-packages/cluster_helper/cluster.py", line 956, in __init__
    _start(scheduler, self.profile, queue, num_jobs, cores_per_job, self.cluster_id, extra_params)
  File "/bcbio/anaconda/lib/python2.7/site-packages/cluster_helper/cluster.py", line 829, in _start
    resources, specials = _scheduler_resources(scheduler, extra_params, queue)
  File "/bcbio/anaconda/lib/python2.7/site-packages/cluster_helper/cluster.py", line 809, in _scheduler_resources
    specials["pename"] = _find_parallel_environment(queue)
  File "/bcbio/anaconda/lib/python2.7/site-packages/cluster_helper/cluster.py", line 348, in _find_parallel_environment
    "https://blogs.oracle.com/templedf/entry/configuring_a_new_parallel_environment")
ValueError: Could not find an SGE environment configured for parallel execution. See https://blogs.oracle.com/templedf/entry/configuring_a_new_parallel_environment for SGE setup instructions.

Also, our admins told me to alter the ipcluster_config.py file and set c.IPClusterEngines.engine_launcher_class = 'MPIEngineSetLauncher'. Do you think this would be useful? I only found an ipcluster_config.py file in the log subfolder /projectFolder/work/log/ipython/. Is there a universal ipcluster_config.py file?

chapmanb commented 7 years ago

Thanks for following up and for all the work helping debug this. The issue you're running into is that bcbio is having trouble finding the right parallel environment in SGE. It uses this to submit jobs, such as alignment, that require multiple cores together. The difficulty in automating this is that there is no standard way to do it in SGE and every environment is different. There is more detail in the documentation here:

http://bcbio-nextgen.readthedocs.io/en/latest/contents/parallel.html?highlight=pename#ipython-parallel

Your best approach would be to ask your cluster administrators which SGE parallel environment to use for multicore jobs and specify that on the command line to bcbio with -r pename=yourpe. bcbio doesn't use MPI, so changing the configuration to support that won't help fix the issue. Hope this helps get it running.

leiendeckerlu commented 7 years ago

So I talked to the cluster admins and this is the answer: we only use defined PE environments for runs on single nodes (maxing out at 24 cores). If I want to use multiple nodes, I have to set the granularity by using -r node=4 -r rpn=24 -q MultiNodeQ.q. Well, that is what I'm doing at the moment, as you can see from my logs. Any additional ideas?

chapmanb commented 7 years ago

Thanks for the work debugging this. I don't see a log file; could you provide one as a Gist (http://gist.github.com/) so we can see what the error message is? In general, are there no multicore penames in your environment at all that you could specify? Ideally you'd do -r pename=yourpename -q MultiNodeQ.q. I'm not sure exactly what the resources you're supplying do, but trying to specify fine-grained detail at the top level is likely to clash with bcbio trying to allocate job arrays with -t. Apologies, SGE is hard because there are so many ways to specify things. It might be worth asking your administrators whether they'd support using parallel environments in this way. Hope this helps some.

roryk commented 7 years ago

I wasn't sure from your description, but it sounds like your SGE doesn't have parallel environments set up for the queue you want to use? I added a check for "none" as a pename, which we can use to skip setting the pename in that case; it disables trying to run qconf. We still need to fix the actual engine job script.
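
With that check in place, the idea would be to pass something like -r pename=none on the bcbio command line to skip the qconf lookup entirely, for example:

bcbio_nextgen.py ../config/RNAseq_config.yaml -t ipython -n 24 -s sge -q fancy.q -r pename=none

(That only avoids the qconf call; as noted above, the engine job script itself still needs fixing for the multi-node case.)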

For your bcbio call:

#$ -l nodes=4,rpn=24

Does this mean give me 4 nodes and use 24 cores on each node for a total of 96 cores? Or does it mean distribute 24 total cores across 4 different nodes?

leiendeckerlu commented 7 years ago

@roryk Indeed, nodes=4 and rpn=24 gives me 96 total cores distributed across 4 nodes. According to our cluster admins, I should use this particular queue, as it is the only queue supporting jobs distributed over different nodes. This queue supports MVAPICH, MPICH and DRMAA and provides host files for these types of configurations. They claim that IPython should be able to pick up these host files on the respective hosts and work accordingly. Is bcbio using DRMAA to submit its jobs, or what kind of interface is it using?

leiendeckerlu commented 7 years ago

@chapmanb Here you go. This is the script I use to submit jobs to our queue (qsub run.sh): https://gist.github.com/leiendeckerlu/4eaa75d0a460bb7172d660f14fb44c06 And here are the corresponding logs: https://gist.github.com/leiendeckerlu/8d095d65a9b631dc6a742438fef5244e https://gist.github.com/leiendeckerlu/7378b376f2066e0fceb36f83fbf79cb0

leiendeckerlu commented 7 years ago

@roryk @chapmanb Any ideas how to resolve this issue?

chapmanb commented 7 years ago

Sorry for the delay in getting back to you on this. bcbio needs to have a parallel environment but seems to be having lots of trouble parsing your SGE setup and finding it. It sounds like you're trying some workarounds, but bcbio doesn't support the alternative configuration approaches like rpn. bcbio doesn't use any interface like DRMAA to submit jobs but instead just creates batch scripts and submits them directly with qsub, hence the need to be able to specify a parallel environment for multicore jobs.

To fix this, we'd need to understand which parallel environment we're meant to use for multicore jobs. This is one of the names in qconf -spl, and you'll probably have to ask your admins again which of them supports submitting multicore jobs. If you don't have a multicore setup, would it be possible to add one to your configuration?

At a secondary level, bcbio can't correctly parse the memory specifications for your queue. It can recover from this by directly querying machines, but if you could provide a gist of the output of qhost -q we could try to debug it.
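
If it's useful when talking to your admins, these are the standard SGE queries that feed into both issues (the PE name below is just a placeholder):

qconf -spl                              # list all parallel environment names
qconf -sp your_pe                       # show one PE's config (slots, allocation_rule)
qconf -sq MultiNodeQ.q | grep pe_list   # which PEs your queue actually accepts
qhost -q                                # per-host queue/memory report that bcbio parses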

Hope this helps some. Sorry to keep pushing back but hopefully identifying and setting up a parallel environment will get it working cleanly.

leiendeckerlu commented 7 years ago

Yeah, the PE is also the reason why bcbio runs just fine as long as I only use one cluster node (24 cores), since there is a PE set up for that. I talked to the cluster admins again and asked them to set up a PE for us (I think that would be the easiest solution); we'll see how that goes.

In the meantime, here is the output from qconf -spl: https://gist.github.com/leiendeckerlu/6fcd5addec8f13d4bbb953fad2a3c7cf

and this is the output from qhost -q: https://gist.github.com/leiendeckerlu/c861c0b50436ce87d1f7f48703d9894d

chapmanb commented 7 years ago

Thanks for the details on your setup. It looks like not all of the queues have memory specifications, which is why qhost -q fails to identify them. So I think bcbio is doing the right thing here by falling back to running a job to get memory/cores on the machines, and there is nothing we can improve there. Hopefully getting a PE set up will allow things to work distributed for you. Thanks for all the help debugging.

pforai commented 7 years ago

Hi @chapmanb, @leiendeckerlu. The reason we have this setup is that, with the GE defaults for MPI-style PEs that run across different machines, the allocation geometry cannot be defined easily. Because of this, we have a special setup that allows users to define the number of nodes, the number of MPI ranks/processes across nodes, and the number of threads per MPI rank, generating valid machine files for MPICH/MVAPICH and a bunch of compatible MPI implementations. I was thinking it makes the most sense to execute bcbio within this context, given my understanding of how it can parallelize things.

Could either of you confirm what I think is going on here? There are two things that bcbio can handle in a parallel fashion.

My assumption was that the easiest route in a GE setup would be to configure the IPython engines to pick up MPI hostfiles and use those to handle the dispatch of the analysis binaries. As far as I know, IPython engines can be configured to pick up MPI machine/hostfiles in the various flavours that MPI launchers support, in order to match the GE allocation and execute on the CPU cores of the nodes that GE allocated.

The naive PE configuration for MPI-style workloads can and will potentially allocate randomly across nodes. If bcbio is configured to use a certain number of CPUs for execution, how is the dispatch mechanism aware of which node got how many slots allocated, and how is this passed to the IPython engines?

Thanks, P

chapmanb commented 7 years ago

Peter, thanks for providing the additional background on your setup; this helps a lot. What bcbio does is pretty simple: it creates SGE batch scripts to spin up a job array of IPython worker engines. In a bcbio work directory there will be sge_engine* files which provide all of the details about what it tries to do on your cluster, and here is the template code (https://github.com/roryk/ipython-cluster-helper/blob/master/cluster_helper/cluster.py#L236). So what it wants to do is spin up, say, 5 nodes with 8 cores on each. IPython itself then handles sending jobs to those 5 machines, and the bcbio code runs the multicore processes knowing they each have 8 cores.
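
To make that concrete, the engine submission script bcbio writes ends up looking roughly like the sketch below. This is simplified and the names are illustrative rather than the exact template output; the sge_engine* files in your work directory show the real thing:

#$ -cwd
#$ -j y
#$ -N bcbio-ipengine    # illustrative job name
#$ -q fancy.q           # the queue passed to bcbio
#$ -pe smp 8            # <- this is why bcbio has to find (or be told) a pename
#$ -t 1-5               # job array: one task per node/engine to spin up
# each array task then starts one IPython engine that connects back to the controller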

So for us the easiest thing would be to allow an SMP-style PE and give bcbio access to it. The MPI hostfile parsing would take a lot more work on our side, and I'm not 100% sure how best to proceed with that. Would the PE route work to get things running right away?

Thanks again and hope this helps provide more background.

pforai commented 7 years ago

Thanks for the clarification, that certainly helped, and we now have the submission with task arrays working as designed. I guess we can go ahead and close this issue. @leiendeckerlu, agreed?

leiendeckerlu commented 7 years ago

@chapmanb @pforai Yes, we can close here! Nice work from you guys!