Rfam / rfam-family-pipeline

Backend for the Rfam family building pipeline

Slurm support #95

Closed nawrockie closed 5 months ago

nawrockie commented 1 year ago

Adds scheduler field to Rfam/Conf/rfam.conf which can either be lsf or slurm, and adds support for it in the code, mainly in Rfam/Lib/Bio/Rfam/Utils.pm in submit_nonmpi_job, submit_mpi_job, and wait_for_cluster_light.
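
To illustrate the kind of branching this implies, here is a minimal shell sketch (not the actual Perl in Utils.pm) that maps a scheduler value of lsf or slurm to a submission command. The function name build_submit_cmd, its argument order, and the memory handling are assumptions made for illustration; only the bsub and sbatch options themselves (-J, -o, -M, --mem, --wrap) are standard.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the pipeline): print the submission
# command for a non-MPI job under the configured scheduler.
# Arguments: scheduler (lsf|slurm), job name, output file, memory in GB, command.
build_submit_cmd() {
  local scheduler="$1" jobname="$2" outfile="$3" mem_gb="$4" cmd="$5"
  case "$scheduler" in
    lsf)
      # bsub takes the command as its trailing argument. The unit of -M
      # depends on the site's LSF configuration; MB is assumed here.
      printf 'bsub -J %s -o %s -M %d "%s"\n' \
        "$jobname" "$outfile" "$((mem_gb * 1000))" "$cmd"
      ;;
    slurm)
      # sbatch normally expects a script file; --wrap wraps a single
      # command line in a generated script.
      printf 'sbatch -J %s -o %s --mem=%dG --wrap="%s"\n' \
        "$jobname" "$outfile" "$mem_gb" "$cmd"
      ;;
    *)
      echo "unknown scheduler: $scheduler (expected lsf or slurm)" >&2
      return 1
      ;;
  esac
}

# Example: print (do not run) the command for each scheduler.
build_submit_cmd lsf   rfs.1 rfs.1.out 8 "cmsearch CM db.fa"
build_submit_cmd slurm rfs.1 rfs.1.out 8 "cmsearch CM db.fa"
```

Printing the command rather than executing it keeps the sketch safe to run on a machine with neither scheduler installed.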

I've tested anecdotally and it works in my hands with scheduler defined as lsf (on LSF) or slurm (on SLURM), with the exception of MPI.

There may be some minor edits we still have to make for MPI compatibility; I'm currently getting help from systems on that. But MPI is not the default in any script, and is only used if -cmpi (rfsearch) or -mpi (rfmake) is enabled.

There are two places I'm still unsure about:

In Rfam/Scripts/view/rfam_family_view.pl, the help message includes:

This is a script to run the Rfam view process for a given family. It's intended
to be run by the job dequeuer, which polls the rfam_jobs.job_history table for
pending view process jobs, and runs this script on the farm via "bsub" or "sbatch".

I am unfamiliar with the job dequeuer so I'm not sure if anything
needs to be done to switch to slurm. Do either of you know more?

And in Rfam/Scripts/view/make_sunburst.pl the end of the help has an
example bsub command that explains how to process the families in
chunks. I'm not familiar with this either so didn't try to update it
to an sbatch example, even though it probably should be.
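
For what it's worth, if that bsub example uses an LSF job array to split the families into chunks (an assumption; I have not checked the actual help text), the Slurm analogue would be sbatch --array, with the task index arriving in SLURM_ARRAY_TASK_ID instead of LSB_JOBINDEX. A minimal sketch of the chunk arithmetic follows; the helper name chunk_range and the chunk size are hypothetical, and no make_sunburst.pl options are shown because I don't know them.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "first last" (1-based, inclusive) for one chunk.
# Arguments: array task index (1-based), chunk size, total number of families.
# Assumes the index is valid, i.e. (idx - 1) * size < total.
chunk_range() {
  local idx="$1" size="$2" total="$3"
  local first=$(( (idx - 1) * size + 1 ))
  local last=$(( idx * size ))
  # The final chunk may be shorter than the others.
  [ "$last" -gt "$total" ] && last="$total"
  echo "$first $last"
}

# LSF job arrays expose the index as LSB_JOBINDEX, Slurm arrays as
# SLURM_ARRAY_TASK_ID; fall back to 1 when run outside a scheduler.
task_index="${SLURM_ARRAY_TASK_ID:-${LSB_JOBINDEX:-1}}"

# Example: 250 families in chunks of 100.
chunk_range "$task_index" 100 250
```

With 250 families and a chunk size of 100, the job would be submitted with something like sbatch --array=1-3, and each task would call chunk_range with its own index to find its slice.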

blakesweeney commented 1 year ago

Thanks so much for this Eric!

I think I know what the job_dequeuer refers to but will have to check. I will try to get back to you this week about the pull request, and if there are any issues. Thanks!

blakesweeney commented 1 year ago

I think the job_dequeuer refers to this module: https://github.com/Rfam/rfam-family-pipeline/blob/master/Rfam/Lib/Bio/Rfam/View/Dequeuer.pm, which, if I understand correctly, is what runs the view process to update the website. It seems to create some LSF jobs, which could be replaced with SLURM commands. Could you look at modifying this? We started a release this week, so it will be a week or so until we can run the new slurm-based code. Thanks!

nawrockie commented 12 months ago

@blakesweeney : yes I'll look into the dequeuer and get back to you if I have questions.

nawrockie commented 12 months ago

@blakesweeney and @emmaco : I took a look at the dequeuer code and it was written by John Tate about 9 years ago. It uses an LSF Perl module that I'm not familiar with. I could try to morph it into working with slurm, but it occurred to me that it may make sense to check with Pfam to see if they have already adapted a similar dequeuer from LSF to slurm, assuming they are still using a Perl-based pipeline. Any ideas?

blakesweeney commented 12 months ago

I'll ask around and let you know.

nawrockie commented 11 months ago

@blakesweeney and @emmaco : MPI seems to be working now on slurm thanks to Ahmed Elmazaty's help. But for it to work with the pipeline, infernal will have to be re-configured with the --enable-mpi flag after loading an openmpi module. I'm not sure exactly how you build infernal, but the following commands will build MPI-ready executables when run from the top-level infernal src directory. (Also, you may want to upgrade to infernal 1.1.5.)

$ module load openmpi/4.1.4
$ which mpicc
/hps/software/spack/opt/spack/linux-rocky8-cascadelake/gcc-11.2.0/openmpi-4.1.4-jzdz5rzapd7n4z5z4rmdqm3dyyzxaglp/bin/mpicc
$ sh ./configure --enable-mpi; make

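
If this ends up in a build script, a small guard can catch the case where the openmpi module was not loaded before configuring. This is only a sketch: the function names have_mpicc and configure_infernal_mpi are made up for illustration, and it chains configure and make with && (rather than ;) so that make does not run after a failed configure.

```shell
#!/usr/bin/env bash
# Hypothetical guard: succeed only if an MPI compiler wrapper is on PATH.
have_mpicc() {
  command -v mpicc >/dev/null 2>&1
}

# Hypothetical wrapper around the build commands above; intended to be
# called from the top-level infernal source directory.
configure_infernal_mpi() {
  if ! have_mpicc; then
    echo "mpicc not found; load an openmpi module first" >&2
    return 1
  fi
  sh ./configure --enable-mpi && make
}
```

Calling configure_infernal_mpi without the module loaded returns a non-zero status and leaves the source tree untouched.
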
nawrockie commented 11 months ago

If you're wondering if/when MPI will be useful: my guess is it will help most for building and searching the biggest families (LSU, SSU) and also for generating alignments (rfmake.pl -a) for any big families with many thousands of hits in the FULL.

blakesweeney commented 11 months ago

Hi Eric, we just finished up the release last week so we can try this out. Do you think upgrading infernal is required, or can we test out SLURM changes without updating infernal for now?

nawrockie commented 11 months ago

Non-MPI slurm functionality can be tested with the existing infernal 1.1.4 as it is in /hps/software/users/agb/rfam/bin/. If infernal 1.1.4 is recompiled with --enable-mpi following the instructions above, you can test MPI functionality too. You could update to 1.1.5 at that point, but it's not necessary for testing.