maomao2023 opened this issue 1 year ago
If you want to run this on your cluster, you need to use Slurm.
If you want to parallelize it yourself, then I guess you can make one job for each chromosome with the --refSequence option -- you will have to specify a different jobstore and output file for each (see the sketch below).
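For example, a per-chromosome loop might look like the following minimal sketch; the chromosome names and file paths are placeholders, and on a real cluster each iteration would typically be wrapped in its own sbatch submission:

```
#!/usr/bin/env bash
# One cactus-hal2maf job per chromosome, each with its own jobstore
# and output file, as suggested above. Chromosome names are examples.
for CHR in chr1 chr2 chr3; do
    cactus-hal2maf ./jobStore_"${CHR}" total.hal total_"${CHR}".maf \
        --refGenome Danio_rerio \
        --refSequence "${CHR}" \
        --chunkSize 1000000 \
        --noAncestors &
done
wait  # wait for all per-chromosome jobs to finish
```

The per-chromosome MAFs can be combined afterwards, and because each job has its own jobstore, a failed chromosome can be rerun without disturbing the others.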
Hi Glenn,
Sorry to bother you. I have a Cactus alignment of more than 300 species (a HAL file of about 400 GB), and I would like to use cactus-hal2maf to convert the result to MAF format. I noticed that the program parallelizes by generating multiple hal2maf subtasks, and that the number of concurrently running subtasks can be adjusted with the --batchCount option.
However, each node on my cluster has only 64 CPUs, meaning a single node can run at most 64 subtasks simultaneously. I would therefore like to use the resources of multiple nodes to run more subtasks at once.
But when I tried MPI, the log showed "raise JobStoreExistsException(self.jobStoreDir)" and "FileExistsError: [Errno 17] File exists: jobStore", indicating that mpirun had started a separate copy of cactus-hal2maf on each node, all using the same jobstore directory.
So I would like to ask whether my script is incorrect. What should I do to use the resources of multiple nodes for parallel computation? My script is below.
```
ompi-mpirun cactus-hal2maf ./jobStore --workDir ./temp \
    --batchCores 1472 --batchCount 3 \
    --batchParallelHal2maf 1472 --batchParallelTaf 1472 \
    --refGenome Danio_rerio --chunkSize 1000000 --noAncestors \
    total.hal total.maf
```
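As the reply above suggests, the single-launch alternative is to let Toil (which cactus-hal2maf is built on) submit the subtasks through Slurm itself, rather than using mpirun, which starts an independent copy of the whole command on every node and hence the jobstore collision. A minimal sketch, assuming a Slurm cluster and a Cactus/Toil installation with Slurm support enabled; the parallelism values here are illustrative, not prescriptive:

```
# Launch once from a single (login/head) node; Toil dispatches the
# hal2maf/taf subtasks to worker nodes via Slurm, so only one copy
# of the command runs and only one jobstore is created.
# --workDir should point at storage visible to all nodes.
cactus-hal2maf ./jobStore \
    --workDir ./temp \
    --batchSystem slurm \
    --batchCount 3 \
    --batchParallelHal2maf 64 \
    --batchParallelTaf 64 \
    --refGenome Danio_rerio \
    --chunkSize 1000000 \
    --noAncestors \
    total.hal total.maf
```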