Sanat-Mishra opened this issue 7 months ago
If your cluster has more than one node, you can try using --batchCount to increase the number of jobs. If your nodes have multiple cores, you can also consider --batchCores to allow each job to use more than one thread.
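For illustration only, a hypothetical invocation combining both flags might look like the sketch below; the file names and the values 4 and 8 are placeholders, not recommendations from this thread:

# Hypothetical example: 4 parallel batch jobs, 8 threads each (adjust to your cluster).
cactus-hal2maf ./jobstore 241-mammalian-2020v2.hal out.maf.gz \
    --refGenome Homo_sapiens --batchSystem slurm \
    --batchCount 4 --batchCores 8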
Thanks! I'm also concerned that the code is taking too long because I have a huge hal file (~805 GB). Does cactus-hal2maf create copies of the hal file to distribute among worker nodes?
Yes, on Slurm, Cactus will create a local copy for each batch (job). So you typically want to use --batchCores to make sure that at most one job runs at once on a given node...
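A minimal sketch of that idea, assuming 64-core worker nodes (the core count is an assumption, not stated in this thread): setting --batchCores to a node's full core count means Slurm can only fit one batch per node, so at most one local copy of the ~805 GB HAL lands on each node.

# Assumes 64-core nodes; set --batchCores to your actual cores-per-node.
cactus-hal2maf ./jobstore 241-mammalian-2020v2.hal out.maf.gz \
    --refGenome Homo_sapiens --batchSystem slurm --batchCores 64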
Got it.
We're trying to pull out alignments corresponding to different transcript BED files and then concatenate the blocks in the MAF file for downstream analysis. Do you have any suggestions on how we can use --maxrefgaps, or about concatenation in general?
This probably doesn't answer your question, but the --bedRanges option here will produce a MAF of your regions concatenated together. If you want the actual MAF blocks of consecutive regions merged together, it's probably simplest to merge them in the BED first.
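One common way to merge adjacent or overlapping BED intervals before passing them to --bedRanges is bedtools; it isn't mentioned in this thread, so treat this as a sketch (the -d distance of 50 bp and the file names are arbitrary examples):

# Sort the BED, then merge intervals within 50 bp of each other.
sort -k1,1 -k2,2n ENST00000293981.10.bed > regions.sorted.bed
bedtools merge -d 50 -i regions.sorted.bed > regions.merged.bed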
In terms of reference gaps, you control that with --maximumGapLength, but I'm not sure how high you can practically scale it (I think the default is around 50).
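If it does scale, a hypothetical run combining a pre-merged BED with a larger gap length could look like this; the value 200 and the file names are assumptions for illustration only:

# Hypothetical: merged BED regions plus a raised reference-gap limit.
cactus-hal2maf ./jobstore 241-mammalian-2020v2.hal out.maf.gz \
    --refGenome Homo_sapiens --bedRanges regions.merged.bed \
    --maximumGapLength 200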
Hi, here is my command for running cactus-hal2maf:
cactus-hal2maf ./jobstore \
    /ocean/projects/bio200049p/smishra1/Files/241-mammalian-2020v2.hal \
    /ocean/projects/bio200049p/smishra1/Tools_Installed/cactus-bin-v2.8.1/ENST00000293981.10.maf.gz \
    --refGenome Homo_sapiens \
    --bedRanges /ocean/projects/bio200049p/zjiang2/Files/cactus_test/ENST00000293981.10.bed \
    --noAncestors --chunkSize 10000 \
    --workDir /ocean/projects/bio200049p/smishra1/Cactus/ \
    --batchCores 1 --batchSystem slurm --maxMemory 240G
This has been running for more than an hour. Is this expected, or can I speed it up somehow?
Thanks!