ComparativeGenomicsToolkit / cactus

Official home of genome aligner based upon notion of Cactus graphs

Introduce seqfile technology to cactus_consolidated #1440

Closed glennhickey closed 4 months ago

glennhickey commented 4 months ago

Rationale is the same as #1438: avoid command-line size blowup in the presence of large star trees.
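As a rough illustration of that rationale, the sketch below compares the bytes needed to pass every name/path pair as a command-line argument with the bytes needed to pass a single seqfile path. The genome count, the `sample_*` names, and the temporary file are hypothetical, and the actual option by which cactus_consolidated receives the file is not shown here.

```shell
#!/bin/bash
set -eu

# Hypothetical star tree size; not taken from any real run.
num_genomes=10000
seqfile=$(mktemp)
inline_bytes=0

# Write one "name<TAB>path" row per genome, and tally how many bytes
# the same pairs would occupy if passed as separate arguments
# (each argument carries one terminating NUL byte).
for j in $(seq "${num_genomes}"); do
    name="sample_${j}"
    path="sample_${j}.fa"
    printf '%s\t%s\n' "${name}" "${path}"
    inline_bytes=$(( inline_bytes + ${#name} + ${#path} + 2 ))
done > "${seqfile}"

# With a seqfile, only the file path itself goes on the command line.
seqfile_arg_bytes=$(( ${#seqfile} + 1 ))

echo "inline arguments: ${inline_bytes} bytes"
echo "seqfile argument: ${seqfile_arg_bytes} bytes"
echo "system ARG_MAX:   $(getconf ARG_MAX 2>/dev/null || echo unknown)"
```

The inline size grows linearly with the number of leaves, while the seqfile argument stays constant, which is the property the change relies on.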

glennhickey commented 4 months ago

For the record, stress testing was done with:

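#!/bin/bash

# Number of FASTA lines kept per species, and number of copies of each species.
NUM_LINES=1000
NUM_COPIES=2500

# Start from an empty seqfile.
rm -f "input_${NUM_LINES}_${NUM_COPIES}.txt"

for i in Chimp Orang Gorilla Human
do
    # Download the test sequence and keep only its first NUM_LINES lines.
    wget -q https://raw.githubusercontent.com/ComparativeGenomicsToolkit/cactusTestData/master/evolver/primates/loci1/sim${i}.chr6 -O "sim${i}.chr6"
    head -n "${NUM_LINES}" "sim${i}.chr6" > "sim${i}_${NUM_LINES}.fa"

    # Add NUM_COPIES seqfile rows (name<TAB>path), all pointing at the same truncated FASTA.
    for j in $(seq "${NUM_COPIES}")
    do
        printf "sim${i}_${j}\tsim${i}_${NUM_LINES}.fa\n" >> "input_${NUM_LINES}_${NUM_COPIES}.txt"
    done
done

# Print (but do not run) the cactus-pangenome command for this input.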
printf "\nTOIL_SLURM_ARGS=\"--partition=medium --time=500\" cactus-pangenome ./js input_${NUM_LINES}_${NUM_COPIES}.txt --reference simChimp_${NUM_COPIES} --outName test_${NUM_LINES}_${NUM_COPIES} --outDir test_${NUM_LINES}_${NUM_COPIES} --gbz --haplo --batchSystem slurm --mgCores 2 --consCores 64 --mapCores 1 --indexCores 64 --logFile test_${NUM_LINES}_${NUM_COPIES}.log --batchLogsDir batch-logs_${NUM_LINES}_${NUM_COPIES}.log --doubleMem true\n\n"
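Before launching a run of this size, it may be worth confirming that the generated input file has the expected shape. The helper below is a minimal sketch, not part of cactus: `check_seqfile` is a hypothetical name, and it assumes the two-column, tab-separated layout that the script above writes.

```shell
#!/bin/bash

# Hypothetical helper (not part of cactus): checks that a seqfile has
# exactly two tab-separated fields per line, no repeated names, and the
# expected number of rows. Returns non-zero if any check fails.
check_seqfile() {
    local seqfile=$1 expected=$2
    awk -F'\t' -v expected="${expected}" '
        NF != 2    { printf "line %d: expected 2 tab-separated fields\n", NR; bad = 1 }
        seen[$1]++ { printf "line %d: duplicate name %s\n", NR, $1; bad = 1 }
        END {
            if (NR != expected) {
                printf "expected %d lines, found %d\n", expected, NR
                bad = 1
            }
            exit (bad ? 1 : 0)
        }
    ' "${seqfile}"
}
```

For the stress test above, with four species, it could be called as `check_seqfile "input_${NUM_LINES}_${NUM_COPIES}.txt" $(( 4 * NUM_COPIES ))`.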