ComparativeGenomicsToolkit / cactus

Official home of genome aligner based upon notion of Cactus graphs

Using lastz instead of GPU at alignment stage #678

Open coxtonyj opened 2 years ago

coxtonyj commented 2 years ago

Hi

I have Cactus 2.0.5 and I am using a set of 6 primate genomes to test the speedup from SegAlign on a GPU-enabled cluster node. I installed Cactus from the binary tarball and then did a separate manual install of SegAlign as described in the docs; I'm of course running with --gpu.

What I see is that during the first stage (the repeat-masking step) the GPU was being used (I used the nvidia-smi tool to verify that my job was running on the GPU). However, Cactus is now running a bunch of lastz jobs and appears to be behaving just as it does in non-GPU mode: the GPU seems to be idle, and there seem to be just as many jobs as in non-GPU mode (the docs say I should expect far fewer).
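
(For reference, a minimal way to watch GPU utilization while the workflow runs; the 5-second polling interval is an arbitrary choice and nothing here is Cactus-specific:)

# poll GPU utilization and memory every 5 seconds while cactus is running
nvidia-smi -l 5 --query-gpu=utilization.gpu,memory.used --format=csv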

Do you have any initial thoughts as to what might be causing this behaviour?

Many thanks in advance

glennhickey commented 2 years ago

I think this is a duplicate of #607. The fix to the --gpu option is still tied up in a development branch but will hopefully be in the next release for real. In the meantime, see the work-around here
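
(If the link above is unavailable: as applied later in this thread, the work-around amounts to flipping the gpuLastz attribute in the progressive config. A minimal sketch, assuming the --configFile option of cactus and the binary-tarball layout used further down; gpu_config.xml is just a placeholder name:)

# copy the shipped config and turn on GPU lastz in the copy
cp src/cactus/cactus_progressive_config.xml gpu_config.xml
sed -i -e 's/gpuLastz="false"/gpuLastz="true"/g' gpu_config.xml
# point cactus at the edited copy
cactus ./js examples/evolverMammals.txt evolver.hal --configFile gpu_config.xml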

eskutkaan commented 2 years ago

> I think this is a duplicate of #607. The fix to the --gpu option is still tied up in a development branch but will hopefully be in the next release for real. In the meantime, see the work-around here

I was experiencing the same problem as the author of this issue. I went ahead and applied the solution mentioned in #607, but nothing has changed: I am still getting many lastz processes and 0% utilization on the GPUs.

coxtonyj commented 2 years ago

Thank you both for taking the time to respond. My experience is different from eskutkaan's: Cactus seems not to find the run_segalign command (having previously run the SegAlign repeat masker). The relevant bit of the log is attached (let me know if I should post more): log_excerpt.txt

glennhickey commented 2 years ago

@coxtonyj For some reason run_segalign and segalign_repeat_masker are in different places, so you need to add both to your PATH. run_segalign is normally in PATH.
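
(A sketch of what adding both to PATH looks like in practice; the search roots and the two directories below are placeholders, so substitute wherever the SegAlign wrappers actually ended up on your system:)

# check which of the two wrappers is already visible (prints nothing for a missing one)
which run_segalign segalign_repeat_masker
# locate the missing wrapper; the search roots here are placeholders
find ~ /opt -name run_segalign 2>/dev/null
# then put both directories on PATH (placeholder locations)
export PATH=/path/to/segalign/bin:/path/to/segalign/scripts:$PATH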

@eskutkaan Could you try applying the change directly to src/cactus/cactus_progressive_config.xml (then rerunning pip install -U)? This is how it's done in the GPU docker image and that version works for sure.
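
(A quick sanity check before and after the reinstall; the source path and the pip command are the ones from this thread, while the last line assumes the XML is installed as package data next to the cactus package, which is how the source tree lays it out:)

# confirm the edit took effect in the source tree
grep gpuLastz src/cactus/cactus_progressive_config.xml
# reinstall so the edited config is what actually gets used
python3 -m pip install -U .
# assumption: the config ships as package data, so the installed copy sits next to the package
grep gpuLastz "$(python3 -c 'import cactus, os; print(os.path.dirname(cactus.__file__))')/cactus_progressive_config.xml"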

eskutkaan commented 2 years ago

> @coxtonyj For some reason run_segalign and segalign_repeat_masker are in different places, so you need to add both to your PATH. run_segalign is normally in PATH.
>
> @eskutkaan Could you try applying the change directly to src/cactus/cactus_progressive_config.xml (then rerunning pip install -U)? This is how it's done in the GPU docker image and that version works for sure.

I tried; it did not use the GPUs on the node.

glennhickey commented 2 years ago

Yes, it does:

$ wget https://github.com/ComparativeGenomicsToolkit/cactus/releases/download/v2.0.5/cactus-bin-v2.0.5.tar.gz
$ tar zxf cactus-bin-v2.0.5.tar.gz
$ cd cactus-bin-v2.0.5/
$ sed -i src/cactus/cactus_progressive_config.xml -e 's/gpuLastz="false"/gpuLastz="true"/g'
$ virtualenv -p python3.8 cactus_env
$ echo 'export PATH=$(pwd)/bin:$PATH' >> cactus_env/bin/activate
$ echo 'export PYTHONPATH=$(pwd)/lib:$PYTHONPATH' >> cactus_env/bin/activate
$ source cactus_env/bin/activate
$ python3 -m pip install -U setuptools pip==21.3.1
$ python3 -m pip install -U -r ./toil-requirement.txt
$ python3 -m pip install -U .
$ cactus ./js examples/evolverMammals.txt ./evolver-gpu.hal --realTimeLogging 2> cactus.log
$ grep segalign_repeat_masker cactus.log | wc -l
10
$ grep run_segalign cactus.log | wc -l
18
$ cactus-prepare examples/evolverMammals.txt --outDir pp
$ cactus-preprocess ./jobstore/0 examples/evolverMammals.txt pp/evolverMammals.txt --inputNames simHuman_chr6 simCow_chr6 simDog_chr6 --realTimeLogging --logInfo --retryCount 0 2> pp.log
$ grep segalign_repeat pp.log | wc -l
6
$ cactus-blast ./js examples/evolverMammals.txt Anc2.cigar --root Anc2 --realTimeLogging --logInfo --retryCount 0 2> blast.log
$ grep run_segalign blast.log | wc -l
8
eskutkaan commented 2 years ago

When I follow this sequence of steps and run cactus, nvidia-smi -l shows all the GPUs at 0% utilization. I have been waiting for the job to spawn SegAlign, but nothing changes: a bunch of lastz jobs runs on the CPU and the GPUs show no utilization.

glennhickey commented 2 years ago

If you see segalign in the logs (as I showed), then it is using SegAlign. SegAlign itself calls lastz, so it's normal to see lastz processes running too.
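
(One way to tell the two situations apart, using standard Linux tools rather than anything Cactus-specific: when SegAlign is driving the alignment, the lastz processes should appear under a segalign/run_segalign parent in the process tree, and the GPU should report non-zero utilization while they run. A sketch:)

# list the alignment processes with their parent PIDs
ps -eo pid,ppid,comm,args | grep -E 'segalign|lastz' | grep -v grep
# or look at the process tree directly
pstree -ap | grep -B1 -A2 segalign
# and confirm the card is busy while those jobs are active
nvidia-smi --query-gpu=utilization.gpu --format=csv -l 10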