ComparativeGenomicsToolkit / cactus

Official home of the genome aligner based upon the notion of Cactus graphs

When I run cactus-pangenome command, there is an error related to “pangenome_end_to_end_workflow” #1198

Open tanger-code opened 10 months ago

tanger-code commented 10 months ago

Hi! I want to reconstruct an HPRC graph with HG002, using the command:

    cactus-pangenome ./js-grch38 ./hprc-v1.1-mc.seqfile --outName hprc-v1.1-mc-grch38 --outDir hprc-v1.1-mc-grch38 --reference GRCh38 CHM13 --filter 9 --giraffe clip filter --vcf --viz --odgi --chrom-vg clip filter --chrom-og --gbz clip filter full --gfa clip full --vcf --logFile hprc-v1.1-mc-grch38.log --batchSystem torque --mgCores 64 --mapCores 16 --consCores 64 --indexCores 64

But there seems to be an error related to "pangenome_end_to_end_workflow". The .stderr and .log files are here: hprc-v1.1-mc-grch38.stderr.txt, hprc-v1.1-mc-grch38.log

My cluster system is torque. Why does this happen?

tanger-code commented 10 months ago

And when I run it with PBS, there is another error:

    /var/spool/torque/mom_priv/jobs/534733.manager.SC: /data/home/tangen/workware/cactus-bin-v2.6.4/cactus_env/bin/cactus-pangenome: /data/home/tangen/workware/cactus-bin-v2.6.4/cactus_env/bin/python3: bad interpreter: No such file or directory

but python3 does exist in that directory (see attached screenshots).
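
For reference, a quick way to dig into a "bad interpreter" failure is to compare the script's shebang line with what actually exists at that path on the node that ran the job; a minimal sketch using the paths from the error above (run it on a compute node, not just the head node):

    # Show which interpreter the wrapper script expects (its shebang line)
    head -1 /data/home/tangen/workware/cactus-bin-v2.6.4/cactus_env/bin/cactus-pangenome
    # Check that this interpreter really exists and is executable on this node
    ls -l /data/home/tangen/workware/cactus-bin-v2.6.4/cactus_env/bin/python3
    # If it is a symlink, make sure its target also resolves on this node
    readlink -f /data/home/tangen/workware/cactus-bin-v2.6.4/cactus_env/bin/python3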

glennhickey commented 10 months ago

This seems to be the important part of the error:

    /var/spool/torque/mom_priv/jobs/534728.manager.SC: line 6: _toil_worker: command not found

The only cluster environment we "officially" support for Cactus now is SLURM. Toil has options for a few others, including Torque, but they are not as well tested for Cactus. You can try raising an issue here: https://github.com/DataBiosphere/toil/issues
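
As a side note, "_toil_worker: command not found" usually means the virtualenv's bin directory is not on the PATH of the compute node that ran the job; a minimal check, assuming the cactus_env path from earlier in this thread and a filesystem shared between head and compute nodes:

    # On a compute node: activate the same virtualenv used on the head node
    source /data/home/tangen/workware/cactus-bin-v2.6.4/cactus_env/bin/activate
    # Both of these should now resolve to scripts inside the virtualenv
    command -v _toil_worker
    command -v cactus-pangenome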

tanger-code commented 10 months ago

I reinstalled the build for older CPU architectures and ran it with PBS. There is another error (see attached screenshot).

Then I set PYTHONHOME=/data/home/tangen/workware/cactus-bin-v2.6.9/venv-cactus-v2.6.9/bin/python3:$PYTHONHOME, but a similar error occurred (see attached screenshot).

Is this because of system incompatibility or some other problem?

glennhickey commented 10 months ago

You might need to run something like

    pip install -U toil[all]==5.12.0

but again, Cactus is only tested with SLURM.
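
A sketch of how that could look with the virtualenv path mentioned earlier in this thread (the venv location is taken from the PYTHONHOME setting above and is otherwise an assumption):

    # Activate the Cactus virtualenv first so the pin lands in the right environment
    source /data/home/tangen/workware/cactus-bin-v2.6.9/venv-cactus-v2.6.9/bin/activate
    # Quoting the requirement avoids shell globbing of the [all] extra
    pip install -U 'toil[all]==5.12.0'
    # The Toil worker entry point should now be on PATH
    command -v _toil_worker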

tanger-code commented 10 months ago

Okay, I got it. Thank you. I will try Docker later.

glennhickey commented 10 months ago

Normally, you cannot use your cluster from inside the Cactus docker image. You must install Cactus in a virtualenv on your head node and work from there.
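
For reference, a rough sketch of what that head-node setup can look like with the binary-release layout seen in this thread; the exact steps and paths here are assumptions, so the install notes shipped with the release take precedence:

    # On the head node (not inside Docker), in the unpacked binary release
    cd /data/home/tangen/workware/cactus-bin-v2.6.9
    # Create and activate a virtualenv (named as in this thread)
    python3 -m venv venv-cactus-v2.6.9
    source venv-cactus-v2.6.9/bin/activate
    python3 -m pip install -U pip setuptools wheel
    # Install Cactus and its pinned dependencies from the release directory
    # (assumption: the release root is pip-installable)
    python3 -m pip install -U .
    # Make the bundled binaries visible to Toil jobs
    export PATH=$(pwd)/bin:$PATH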

tanger-code commented 10 months ago

I ran a small test in Windows Docker Desktop to construct a small graph. The command I used is:

    docker run -v e/HPRC_HG002:/home/tang/HPRC_HG002 quay.io/comparative-genomics-toolkit/cactus:v2.6.9 /bin/bash -c "cactus-pangenome /home/tang/HPRC_HG002/js-grch38 /home/tang/HPRC_HG002/hprc-v1.1-mc.seqfile --outName hprc-v1.1-mc-grch38 --outDir /home/tang/HPRC_HG002/hprc-v1.1-mc-grch38 --reference GRCh38 CHM13 --filter 9 --giraffe clip filter --vcf --viz --odgi --chrom-vg clip filter --chrom-og --gbz clip filter full --gfa clip full --vcf --logFile /home/tang/HPRC_HG002/hprc-v1.1-mc-grch38.log --batchSystem slurm --mgCores 64 --mapCores 16 --consCores 64 --indexCores 64 2> /home/tang/HPRC_HG002/hprc-v1.1-mc-grch38.stderr"

But there is still an error here: hprc-v1.1-mc-grch38.stderr.txt

This is the directory on Windows (see attached screenshot). This is the seqfile, which only contains CHM13, GRCh38 and HG002: hprc-v1.1-mc.seqfile.txt

tanger-code commented 10 months ago

When I run without the --batchSystem parameter, it seems to run correctly (at least it hasn't stopped). Are there any bad effects of not using this parameter?


And then it stopped, with this error:

    toil.batchSystems.abstractBatchSystem.InsufficientSystemResources: The job 'minigraph_construct' kind-minigraph_construct/instance-dueekojc v1 is requesting 405415491274 bytes of memory, more than the maximum of 403933802496 bytes of memory that SingleMachineBatchSystem was configured with, or enforced by --maxMemory. Scale is set to 1.0.

I only used 4 haplotypes. Why does it request so much memory? If I use all 90 haplotypes, will the memory used be much higher?
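
(For scale: 405415491274 bytes is roughly 377.6 GiB requested, versus the configured single-machine maximum of 403933802496 bytes, roughly 376.2 GiB, so the request exceeds the cap by only about 1.4 GiB.)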

glennhickey commented 10 months ago

There are three relevant --batchSystem options for Cactus

1) --batchSystem single_machine (this is the default if not specified): Run on one computer
2) --batchSystem slurm : Run distributed on slurm cluster
3) --batchSystem mesos --provisioner aws : Run distributed on AWS cluster

While others, including torque, are listed in the help

  --batchSystem {aws_batch,parasol,single_machine,grid_engine,lsf,mesos,slurm,tes,torque,htcondor,kubernetes}
                        The type of batch system to run the job(s) with,
                        currently can be one of aws_batch, parasol,
                        single_machine, grid_engine, lsf, mesos, slurm, tes,
                        torque, htcondor, kubernetes. default=single_machine

and the Toil Documentation, they are much less well-tested and I don't recommend using them.
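
Putting that together, a sketch of the original command in the SLURM form recommended above (option 2); apart from --batchSystem slurm, and listing the duplicated --vcf once, everything is copied from the first post:

    cactus-pangenome ./js-grch38 ./hprc-v1.1-mc.seqfile \
        --outName hprc-v1.1-mc-grch38 --outDir hprc-v1.1-mc-grch38 \
        --reference GRCh38 CHM13 --filter 9 \
        --giraffe clip filter --vcf --viz --odgi \
        --chrom-vg clip filter --chrom-og \
        --gbz clip filter full --gfa clip full \
        --logFile hprc-v1.1-mc-grch38.log \
        --batchSystem slurm \
        --mgCores 64 --mapCores 16 --consCores 64 --indexCores 64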