ComparativeGenomicsToolkit / cactus

Official home of genome aligner based upon notion of Cactus graphs

cactus-graphmap-join error in vg validate step #971

Closed ETC100 closed 1 year ago

ETC100 commented 1 year ago

Hello, I tried to construct a graph pangenome with Cactus version 2.4.3 and completed the cactus-minigraph, cactus-graphmap, cactus-graphmap-split, and cactus-align-batch steps. However, when I ran the cactus-graphmap-join step with the command "cactus-graphmap-join ./jobstore --vg graph_chroms1/alignments/.vg --hal graph_chroms1/alignments/.hal --outDir ./Result --outName Last_graph --reference REF --vcf --giraffe clip", it failed with the error below. "NC_006096.5" is the "Chr9" chromosome, so the error seems to occur only on Chr9. The related file you might need: https://drive.google.com/file/d/1-90qWGcrvPyvx4h-8rlpllFk7PwQLORy/view?usp=share_link

Error

    [2023-03-22T00:17:00+0800] [MainThread] [I] [toil-rt] 2023-03-22 00:17:00.664846: Running the command: "vg validate /tmp/7354cdf93ebd53ce9cae66b053fc47da/3538/ec6f/tmppo15uaev/NC_006096.5.vg.clip"
    [2023-03-22T00:17:15+0800] [MainThread] [W] [toil.fileStores.abstractFileStore] Failed job accessed files:
    [2023-03-22T00:17:15+0800] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/no-job/file-8e4f1e0faa464ff7910cfa98df7184c1/NC_006096.5.vg' to path '/tmp/7354cdf93ebd53ce9cae66b053fc47da/3538/ec6f/tmppo15uaev/NC_006096.5.vg'
    Traceback (most recent call last):
      File "/home/user/anaconda3/envs/graph_genome/lib/python3.8/site-packages/toil/worker.py", line 403, in workerScript
        job._runner(jobGraph=None, jobStore=jobStore, fileStore=fileStore, defer=defer)
      File "/home/user/anaconda3/envs/graph_genome/lib/python3.8/site-packages/toil/job.py", line 2743, in _runner
        returnValues = self._run(jobGraph=None, fileStore=fileStore)
      File "/home/user/anaconda3/envs/graph_genome/lib/python3.8/site-packages/toil/job.py", line 2660, in _run
        return self.run(fileStore)
      File "/home/user/anaconda3/envs/graph_genome/lib/python3.8/site-packages/toil/job.py", line 2888, in run
        rValue = userFunction(*((self,) + tuple(self._args)), **self._kwargs)
      File "/home/user/anaconda3/envs/graph_genome/lib/python3.8/site-packages/cactus/refmap/cactus_graphmap_join.py", line 421, in clip_vg
        cactus_call(parameters=['vg', 'validate', clipped_path])
      File "/home/user/anaconda3/envs/graph_genome/lib/python3.8/site-packages/cactus/shared/common.py", line 839, in cactus_call
        raise RuntimeError("{}Command {} exited {}: {}".format(sigill_msg, call, process.returncode, out))
    RuntimeError: Command ['vg', 'validate', '/tmp/7354cdf93ebd53ce9cae66b053fc47da/3538/ec6f/tmppo15uaev/NC_006096.5.vg.clip'] exited 1: stdout=None, stderr=graph invalid: missing edge between 434922th step (320323:0) and 434923th step (877048:0) of path BLH#0#Chr9#0
    graph invalid: missing edge between 434923th step (877048:1) and 434922th step (320323:0) of path BLH#0#Chr9#0
    graph invalid: missing edge between 434555th step (320323:0) and 434556th step (877048:0) of path Thailand#0#NC_006096.5_RagTag#0
    graph invalid: missing edge between 434556th step (877048:1) and 434555th step (320323:0) of path Thailand#0#NC_006096.5_RagTag#0
    graph invalid: missing edge between 436330th step (320323:0) and 436331th step (877048:0) of path Cornish#0#Chr9#0
    graph invalid: missing edge between 436331th step (877048:1) and 436330th step (320323:0) of path Cornish#0#Chr9#0
    graph invalid: missing edge between 926th step (320323:0) and 927th step (877048:0) of path _MINIGRAPH_#s57999
    graph invalid: missing edge between 927th step (877048:1) and 926th step (320323:0) of path _MINIGRAPH_#s57999
    graph invalid: missing edge between 433238th step (320323:0) and 433239th step (877048:0) of path Houdan#0#Chr9#0
    graph invalid: missing edge between 433239th step (877048:1) and 433238th step (320323:0) of path Houdan#0#Chr9#0
    graph invalid: missing edge between 442338th step (320323:0) and 442339th step (877048:0) of path Naked_neck#0#NC_006096.5_RagTag#0
    graph invalid: missing edge between 442339th step (877048:1) and 442338th step (320323:0) of path Naked_neck#0#NC_006096.5_RagTag#0
    graph invalid: missing edge between 434937th step (320323:0) and 434938th step (877048:0) of path Silkies#0#Chr9#0
    graph invalid: missing edge between 434938th step (877048:1) and 434937th step (320323:0) of path Silkies#0#Chr9#0
    graph invalid: missing edge between 441017th step (320323:0) and 441018th step (877048:0) of path REF#NC_006096.5
    graph invalid: missing edge between 441018th step (877048:1) and 441017th step (320323:0) of path REF#NC_006096.5
    graph invalid: missing edge between 435304th step (877048:1) and 435303th step (320323:0) of path Asil#0#NC_006096.5_RagTag#0
    graph invalid: missing edge between 473941th step (320323:0) and 473942th step (877048:0) of path Fayoumi#0#NC_006096.5_RagTag#0
    graph invalid: missing edge between 473942th step (877048:1) and 473941th step (320323:0) of path Fayoumi#0#NC_006096.5_RagTag#0
    graph invalid: missing edge between 433174th step (320323:0) and 433175th step (877048:0) of path Tibetan#0#Chr9#0
    graph invalid: missing edge between 433175th step (877048:1) and 433174th step (320323:0) of path Tibetan#0#Chr9#0
    graph: invalid

    [2023-03-22T00:17:15+0800] [MainThread] [E] [toil.worker] Exiting the worker because of a failed job on host CAU
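
Every "missing edge" line in the log above names the same two nodes, 320323 and 877048, which suggests (but does not prove) that a single edge is missing and is reported once per path that traverses it. A minimal Python sketch for checking this on a saved copy of the stderr text follows. The function name `summarize_missing_edges` is hypothetical, and the regular expression assumes the message format shown in this log, which may differ in other vg versions.

```python
import re
from collections import defaultdict

# Assumed message format, copied from the log in this thread:
#   graph invalid: missing edge between 926th step (320323:0)
#   and 927th step (877048:0) of path _MINIGRAPH_#s57999
MISSING_EDGE = re.compile(
    r"missing edge between \d+th step \((\d+):\d+\) "
    r"and \d+th step \((\d+):\d+\) of path (\S+)"
)


def summarize_missing_edges(stderr_text):
    """Map each unordered node pair to the set of paths that report it."""
    pairs = defaultdict(set)
    for line in stderr_text.splitlines():
        match = MISSING_EDGE.search(line)
        if match is None:
            continue  # skip lines such as "graph: invalid"
        node_a, node_b, path = match.groups()
        # Sort so that (a, b) and (b, a) are counted as the same pair.
        key = tuple(sorted((int(node_a), int(node_b))))
        pairs[key].add(path)
    return dict(pairs)


if __name__ == "__main__":
    # Three lines taken from the log above, plus the final status line.
    sample = "\n".join([
        "graph invalid: missing edge between 926th step (320323:0) "
        "and 927th step (877048:0) of path _MINIGRAPH_#s57999",
        "graph invalid: missing edge between 927th step (877048:1) "
        "and 926th step (320323:0) of path _MINIGRAPH_#s57999",
        "graph invalid: missing edge between 441017th step (320323:0) "
        "and 441018th step (877048:0) of path REF#NC_006096.5",
        "graph: invalid",
    ])
    for pair, paths in summarize_missing_edges(sample).items():
        print(pair, "reported by", len(paths), "path(s)")
```

If the summary contains only one node pair, the failure is localized to one spot in the graph, which is useful detail to include in a bug report.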


glennhickey commented 1 year ago

Thanks for sharing the data to reproduce. This seems to be happening in GFAffix, which cactus-graphmap-join uses to normalize the graph. I will patch Cactus via a new GFAffix version once that issue is fixed.

You can turn off gfaffix normalization with

    sed src/cactus/cactus_progressive_config.xml -e "s/gfaffix=\"1\"/gfaffix=\"0\"/g" > config.xml

and then running cactus-graphmap-join with the --configFile config.xml option.

This should fix your crash right now, but the gfaffix normalization is usually pretty important because it removes lots of duplicated sequence from the graph...
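
The sed substitution above can also be done with an XML parser, which avoids depending on the exact spacing and quoting of the attribute. The following is only a sketch under stated assumptions: the function name `disable_gfaffix` is hypothetical, it rewrites every element carrying `gfaffix="1"` regardless of element name, and the sample XML in the demo is made up for illustration and is not the real Cactus config layout. Unlike sed, `xml.etree.ElementTree` drops XML comments when it rewrites the file.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def disable_gfaffix(config_in, config_out):
    """Write a copy of config_in with every gfaffix="1" set to "0".

    Returns the number of attributes that were changed.
    """
    tree = ET.parse(config_in)
    changed = 0
    for element in tree.iter():
        # Only touch elements that carry the attribute with value "1",
        # mirroring the sed pattern gfaffix="1" -> gfaffix="0".
        if element.get("gfaffix") == "1":
            element.set("gfaffix", "0")
            changed += 1
    tree.write(config_out)
    return changed


if __name__ == "__main__":
    # Made-up stand-in for a config file; element names are hypothetical.
    sample = '<config><join gfaffix="1" other="1"/></config>'
    workdir = tempfile.mkdtemp()
    src = os.path.join(workdir, "input.xml")
    dst = os.path.join(workdir, "config.xml")
    with open(src, "w") as handle:
        handle.write(sample)
    print("attributes changed:", disable_gfaffix(src, dst))
```

The resulting config.xml would then be passed with --configFile exactly as described above. A return value of 0 means no `gfaffix="1"` attribute was found, in which case the input file is probably not the one you intended to edit.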

ETC100 commented 1 year ago

Thanks for your help. I solved the problem by following your suggestion. The graph genome constructed without normalization is only slightly bigger than the normalized one.

danydoerr commented 1 year ago

Fixed the bug in GFAffix--let me know in case any further issues pop up.