ComparativeGenomicsToolkit / cactus

Official home of genome aligner based upon notion of Cactus graphs

error in cactus-align-batch step #695

Open zhanghaoipp opened 2 years ago

zhanghaoipp commented 2 years ago

I have tested a small set of 10 fungal genomes (around 40M) following the pangenome pipeline (https://github.com/glennhickey/pg-stuff/blob/main/cactus-pangenome.sh). The steps were as follows. I used the Docker version and first created the container:
$ docker run -v $(pwd):/data --rm -it quay.io/comparative-genomics-toolkit/cactus:v2.0.5 bash
Then:

Step 1. Make the minigraph with 180197 as the reference
$ minigraph -xggs 180197_pilon4_nomt.fas 140001.fas 140002.fas 140003.fas 140004.fas 140005.fas 140006.fas 140007.fas 140008.fas 140009.fas > fa10.gfa

Step 2. Map contigs to the minigraph
$ cactus-graphmap ./jobstore fa10.txt fa10.gfa fa10-orig.paf --outputFasta fa10.gfa.fa --maskFilter 100000 --reference 180197 --delFilter 1000000

Step 3. Mask coverage gaps
$ cactus-preprocess ./jobstore fa10.txt fa10_masked.txt --maskFile fa10-orig.paf --minLength 100000 --ignore 180197 --maskAction softmask

Step 4. Divide FASTA and PAF into chromosomes
$ cactus-graphmap-split ./jobstore fa10_masked.txt fa10.gfa fa10-orig.paf --refContigsFile ref_ctg --reference 180197 --outDir chrmos --maskFilter 100000

Step 5. Align each chromosome with Cactus, producing output in both HAL and vg
$ cp chrmos/chromfile.txt .
$ cactus-align-batch ./jobstore chromfile.txt align-batch --alignCores 10 --alignOptions "--pafInput --pangenome --outVG --barMaskFilter 100000 --realTimeLogging --reference 180197 --retryCount 0"

In step 5, it showed the error below. I have checked the input files and they seem fine. Could you help me with it? Thank you!

root@02011d070b83:/data# cactus-align-batch ./jobstore chromfile.txt align-batch --alignCores 10 --alignOptions "--pafInput --pangenome --outVG --barMaskFilter 100000 --realTimeLogging --reference 180197 --retryCount 0"
[2022-03-24T07:03:17+0000] [MainThread] [I] [toil.job] Saving graph of 1 jobs, 1 new
[2022-03-24T07:03:17+0000] [MainThread] [I] [toil.job] Processing job 'align_toil_batch' kind-align_toil_batch/instance-kwjksr_m v0
[2022-03-24T07:03:17+0000] [MainThread] [I] [toil] Running Toil version 5.6.0-c34146a6437e4407a61e946e968bcce67a0ebbca on host 02011d070b83.
[2022-03-24T07:03:17+0000] [MainThread] [I] [toil.leader] Issued job 'align_toil_batch' kind-align_toil_batch/instance-kwjksr_m v1 with job batch system ID: 0 and cores: 1, disk: 2.0 Gi, and memory: 2.0 Gi
[2022-03-24T07:03:18+0000] [MainThread] [I] [toil.worker] Redirecting logging to /tmp/6b1ee798b05f5fc8adb3b33e42f196c6/bd2f/worker_log.txt
[2022-03-24T07:03:19+0000] [MainThread] [I] [toil.leader] 0 jobs are running, 0 jobs are issued and waiting to run
[2022-03-24T07:03:19+0000] [MainThread] [I] [toil.leader] Issued job 'align_toil' kind-align_toil/instance-a10sci1i v1 with job batch system ID: 1 and cores: 10, disk: 2.0 Gi, and memory: 2.0 Gi
[2022-03-24T07:03:19+0000] [MainThread] [I] [toil.leader] Issued job 'align_toil' kind-align_toil/instance-szc407xh v1 with job batch system ID: 2 and cores: 10, disk: 2.0 Gi, and memory: 2.0 Gi
[2022-03-24T07:03:19+0000] [MainThread] [I] [toil.leader] Issued job 'align_toil' kind-align_toil/instance-pk3eulnh v1 with job batch system ID: 3 and cores: 10, disk: 2.0 Gi, and memory: 2.0 Gi
[2022-03-24T07:03:19+0000] [MainThread] [I] [toil.leader] Issued job 'align_toil' kind-align_toil/instance-s7z60wf5 v1 with job batch system ID: 4 and cores: 10, disk: 2.0 Gi, and memory: 2.0 Gi
[2022-03-24T07:03:20+0000] [MainThread] [I] [toil.worker] Redirecting logging to /tmp/6b1ee798b05f5fc8adb3b33e42f196c6/bf04/worker_log.txt
[2022-03-24T07:03:20+0000] [MainThread] [I] [toil.worker] Redirecting logging to /tmp/6b1ee798b05f5fc8adb3b33e42f196c6/6dce/worker_log.txt
[2022-03-24T07:03:20+0000] [MainThread] [I] [toil.worker] Redirecting logging to /tmp/6b1ee798b05f5fc8adb3b33e42f196c6/d7d4/worker_log.txt
[2022-03-24T07:03:20+0000] [MainThread] [I] [toil.worker] Redirecting logging to /tmp/6b1ee798b05f5fc8adb3b33e42f196c6/553b/worker_log.txt
[2022-03-24T07:11:06+0000] [MainThread] [I] [toil.leader] Issued job 'align_toil_batch' kind-align_toil_batch/instance-kwjksr_m v2 with job batch system ID: 5 and cores: 1, disk: 2.0 Gi, and memory: 2.0 Gi
[2022-03-24T07:11:07+0000] [MainThread] [I] [toil.worker] Redirecting logging to /tmp/6b1ee798b05f5fc8adb3b33e42f196c6/01c8/worker_log.txt
[2022-03-24T07:11:09+0000] [MainThread] [I] [toil.leader] Finished toil run successfully.

Workflow Progress 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 6/6 (0 failures) [07:50<00:00, 0.01 jobs/s]
Traceback (most recent call last):
  File "/usr/local/bin/cactus-align-batch", line 8, in <module>
    sys.exit(main_batch())
  File "/usr/local/lib/python3.8/dist-packages/cactus/setup/cactus_align.py", line 808, in main_batch
    toil.exportFile(results[0], makeURL(os.path.join(options.outHal, '{}.hal'.format(chrom))))
  File "/usr/local/lib/python3.8/dist-packages/toil/lib/compatibility.py", line 12, in call
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/toil/common.py", line 1147, in exportFile
    return self.export_file(jobStoreFileID, dstUrl)
  File "/usr/local/lib/python3.8/dist-packages/toil/common.py", line 1158, in export_file
    self._jobStore.export_file(file_id, dst_uri)
  File "/usr/local/lib/python3.8/dist-packages/toil/jobStores/abstractJobStore.py", line 447, in export_file
    self._export_file(otherCls, file_id, parseResult)
  File "/usr/local/lib/python3.8/dist-packages/toil/jobStores/fileJobStore.py", line 320, in _export_file
    atomic_copy(srcPath, destPath, executable=executable)
  File "/usr/local/lib/python3.8/dist-packages/toil/lib/io.py", line 112, in atomic_copy
    shutil.copyfile(src_path, dest_path_tmp)
  File "/usr/lib/python3.8/shutil.py", line 259, in copyfile
    with open(src, 'rb') as fsrc, open(dst, 'wb') as fdst:
FileNotFoundError: [Errno 2] No such file or directory: '/data/align-batch/chr3.hal.3b65fd0b-0a88-4926-a836-8bdff92bafc7.tmp.hal'

zhanghaoipp commented 2 years ago

It seems the hal and vg files have been produced in the jobstore but have not been exported? I searched for these files:

$ find jobstore -name hal
jobstore/files/for-job/kind-align_toil/instance-s7z60wf5/file-fae05cc747e34dc0ab75a746a4b46dc8/chr4.hal
jobstore/files/for-job/kind-align_toil/instance-pk3eulnh/file-aa05567f2c8045a0b1b581bbf79eee6e/chr2.hal
jobstore/files/for-job/kind-align_toil/instance-szc407xh/file-6f8534d3705049c4a059c33d937c3ad8/chr1.hal
jobstore/files/for-job/kind-align_toil/instance-a10sci1i/file-cee6a7e9733349ff9c4d65e3d542cb9b/chr3.hal
$ find jobstore -name vg
jobstore/files/for-job/kind-align_toil/instance-s7z60wf5/file-2af652cf25d943b98b76a160d2371581/chr4.vg
jobstore/files/for-job/kind-align_toil/instance-pk3eulnh/file-c1187d2832a940d98a42fae359b50127/chr2.vg
jobstore/files/for-job/kind-align_toil/instance-szc407xh/file-6704ccb024be429fba524154576aa196/chr1.vg
jobstore/files/for-job/kind-align_toil/instance-a10sci1i/file-867f78add7594ead9ecb958c733cd5e7/chr3.vg

glennhickey commented 2 years ago

Yes, that's a bad typo that made it into the release, where only S3 output works with cactus-align-batch. There is a small patch you can make to 'cactus_align.py' to get it working (and it will be fixed in the next release):

https://github.com/ComparativeGenomicsToolkit/cactus/issues/615

glennhickey commented 2 years ago

Also, yes, it's a bug in the export stage -- if you copy those files out they should be fine.
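For anyone hitting the same export failure, copying the files out by hand could look roughly like the sketch below. It assumes the jobstore is still on disk and has the layout shown in the find output above (files/for-job/kind-align_toil/instance-*/file-*/chrN.hal and chrN.vg). The helper name copy_alignments is made up for illustration and is not part of Cactus or Toil.

```shell
# Sketch only: copy the per-chromosome .hal and .vg outputs out of a Toil
# file jobstore. copy_alignments is a hypothetical helper, not a Cactus tool.
# The body runs in a subshell so its variables do not leak into the caller.
copy_alignments() (
    jobstore="$1"   # e.g. ./jobstore
    outdir="$2"     # e.g. ./align-batch
    mkdir -p "$outdir"
    # Only look under the align_toil jobs; keep the original file names.
    find "$jobstore/files/for-job" -type f -path '*/kind-align_toil/*' \
        \( -name '*.hal' -o -name '*.vg' \) -exec cp {} "$outdir"/ \;
)
```

With the paths from this thread it would be called as `copy_alignments ./jobstore ./align-batch`. File names are kept as they are in the jobstore, so two jobs producing the same name would overwrite each other; in the listing above every chromosome has a distinct name.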

glennhickey commented 2 years ago

I don't think you need to do any chromosome splitting for 38M genomes, even if you have 200, so you should be fine with just cactus-align.
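As a rough sketch of what skipping the split would mean here: steps 4 and 5 would be replaced by a single cactus-align call on the masked seqfile and the PAF from step 2. The flags below are the ones already passed through --alignOptions in this thread; the positional order (jobstore, seqfile, PAF, output HAL) and the output name fa10.hal are assumptions, so check them against `cactus-align --help` for your version. The helper print_align_cmd is hypothetical and only prints the command line so it can be reviewed before running.

```shell
# Sketch only: print a single cactus-align command for an unsplit run.
# print_align_cmd is a hypothetical helper; it does not run Cactus.
# Arguments: jobstore, seqfile, PAF, output HAL, reference name.
print_align_cmd() (
    echo cactus-align "$1" "$2" "$3" "$4" \
        --pafInput --pangenome --outVG \
        --barMaskFilter 100000 --reference "$5"
)

print_align_cmd ./jobstore fa10_masked.txt fa10-orig.paf fa10.hal 180197
```

Once the printed command looks right, it can be pasted into the container shell and run as-is.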

On Fri, Mar 25, 2022 at 3:14 AM zhanghaoipp wrote:

Thanks for your help! The reference has 4 chromosomes and is about 38M. I am now testing 10 genomes, but if I use 200 genomes of this species, which program is better? cactus-align-batch?