ComparativeGenomicsToolkit / cactus

Official home of genome aligner based upon notion of Cactus graphs

cactus-pangenome exited #1125

Closed: ZhaoHang-bio closed this issue 1 year ago

ZhaoHang-bio commented 1 year ago

I'm using cactus-pangenome from the latest Docker image. Here is my command line.

docker exec cactus-265 cactus-pangenome evolverFL-pan.txt --maxCores 80 --workDir ./workDirToil-pan --outDir ./FL-pg --outName FL-pg --reference speciesname --logDebug

Find bellow the content of the error file: [2023-08-01T04:22:14+0000] [MainThread] [I] [toil.job] Saving graph of 1 jobs, 1 non-service, 0 new [2023-08-01T04:22:14+0000] [MainThread] [D] [toil.job] Resolve promise files/no-job/file-28f7fdc6398a456bacfaa3b58c684c53/stream from Job('split_fa_into_contigs' kind-split_fa_into_contigs/instance-xg2dcsrs v1) with a <class 'dict'> [2023-08-01T04:22:14+0000] [MainThread] [D] [toil.job] Resolve promise files/no-job/file-8e2c0281855b40f0a6436f21bd995276/stream from Job('split_fa_into_contigs' kind-split_fa_into_contigs/instance-xg2dcsrs v1) with a <class 'dict'> [2023-08-01T04:22:14+0000] [MainThread] [I] [toil.job] Processing job 'split_fa_into_contigs' kind-split_fa_into_contigs/instance-xg2dcsrs v1 [2023-08-01T04:22:14+0000] [MainThread] [D] [toil.fileStores.abstractFileStore] LOG-TO-MASTER: Job files/for-job/kind-split_fa_into_contigs/instance-xg2dcsrs/cleanup/file-6894b21c8a8740128307a23b1c040cf4/stream used 73.23% disk (1.1 GiB [1147863040B] used, 1.5 GiB [1567522614B] requested). [2023-08-01T04:22:14+0000] [MainThread] [D] [toil.deferred] Running own deferred functions [2023-08-01T04:22:14+0000] [MainThread] [D] [toil.deferred] Out of deferred functions! [2023-08-01T04:22:14+0000] [MainThread] [D] [toil.deferred] Running orphaned deferred functions [2023-08-01T04:22:14+0000] [MainThread] [D] [toil.deferred] Ran orphaned deferred functions from 0 abandoned state files [2023-08-01T04:22:14+0000] [MainThread] [D] [toil.job] New job version: 'split_fa_into_contigs' kind-split_fa_into_contigs/instance-xg2dcsrs v2 [2023-08-01T04:22:14+0000] [MainThread] [I] [toil.worker] Completed body for 'split_fa_into_contigs' kind-split_fa_into_contigs/instance-xg2dcsrs v2 [2023-08-01T04:22:14+0000] [MainThread] [D] [toil.worker] Stopping running chain of jobs: no successors: True, services: 0, checkpoint: False [2023-08-01T04:22:14+0000] [MainThread] [I] [toil.worker] Not chaining from job 'split_fa_into_contigs' kind-split_fa_into_contigs/instance-xg2dcsrs v2 [2023-08-01T04:22:14+0000] [MainThread] [I] [toil.worker] Worker log can be found at /data/001--Flaveria/00.data/workDirToil-pan/0992e2dd76a35aeab5025ae05c4290b7/59a0. 
Set --cleanWorkDir to retain this log [2023-08-01T04:22:14+0000] [MainThread] [I] [toil.worker] Finished running the chain of jobs on this node, we ran for a total of 1334.237824 seconds <========= [2023-08-01T04:22:14+0000] [MainThread] [D] [toil.deferred] Removing own state file /run/lock/0992e2dd76a35aeab5025ae05c4290b7/deferred/funcx5vi4yyk [2023-08-01T04:22:14+0000] [Thread-1 (daddy)] [D] [toil.batchSystems.singleMachine] Child 14882 for job 77 succeeded [2023-08-01T04:22:14+0000] [MainThread] [D] [toil.batchSystems.singleMachine] Ran jobID: 77 with exit value: 0 [2023-08-01T04:22:14+0000] [MainThread] [D] [toil.leader] Job ended: 'split_fa_into_contigs' kind-split_fa_into_contigs/instance-xg2dcsrs v1 [2023-08-01T04:22:14+0000] [MainThread] [D] [toil.bus] Notifying toil.bus.JobCompletedMessage with message: JobCompletedMessage(job_type='split_fa_into_contigs', job_id='kind-split_fa_into_contigs/instance-xg2dcsrs', exit_code=0) [2023-08-01T04:22:14+0000] [MainThread] [D] [toil.leader] Cleaning the predecessors of kind-split_fa_into_contigs/instance-xg2dcsrs [2023-08-01T04:22:14+0000] [MainThread] [D] [toil.toilState] Successors: one fewer for kind-split_gfa/instance-llxyskq0, now have 0 [2023-08-01T04:22:14+0000] [MainThread] [D] [toil.bus] Notifying toil.bus.JobUpdatedMessage with message: JobUpdatedMessage(job_id='kind-split_gfa/instance-llxyskq0', result_status=0) [2023-08-01T04:22:14+0000] [MainThread] [D] [toil.leader] Built the jobs list, currently have 1 jobs to update and 0 jobs issued [2023-08-01T04:22:14+0000] [MainThread] [D] [toil.leader] Updating status of job 'Job' kind-split_gfa/instance-llxyskq0 v8 with result status: 0 [2023-08-01T04:22:14+0000] [MainThread] [D] [toil.leader] Job kind-split_gfa/instance-llxyskq0 is being processed as completely failed [2023-08-01T04:22:14+0000] [MainThread] [D] [toil.bus] Notifying toil.bus.JobFailedMessage with message: JobFailedMessage(job_type='Job', job_id='kind-split_gfa/instance-llxyskq0') [2023-08-01T04:22:14+0000] [MainThread] [D] [toil.leader] Found new failed successors: kind-gather_fas/instance-v4luzxdm kind-bin_other_contigs/instance-0k4jc4md of job: 'Job' kind-split_gfa/instance-llxyskq0 v8 [2023-08-01T04:22:14+0000] [MainThread] [D] [toil.leader] Totally failed job: 'Job' kind-split_gfa/instance-llxyskq0 v8 is marking direct predecessor: 'filter_paf' kind-filter_paf/instance-79w7g14d v2 as having failed jobs [2023-08-01T04:22:14+0000] [MainThread] [D] [toil.leader] Cleaning the predecessors of kind-split_gfa/instance-llxyskq0 [2023-08-01T04:22:14+0000] [MainThread] [D] [toil.toilState] Successors: one fewer for kind-filter_paf/instance-79w7g14d, now have 0 [2023-08-01T04:22:14+0000] [MainThread] [D] [toil.bus] Notifying toil.bus.JobUpdatedMessage with message: JobUpdatedMessage(job_id='kind-filter_paf/instance-79w7g14d', result_status=0) [2023-08-01T04:22:16+0000] [MainThread] [D] [toil.leader] Built the jobs list, currently have 1 jobs to update and 0 jobs issued [2023-08-01T04:22:16+0000] [MainThread] [D] [toil.leader] Updating status of job 'filter_paf' kind-filter_paf/instance-79w7g14d v2 with result status: 0 [2023-08-01T04:22:16+0000] [MainThread] [D] [toil.leader] Job kind-filter_paf/instance-79w7g14d is being processed as completely failed [2023-08-01T04:22:16+0000] [MainThread] [D] [toil.bus] Notifying toil.bus.JobFailedMessage with message: JobFailedMessage(job_type='filter_paf', job_id='kind-filter_paf/instance-79w7g14d') [2023-08-01T04:22:16+0000] [MainThread] [D] [toil.leader] Found new failed successors: of 
job: 'filter_paf' kind-filter_paf/instance-79w7g14d v2 [2023-08-01T04:22:16+0000] [MainThread] [D] [toil.leader] Totally failed job: 'filter_paf' kind-filter_paf/instance-79w7g14d v2 is marking direct predecessor: 'Job' kind-update_seqfile/instance-j_ptjv0r v8 as having failed jobs [2023-08-01T04:22:16+0000] [MainThread] [D] [toil.leader] Cleaning the predecessors of kind-filter_paf/instance-79w7g14d [2023-08-01T04:22:16+0000] [MainThread] [D] [toil.toilState] Successors: one fewer for kind-update_seqfile/instance-j_ptjv0r, now have 0 [2023-08-01T04:22:16+0000] [MainThread] [D] [toil.bus] Notifying toil.bus.JobUpdatedMessage with message: JobUpdatedMessage(job_id='kind-update_seqfile/instance-j_ptjv0r', result_status=0) [2023-08-01T04:22:18+0000] [MainThread] [D] [toil.leader] Built the jobs list, currently have 1 jobs to update and 0 jobs issued [2023-08-01T04:22:18+0000] [MainThread] [D] [toil.leader] Updating status of job 'Job' kind-update_seqfile/instance-j_ptjv0r v8 with result status: 0 [2023-08-01T04:22:18+0000] [MainThread] [D] [toil.leader] Job kind-update_seqfile/instance-j_ptjv0r is being processed as completely failed [2023-08-01T04:22:18+0000] [MainThread] [D] [toil.bus] Notifying toil.bus.JobFailedMessage with message: JobFailedMessage(job_type='Job', job_id='kind-update_seqfile/instance-j_ptjv0r') [2023-08-01T04:22:18+0000] [MainThread] [D] [toil.leader] Found new failed successors: kind-make_batch_align_jobs_wrapper/instance-szyff1cm kind-export_split_wrapper/instance-4f5kn_6c kind-batch_align_jobs/instance-xp9whgv1 kind-export_align_wrapper/instance-_cbubjin kind-export_join_wrapper/instance-yshd8nuv kind-graphmap_join_workflow/instance-scurmh_2 of job: 'Job' kind-update_seqfile/instance-j_ptjv0r v8 [2023-08-01T04:22:18+0000] [MainThread] [D] [toil.leader] Totally failed job: 'Job' kind-update_seqfile/instance-j_ptjv0r v8 is marking direct predecessor: 'Job' kind-minigraph_workflow/instance-p749_a7h v5 as having failed jobs [2023-08-01T04:22:18+0000] [MainThread] [D] [toil.leader] Cleaning the predecessors of kind-update_seqfile/instance-j_ptjv0r [2023-08-01T04:22:18+0000] [MainThread] [D] [toil.toilState] Successors: one fewer for kind-minigraph_workflow/instance-p749_a7h, now have 0 [2023-08-01T04:22:18+0000] [MainThread] [D] [toil.bus] Notifying toil.bus.JobUpdatedMessage with message: JobUpdatedMessage(job_id='kind-minigraph_workflow/instance-p749_a7h', result_status=0) [2023-08-01T04:22:20+0000] [MainThread] [D] [toil.leader] Built the jobs list, currently have 1 jobs to update and 0 jobs issued [2023-08-01T04:22:20+0000] [MainThread] [D] [toil.leader] Updating status of job 'Job' kind-minigraph_workflow/instance-p749_a7h v5 with result status: 0 [2023-08-01T04:22:20+0000] [MainThread] [D] [toil.leader] Job kind-minigraph_workflow/instance-p749_a7h is being processed as completely failed [2023-08-01T04:22:20+0000] [MainThread] [D] [toil.bus] Notifying toil.bus.JobFailedMessage with message: JobFailedMessage(job_type='Job', job_id='kind-minigraph_workflow/instance-p749_a7h') [2023-08-01T04:22:20+0000] [MainThread] [D] [toil.leader] Found new failed successors: of job: 'Job' kind-minigraph_workflow/instance-p749_a7h v5 [2023-08-01T04:22:20+0000] [MainThread] [D] [toil.leader] Totally failed job: 'Job' kind-minigraph_workflow/instance-p749_a7h v5 is marking direct predecessor: 'sort_minigraph_input_with_mash' kind-minigraph_construct_workflow/instance-oqyci78j v7 as having failed jobs [2023-08-01T04:22:20+0000] [MainThread] [D] [toil.leader] Cleaning the predecessors of 
kind-minigraph_workflow/instance-p749_a7h [2023-08-01T04:22:20+0000] [MainThread] [D] [toil.toilState] Successors: one fewer for kind-minigraph_construct_workflow/instance-oqyci78j, now have 0 [2023-08-01T04:22:20+0000] [MainThread] [D] [toil.bus] Notifying toil.bus.JobUpdatedMessage with message: JobUpdatedMessage(job_id='kind-minigraph_construct_workflow/instance-oqyci78j', result_status=0) [2023-08-01T04:22:22+0000] [MainThread] [D] [toil.leader] Built the jobs list, currently have 1 jobs to update and 0 jobs issued [2023-08-01T04:22:22+0000] [MainThread] [D] [toil.leader] Updating status of job 'sort_minigraph_input_with_mash' kind-minigraph_construct_workflow/instance-oqyci78j v7 with result status: 0 [2023-08-01T04:22:22+0000] [MainThread] [D] [toil.leader] Job kind-minigraph_construct_workflow/instance-oqyci78j is being processed as completely failed [2023-08-01T04:22:22+0000] [MainThread] [D] [toil.bus] Notifying toil.bus.JobFailedMessage with message: JobFailedMessage(job_type='sort_minigraph_input_with_mash', job_id='kind-minigraph_construct_workflow/instance-oqyci78j') [2023-08-01T04:22:22+0000] [MainThread] [D] [toil.leader] Found new failed successors: of job: 'sort_minigraph_input_with_mash' kind-minigraph_construct_workflow/instance-oqyci78j v7 [2023-08-01T04:22:22+0000] [MainThread] [D] [toil.leader] Totally failed job: 'sort_minigraph_input_with_mash' kind-minigraph_construct_workflow/instance-oqyci78j v7 is marking direct predecessor: 'sanitize_fasta_headers' kind-pangenome_end_to_end_workflow/instance-n91uqkwo v6 as having failed jobs [2023-08-01T04:22:22+0000] [MainThread] [D] [toil.leader] Cleaning the predecessors of kind-minigraph_construct_workflow/instance-oqyci78j [2023-08-01T04:22:22+0000] [MainThread] [D] [toil.toilState] Successors: one fewer for kind-pangenome_end_to_end_workflow/instance-n91uqkwo, now have 0 [2023-08-01T04:22:22+0000] [MainThread] [D] [toil.bus] Notifying toil.bus.JobUpdatedMessage with message: JobUpdatedMessage(job_id='kind-pangenome_end_to_end_workflow/instance-n91uqkwo', result_status=0) [2023-08-01T04:22:24+0000] [MainThread] [D] [toil.leader] Built the jobs list, currently have 1 jobs to update and 0 jobs issued [2023-08-01T04:22:24+0000] [MainThread] [D] [toil.leader] Updating status of job 'sanitize_fasta_headers' kind-pangenome_end_to_end_workflow/instance-n91uqkwo v6 with result status: 0 [2023-08-01T04:22:24+0000] [MainThread] [D] [toil.leader] Job kind-pangenome_end_to_end_workflow/instance-n91uqkwo is being processed as completely failed [2023-08-01T04:22:24+0000] [MainThread] [D] [toil.bus] Notifying toil.bus.JobFailedMessage with message: JobFailedMessage(job_type='sanitize_fasta_headers', job_id='kind-pangenome_end_to_end_workflow/instance-n91uqkwo') [2023-08-01T04:22:24+0000] [MainThread] [D] [toil.leader] Found new failed successors: of job: 'sanitize_fasta_headers' kind-pangenome_end_to_end_workflow/instance-n91uqkwo v6 [2023-08-01T04:22:26+0000] [MainThread] [D] [toil.leader] Finished the main loop: no jobs left to run. [2023-08-01T04:22:26+0000] [MainThread] [D] [toil.serviceManager] Waiting for service manager thread to finish ... [2023-08-01T04:22:27+0000] [Thread-3 (start_services)] [D] [toil.serviceManager] Received signal to quit starting services. [2023-08-01T04:22:28+0000] [MainThread] [D] [toil.serviceManager] ... finished shutting down the service manager. Took 1.7278475761413574 seconds [2023-08-01T04:22:28+0000] [MainThread] [D] [toil.statsAndLogging] Waiting for stats and logging collator thread to finish ... 
[2023-08-01T04:22:28+0000] [MainThread] [D] [toil.statsAndLogging] ... finished collating stats and logs. Took 0.16349554061889648 seconds [2023-08-01T04:22:28+0000] [MainThread] [I] [toil.leader] Finished toil run with 7 failed jobs. [2023-08-01T04:22:28+0000] [MainThread] [I] [toil.leader] Failed jobs at end of the run: 'sanitize_fasta_headers' kind-pangenome_end_to_end_workflow/instance-n91uqkwo v6 'Job' kind-minigraph_workflow/instance-p749_a7h v5 'Job' kind-update_seqfile/instance-j_ptjv0r v8 'split_fa_into_contigs' kind-split_fa_into_contigs/instance-83326_xu v6 'filter_paf' kind-filter_paf/instance-79w7g14d v2 'sort_minigraph_input_with_mash' kind-minigraph_construct_workflow/instance-oqyci78j v7 'Job' kind-split_gfa/instance-llxyskq0 v8 [2023-08-01T04:22:28+0000] [MainThread] [I] [toil.realtimeLogger] Stopping real-time logging server. [2023-08-01T04:22:29+0000] [MainThread] [I] [toil.realtimeLogger] Joining real-time logging server thread. [2023-08-01T04:22:29+0000] [MainThread] [D] [toil.common] Shutting down batch system ... [2023-08-01T04:22:29+0000] [Thread-1 (daddy)] [D] [toil.batchSystems.singleMachine] Daddy thread cleaning up 0 remaining children for batch system 140108100019264... [2023-08-01T04:22:29+0000] [Thread-1 (daddy)] [D] [toil.batchSystems.singleMachine] Daddy thread for batch system 140108100019264 finishing because no children should now exist [2023-08-01T04:22:29+0000] [MainThread] [D] [toil.batchSystems.abstractBatchSystem] Attempting worker cleanup [2023-08-01T04:22:29+0000] [MainThread] [D] [toil.deferred] Cleaning up deferred functions system [2023-08-01T04:22:29+0000] [MainThread] [D] [toil.deferred] Opened with own state file /run/lock/0992e2dd76a35aeab5025ae05c4290b7/deferred/funcw3lcnfg6 [2023-08-01T04:22:29+0000] [MainThread] [D] [toil.deferred] Running orphaned deferred functions [2023-08-01T04:22:29+0000] [MainThread] [D] [toil.deferred] Ran orphaned deferred functions from 0 abandoned state files [2023-08-01T04:22:29+0000] [MainThread] [D] [toil.deferred] Removing own state file /run/lock/0992e2dd76a35aeab5025ae05c4290b7/deferred/funcw3lcnfg6 [2023-08-01T04:22:29+0000] [MainThread] [D] [toil.batchSystems.abstractBatchSystem] Deleting workflow directory /data/001--Flaveria/00.data/workDirToil-pan/0992e2dd76a35aeab5025ae05c4290b7 [2023-08-01T04:22:29+0000] [MainThread] [D] [toil.batchSystems.abstractBatchSystem] Deleting coordination directory /run/lock/0992e2dd76a35aeab5025ae05c4290b7 [2023-08-01T04:22:29+0000] [MainThread] [D] [toil.common] ... finished shutting down the batch system in 0.015084981918334961 seconds. 
Traceback (most recent call last): File "/home/cactus/cactus_env/bin/cactus-pangenome", line 8, in sys.exit(main()) File "/home/cactus/cactus_env/lib/python3.10/site-packages/cactus/refmap/cactus_pangenome.py", line 214, in main toil.start(Job.wrapJobFn(pangenome_end_to_end_workflow, options, config_wrapper, input_seq_id_map, input_path_map, input_seq_order)) File "/home/cactus/cactus_env/lib/python3.10/site-packages/toil/common.py", line 1064, in start return self._runMainLoop(rootJobDescription) File "/home/cactus/cactus_env/lib/python3.10/site-packages/toil/common.py", line 1544, in _runMainLoop jobCache=self._jobCache).run() File "/home/cactus/cactus_env/lib/python3.10/site-packages/toil/leader.py", line 289, in run raise FailedJobsException(self.jobStore, failed_jobs, exit_code=self.recommended_fail_exit_code) toil.exceptions.FailedJobsException: The job store '/data/001--Flaveria/00.data/js-pg' contains 7 failed jobs: 'sanitize_fasta_headers' kind-pangenome_end_to_end_workflow/instance-n91uqkwo v6, 'Job' kind-minigraph_workflow/instance-p749_a7h v5, 'Job' kind-update_seqfile/instance-j_ptjv0r v8, 'split_fa_into_contigs' kind-split_fa_into_contigs/instance-83326_xu v6, 'filter_paf' kind-filter_paf/instance-79w7g14d v2, 'sort_minigraph_input_with_mash' kind-minigraph_construct_workflow/instance-oqyci78j v7, 'Job' kind-split_gfa/instance-llxyskq0 v8 Log from job "'split_fa_into_contigs' kind-split_fa_into_contigs/instance-83326_xu v6" follows: =========> [2023-08-01T04:00:44+0000] [MainThread] [I] [toil.worker] ---TOIL WORKER OUTPUT LOG--- [2023-08-01T04:00:44+0000] [MainThread] [I] [toil] Running Toil version 5.12.0-6d5a5b83b649cd8adf34a5cfe89e7690c95189d3 on host d620ba7c289c. [2023-08-01T04:00:44+0000] [MainThread] [D] [toil] Configuration: {'workflowID': 'cd52a783-9a85-4446-ad65-a5554e48333e', 'workflowAttemptNumber': 0, 'jobStore': 'file:/data/001--Flaveria/00.data/js-pg', 'logLevel': 'DEBUG', 'workDir': '/data/001--Flaveria/00.data/workDirToil-pan', 'coordination_dir': None, 'noStdOutErr': False, 'stats': False, 'clean': 'onSuccess', 'clusterStats': None, 'restart': False, 'batchSystem': 'single_machine', 'disableAutoDeployment': False, 'max_jobs': 9223372036854775807, 'max_local_jobs': 96, 'manualMemArgs': False, 'statePollingWait': None, 'aws_batch_region': None, 'aws_batch_queue': None, 'aws_batch_job_role_arn': None, 'parasolCommand': 'parasol', 'parasolMaxBatches': 10000, 'scale': 1.0, 'allocate_mem': False, 'tes_endpoint': 'http://172.17.0.5:8000', 'tes_user': None, 'tes_password': None, 'tes_bearer_token': None, 'run_local_jobs_on_workers': False, 'caching': False, 'linkImports': True, 'moveExports': False, 'provisioner': None, 'nodeTypes': [], 'minNodes': None, 'maxNodes': [10], 'targetTime': 1800, 'betaInertia': 0.1, 'scaleInterval': 60, 'preemptibleCompensation': 0.0, 'nodeStorage': 50, 'nodeStorageOverrides': [], 'metrics': False, 'assume_zero_overhead': False, 'maxPreemptibleServiceJobs': 9223372036854775807, 'maxServiceJobs': 9223372036854775807, 'deadlockWait': 60, 'deadlockCheckInterval': 30, 'defaultMemory': 2147483648, 'defaultCores': 1, 'defaultDisk': 2147483648, 'defaultPreemptible': False, 'defaultAccelerators': [], 'maxCores': 80, 'maxMemory': 9223372036854775807, 'maxDisk': 9223372036854775807, 'retryCount': 1, 'enableUnlimitedPreemptibleRetries': False, 'doubleMem': False, 'maxJobDuration': 9223372036854775807, 'rescueJobsFrequency': 60, 'maxLogFileSize': 64000, 'writeLogs': None, 'writeLogsGzip': None, 'writeLogsFromAllJobs': False, 'write_messages': 
'/tmp/tmpi6vyniwp', 'environment': {}, 'disableChaining': False, 'disableJobStoreChecksumVerification': False, 'sseKey': None, 'servicePollingInterval': 60, 'useAsync': True, 'forceDockerAppliance': False, 'statusWait': 3600, 'disableProgress': False, 'readGlobalFileMutableByDefault': False, 'kill_polling_interval': 5, 'debugWorker': False, 'disableWorkerOutputCapture': False, 'badWorker': 0.0, 'badWorkerFailInterval': 0.01, 'cwl': False, 'cleanWorkDir': 'always'} [2023-08-01T04:00:44+0000] [MainThread] [D] [toil.deferred] Opened with own state file /run/lock/0992e2dd76a35aeab5025ae05c4290b7/deferred/func8yj0xtma [2023-08-01T04:00:44+0000] [MainThread] [D] [toil.worker] Parsed job description [2023-08-01T04:00:44+0000] [MainThread] [D] [toil.job] New job version: 'split_fa_into_contigs' kind-split_fa_into_contigs/instance-83326_xu v4 [2023-08-01T04:00:44+0000] [MainThread] [I] [toil.worker] Working on job 'split_fa_into_contigs' kind-split_fa_into_contigs/instance-83326_xu v4 [2023-08-01T04:00:44+0000] [MainThread] [D] [toil.worker] Got a command to run: _toil files/for-job/kind-split_fa_into_contigs/instance-83326_xu/cleanup/file-5c0cbbf570b1447e89f07b7515215dbe/stream /home/cactus/cactus_env/lib/python3.10/site-packages cactus.refmap.cactus_graphmap_split True [2023-08-01T04:00:44+0000] [MainThread] [D] [toil.job] Loading user module ModuleDescriptor(dirPath='/home/cactus/cactus_env/lib/python3.10/site-packages', name='cactus.refmap.cactus_graphmap_split', fromVirtualEnv=True). [2023-08-01T04:00:44+0000] [MainThread] [I] [toil.worker] Loaded body Job('split_fa_into_contigs' kind-split_fa_into_contigs/instance-83326_xu v4) from description 'split_fa_into_contigs' kind-split_fa_into_contigs/instance-83326_xu v4 [2023-08-01T04:00:44+0000] [MainThread] [D] [toil.deferred] Running orphaned deferred functions [2023-08-01T04:00:44+0000] [MainThread] [D] [toil.deferred] Ran orphaned deferred functions from 0 abandoned state files [2023-08-01T04:00:44+0000] [MainThread] [D] [toil.deferred] Running job [2023-08-01T04:00:44+0000] [MainThread] [D] [toil.job] Loading user function split_fa_into_contigs from module ModuleDescriptor(dirPath='/home/cactus/cactus_env/lib/python3.10/site-packages', name='cactus.refmap.cactus_graphmap_split', fromVirtualEnv=True). 
[2023-08-01T04:00:44+0000] [MainThread] [I] [cactus.shared.common] Running the command ['bash', '-c', 'set -eo pipefail && samtools faidx /data/001--Flaveria/00.data/workDirToil-pan/0992e2dd76a35aeab5025ae05c4290b7/e574/2064/tmpudy22xe1/Flaveria_campestris.fa --region-file /data/001--Flaveria/00.data/workDirToil-pan/0992e2dd76a35aeab5025ae05c4290b7/e574/2064/tmpudy22xe1/AMBIGUOUS.fa_contigs.clean | sed -e \'s/\([^:]\):\([0-9]\)-\([0-9]\)/echo "\1sub$((\2-1))_\3"/e\''] [2023-08-01T04:00:44+0000] [MainThread] [D] [toil.statsAndLogging] Suppressing the following loggers: {'websocket', 'pkg_resources', 'botocore', 'boto', 'dill', 'charset_normalizer', 'sonLib', 'requests', 'docker', 'boto3', 'urllib3', 'bcdocs', 'cactus', 'setuptools'} [2023-08-01T04:00:44+0000] [MainThread] [I] [toil-rt] 2023-08-01 04:00:44.772292: Running the command: "bash -c set -eo pipefail && samtools faidx /data/001--Flaveria/00.data/workDirToil-pan/0992e2dd76a35aeab5025ae05c4290b7/e574/2064/tmpudy22xe1/Flaveria_campestris.fa --region-file /data/001--Flaveria/00.data/workDirToil-pan/0992e2dd76a35aeab5025ae05c4290b7/e574/2064/tmpudy22xe1/AMBIGUOUS.fa_contigs.clean | sed -e 's/([^:]):([0-9])-([0-9])/echo "\1sub$((\2-1))_\3"/e'" [2023-08-01T04:01:15+0000] [MainThread] [W] [toil.lib.humanize] Deprecated toil method. Please use "toil.lib.conversions.bytes2human()" instead." [2023-08-01T04:01:15+0000] [MainThread] [I] [toil-rt] 2023-08-01 04:01:15.782214: Successfully ran: "bash -c 'set -eo pipefail && samtools faidx /data/001--Flaveria/00.data/workDirToil-pan/0992e2dd76a35aeab5025ae05c4290b7/e574/2064/tmpudy22xe1/Flaveria_campestris.fa --region-file /data/001--Flaveria/00.data/workDirToil-pan/0992e2dd76a35aeab5025ae05c4290b7/e574/2064/tmpudy22xe1/AMBIGUOUS.fa_contigs.clean | sed -e '"'"'s/([^:]):([0-9])-([0-9]*)/echo "\1sub$((\2-1))_\3"/e'"'"''" in 31.009 seconds and 183.0 Mi memory [2023-08-01T04:01:15+0000] [MainThread] [I] [toil-rt] 2023-08-01 04:01:15.782730: Running the command: "bash -c set -eo pipefail && cat /data/001--Flaveria/00.data/workDirToil-pan/0992e2dd76a35aeab5025ae05c4290b7/e574/2064/tmpudy22xe1/Flaveria_campestrisAMBIGUOUS_.fa | grep -v '>' | wc | awk '{print $3-$1}'" [2023-08-01T04:01:26+0000] [MainThread] [W] [toil.lib.humanize] Deprecated toil method. Please use "toil.lib.conversions.bytes2human()" instead." 
[2023-08-01T04:01:26+0000] [MainThread] [I] [toil-rt] 2023-08-01 04:01:26.896839: Successfully ran: "bash -c 'set -eo pipefail && cat /data/001--Flaveria/00.data/workDirToil-pan/0992e2dd76a35aeab5025ae05c4290b7/e574/2064/tmpudy22xe1/Flaveria_campestrisAMBIGUOUS_.fa | grep -v '"'"'>'"'"' | wc | awk '"'"'{print $3-$1}'"'"''" in 11.1128 seconds and 1.7 Mi memory [2023-08-01T04:01:26+0000] [MainThread] [W] [toil.fileStores.abstractFileStore] Failed job accessed files: [2023-08-01T04:01:26+0000] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-sanitize_fasta_header/instance-b0d_bpk5/file-56fcc842cf2347ecac377745fdc2ea4f/Flaveria_campestris.sanitized.fa' to path '/data/001--Flaveria/00.data/workDirToil-pan/0992e2dd76a35aeab5025ae05c4290b7/e574/2064/tmpudy22xe1/Flaveria_campestris.fa' [2023-08-01T04:01:26+0000] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-split_gfa/instance-llxyskq0/file-c58be6032ca04c068ac7311a00c283b4/splitAMBIGUOUS_.fa_contigs' to path '/data/001--Flaveria/00.data/workDirToil-pan/0992e2dd76a35aeab5025ae05c4290b7/e574/2064/tmpudy22xe1/AMBIGUOUS.fa_contigs' [2023-08-01T04:01:26+0000] [MainThread] [D] [toil.fileStores.abstractFileStore] LOG-TO-MASTER: Job files/for-job/kind-split_fa_into_contigs/instance-83326_xu/cleanup/file-5c0cbbf570b1447e89f07b7515215dbe/stream used 66.45% disk (4.8 GiB [5187207168B] used, 7.3 GiB [7806166623B] requested). [2023-08-01T04:01:26+0000] [MainThread] [D] [toil.deferred] Running own deferred functions [2023-08-01T04:01:26+0000] [MainThread] [D] [toil.deferred] Out of deferred functions! [2023-08-01T04:01:26+0000] [MainThread] [D] [toil.deferred] Running orphaned deferred functions [2023-08-01T04:01:26+0000] [MainThread] [D] [toil.deferred] Ran orphaned deferred functions from 0 abandoned state files Traceback (most recent call last): File "/home/cactus/cactus_env/lib/python3.10/site-packages/toil/worker.py", line 403, in workerScript job._runner(jobGraph=None, jobStore=jobStore, fileStore=fileStore, defer=defer) File "/home/cactus/cactus_env/lib/python3.10/site-packages/toil/job.py", line 2774, in _runner returnValues = self._run(jobGraph=None, fileStore=fileStore) File "/home/cactus/cactus_env/lib/python3.10/site-packages/toil/job.py", line 2691, in _run return self.run(fileStore) File "/home/cactus/cactus_env/lib/python3.10/site-packages/toil/job.py", line 2919, in run rValue = userFunction(*((self,) + tuple(self._args)), **self._kwargs) File "/home/cactus/cactus_env/lib/python3.10/site-packages/cactus/refmap/cactus_graphmap_split.py", line 512, in split_fa_into_contigs num_bases = int(cactus_call(parameters=size_cmd, check_output=True).strip()) ValueError: invalid literal for int() with base 10: '2.54259e+09' [2023-08-01T04:01:26+0000] [MainThread] [E] [toil.worker] Exiting the worker because of a failed job on host d620ba7c289c <=========

I have no idea which step went wrong. Could you give me a hand? Thanks.

glennhickey commented 1 year ago

The relevant part of the log is this:

File "/home/cactus/cactus_env/lib/python3.10/site-packages/cactus/refmap/cactus_graphmap_split.py", line 512, in split_fa_into_contigs
num_bases = int(cactus_call(parameters=size_cmd, check_output=True).strip())
ValueError: invalid literal for int() with base 10: '2.54259e+09'

which comes from this command:

size_cmd = [['cat', contig_fasta_path], ['grep', '-v', '>'], ['wc'], ['awk', '{print $3-$1}']]

The issue, as far as I can tell, is that awk sometimes switches to scientific notation once numbers get bigger than 32 bits. This doesn't seem to be an issue on my desktop, but I can reproduce it in the Cactus Docker image. The scientific notation trips up the Python code, which expects a plain integer, leading to this crash.
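For illustration, here is a minimal sketch of the behaviour, assuming the image's default awk is mawk (which the scientific-notation output suggests); genome.fa is just a placeholder name:

# In the mawk builds shipped with many Debian-based images, values beyond the
# 32-bit range fall back to the default %.6g output format, while gawk prints
# them in full:
echo 2542590000 | mawk '{print $1 + 0}'    # prints 2.54259e+09
echo 2542590000 | gawk '{print $1 + 0}'    # prints 2542590000

# One possible way to sidestep this at the shell level (not the actual patch,
# just a sketch) is to force fixed-point output in the counting pipeline:
cat genome.fa | grep -v '>' | wc | awk '{printf "%.0f\n", $3-$1}'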

I think this is easy enough to fix. But just to confirm: are you running on data with an input chromosome or contig larger than 2 gigabases? That's the only explanation I can see for this error -- and also the only reason I can think of that it's only coming up now -- you must be the first to try this on a genome of that size...

ZhaoHang-bio commented 1 year ago

Hi glennhickey,

Thank you for the quick diagnosis of the problem. Your assessment is spot-on. I gave cactus-pangenome 15 chromosome-level plant genomes as input, and indeed, 7 of them are larger than 2 Gb. It seems I've ventured into uncharted territory here. I appreciate your willingness to look into fixing this; having this tool available for larger genomes will be immensely beneficial for my work. Looking forward to using the updated tool.

Best regards, Hang

glennhickey commented 1 year ago

Definitely uncharted territory, and I caution you that unfortunately minigraph-cactus may not be sensitive enough to properly align these genomes (not so much because of their size but because of their repetitiveness). Still, I will get this issue patched. In the meantime, you can fix it in Docker by running

apt update
apt install gawk

in your container before running cactus. You should even be able to use the --restart option to resume from the crash if you still have your jobstore on hand.

gotouerina commented 6 months ago

Hi, if I use Singularity, what should I do to avoid this problem?

glennhickey commented 6 months ago

This issue should be fixed in Cactus versions 2.6.7 and newer.
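For example, with Singularity you could pull a release at or after that version (the image path below follows the pattern used for Cactus release images; double-check the exact tag against the release notes):

singularity pull docker://quay.io/comparative-genomics-toolkit/cactus:v2.6.7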