ComparativeGenomicsToolkit / hal

Hierarchical Alignment Format

Help with halPhyloPTrain.py #274

Open xiaoyezao opened 1 year ago

xiaoyezao commented 1 year ago

Hello, I got the following error from halPhyloPTrain.py. Can you please help me debug it?

halPhyloPTrain.py $hal_file $reference $neutralRegions.bed $neutralModel.mod --numProc 12

Reading alignment from Lactuca_neutralModel_halPhyloPTrain_temp_NADOHCS_Lactuca_neutralModel_halPhyloPTrain_temp_NADOHCS_Lsat_1_Genome_v11.01.annotation_Maker.gff.tidy.chr8.4d4d.maf ...
ERROR msa_reorder_rows: covered[new_to_old[5]]=1 should be 0
Traceback (most recent call last):
  File "/home/CBS2021/app/cactus-bin-v2.5.2/bin/halPhyloPTrain.py", line 260, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/CBS2021/app/cactus-bin-v2.5.2/bin/halPhyloPTrain.py", line 257, in main
    computeModel(args)
  File "/home/CBS2021/app/cactus-bin-v2.5.2/bin/halPhyloPTrain.py", line 134, in computeModel
    computeAgMAFStats(options)
  File "/home/CBS2021/app/cactus-bin-v2.5.2/bin/halPhyloPTrain.py", line 103, in computeAgMAFStats
    runShellCommand("msa_view -o SS -z --in-format MAF --aggregate %s %s > %s" % (
  File "/home/CBS2021/app/cactus-bin-v2.5.2/lib/hal/stats/halStats.py", line 27, in runShellCommand
    raise RuntimeError("Command: %s exited with non-zero status %i" %
RuntimeError: Command: msa_view -o SS -z --in-format MAF --aggregate Anc0,T.koksaghyz,Anc1,L.virosa,Anc2,L.saligna,Anc3,L.serriola,L.sativa Lactuca_neutralModel_halPhyloPTrain_temp_NADOHCS*.maf > Lactuca_neutralModel_halPhyloPTrain_temp_NADOHCS.ss exited with non-zero status 1

The head of the intermediate *.maf file is:

##maf version=1 scoring=N/A

a
s       L.sativa.Lsat_1_v11_chr8        94777886        1       +       343517054       T
s       Anc0.Anc0refChr3        163905  1       +       189369  T
s       Anc1.Anc1refChr1593     44277   1       +       81021   T
s       Anc2.Anc2refChr2805     62794   1       -       240409  T
s       Anc3.Anc3refChr2040     413548  1       +       2493988 T
s       L.saligna.chr8  142086748       1       -       238633233       T
s       L.serriola.Lser_1_US96UC23_v10_chr8     93752352        1       +       329941051       T
s       L.virosa.Lvir_CGN04683_V4_scf4  186602678       1       -       342977835       T
s       T.koksaghyz.GWHBCHF00000009     5976022 1       +       111408619       T

a
s       L.sativa.Lsat_1_v11_chr8        94777904        1       +       343517054       T
s       Anc0.Anc0refChr3        163923  1       +       189369  T
s       Anc1.Anc1refChr1593     44295   1       +       81021   T
s       Anc2.Anc2refChr2805     62812   1       -       240409  T
s       Anc3.Anc3refChr2040     413566  1       +       2493988 T
s       L.saligna.chr8  142086766       1       -       238633233       T
s       L.serriola.Lser_1_US96UC23_v10_chr8     93752370        1       +       329941051       T
s       L.virosa.Lvir_CGN04683_V4_scf4  186602696       1       -       342977835       T
s       T.koksaghyz.GWHBCHF00000009     5976040 1       +       111408619       T

a
s       L.sativa.Lsat_1_v11_chr8        94777919        1       +       343517054       C
s       Anc0.Anc0refChr3        163938  1       +       189369  C
s       Anc1.Anc1refChr1593     44310   1       +       81021   C
s       Anc2.Anc2refChr2805     62827   1       -       240409  C
s       Anc3.Anc3refChr2040     413581  1       +       2493988 C
s       L.saligna.chr8  142086781       1       -       238633233       C
s       L.serriola.Lser_1_US96UC23_v10_chr8     93752385        1       +       329941051       C
s       L.virosa.Lvir_CGN04683_V4_scf4  186602711       1       -       342977835       C
s       T.koksaghyz.GWHBCHF00000009     5976055 1       +       111408619       C

glennhickey commented 1 year ago

I worry that the halPhyloP tools have gone stale since they haven't been tested for so long. It is a project we are talking about reviving. But in the meantime, I think you'd be best served by using cactus-hal2maf --dupeMode single to export a single-copy MAF and then running PhyloP directly on the MAF.
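
For illustration, a minimal sketch of that two-step workflow could look like the lines below. The genome and file names are modeled on this thread, but the tree topology and the PHAST phyloFit/phyloP options are assumptions on my part, not commands taken from this issue; check phyloFit --help and phyloP --help against your installed PHAST version.

# 1) Export a single-copy MAF from the HAL, anchored on a chosen reference
#    (--dupeMode, --refGenome and --chunkSize are the flags used elsewhere in this thread; file names are illustrative)
cactus-hal2maf ./js alignment.hal alignment.maf --refGenome L.sativa --dupeMode single --chunkSize 1000000

# 2) Fit a neutral model with PHAST (ideally on putatively neutral sites, e.g. 4d sites),
#    then score conservation/acceleration with phyloP (tree topology and options are assumptions)
phyloFit --tree "(T.koksaghyz,(L.virosa,(L.saligna,(L.serriola,L.sativa))));" --msa-format MAF --out-root neutralModel neutral_sites.maf
phyloP --msa-format MAF --method LRT --mode CONACC --wig-scores neutralModel.mod alignment.maf > phyloP.wig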

GeorgeBGM commented 1 year ago

Hi, I want to do something similar, so as a first step I used cactus-hal2maf --dupeMode single to export a single-copy MAF, but it produces the error below. How should I solve this?

cactus-hal2maf js --workDir work --maxCores 5 --dupeMode single mc.full.hal mc.full.maf --chunkSize 1000000 --refGenome Chimpanzee

[screenshot of the error message]

GeorgeBGM commented 1 year ago

Hi, are there any suggestions? I'm looking forward to your reply.

glennhickey commented 1 year ago

Please share your full log.

GeorgeBGM commented 1 year ago

Hi, the following is the full error message.

[2023-06-29T20:49:19+0800] [MainThread] [I] [toil.statsAndLogging] Setting batchCores to 20
[2023-06-29T20:49:19+0800] [MainThread] [I] [toil.statsAndLogging] Enabling realtime logging in Toil
[2023-06-29T20:49:19+0800] [MainThread] [I] [toil.statsAndLogging] Cactus Command: /home/ddu/Software/Anaconda/mambaforge-pypy3/envs/MC/bin/cactus-hal2maf /home/ddu/Project/Test/001.Merge_Test_V1/Merge-V1_Detail/js --workDir /home/ddu/Project/Test/001.Merge_Test_V1/Merge-V1_Detail/work --maxCores 20 --dupeMode single /home/ddu/Project/Test/001.Merge_Test_V1/Merge-V1_Detail/Merge-V1-mc.full.hal /home/ddu/Project/Test/001.Merge_Test_V1/Merge-V1_Detail/Merge-V1-mc.full.maf --chunkSize 3 --refGenome Chimpanzee --restart
[2023-06-29T20:49:19+0800] [MainThread] [I] [toil.statsAndLogging] Cactus Commit: a33d3eabb909873746ecd8e7e1528344e526d95b
[2023-06-29T20:49:19+0800] [MainThread] [I] [toil.statsAndLogging] Using default batch count of 1
[2023-06-29T20:49:20+0800] [MainThread] [C] [toil.jobStores.abstractJobStore] Repairing job: kind-hal2maf_batch/instance-88w6p79f
[2023-06-29T20:49:20+0800] [MainThread] [I] [toil] Running Toil version 5.11.0a1-ee11d4bc8e9a0d38c636208d0090c619bce76a4b on host cpu13.
[2023-06-29T20:49:20+0800] [MainThread] [I] [toil.realtimeLogger] Starting real-time logging.
[2023-06-29T20:49:22+0800] [MainThread] [I] [toil.leader] Issued job 'hal2maf_batch' kind-hal2maf_batch/instance-88w6p79f v4 with job batch system ID: 1 and disk: 675.0 Gi, memory: 2.0 Gi, cores: 20, accelerators: [], preemptible: False
[2023-06-29T20:49:24+0800] [MainThread] [I] [toil.leader] 1 jobs are running, 0 jobs are issued and waiting to run
[2023-06-29T20:49:31+0800] [MainThread] [I] [toil.worker] Redirecting logging to /home/ddu/Project/Test/001.Merge_Test_V1/Merge-V1_Detail/work/25669f8a9a105c5fa94e0c82e81877df/39e5/worker_log.txt
[2023-06-29T20:52:27+0800] [MainThread] [I] [toil-rt] Reading HAL file from job store to /home/ddu/Project/Test/001.Merge_Test_V1/Merge-V1_Detail/work/25669f8a9a105c5fa94e0c82e81877df/39e5/d115/tmp7k0r51a5/Merge-V1-mc.full.hal
[2023-06-29T21:49:26+0800] [MainThread] [I] [toil.leader] 1 jobs are running, 0 jobs are issued and waiting to run
[2023-06-29T22:25:47+0800] [Thread-1 ] [E] [toil.batchSystems.singleMachine] Got exit code -15 (indicating failure) from job _toil_worker hal2maf_batch file:/home/ddu/Project/Test/001.Merge_Test_V1/Merge-V1_Detail/js kind-hal2maf_batch/instance-88w6p79f.
[2023-06-29T22:25:48+0800] [MainThread] [W] [toil.leader] Job failed with exit value -15: 'hal2maf_batch' kind-hal2maf_batch/instance-88w6p79f v4 Exit reason: None
[2023-06-29T22:25:48+0800] [MainThread] [W] [toil.leader] No log file is present, despite job failing: 'hal2maf_batch' kind-hal2maf_batch/instance-88w6p79f v4
[2023-06-29T22:25:48+0800] [MainThread] [W] [toil.job] Due to failure we are reducing the remaining try count of job 'hal2maf_batch' kind-hal2maf_batch/instance-88w6p79f v4 with ID kind-hal2maf_batch/instance-88w6p79f to 1
[2023-06-29T22:25:48+0800] [MainThread] [I] [toil.leader] Issued job 'hal2maf_batch' kind-hal2maf_batch/instance-88w6p79f v5 with job batch system ID: 2 and disk: 675.0 Gi, memory: 2.0 Gi, cores: 20, accelerators: [], preemptible: False
[2023-06-29T22:26:48+0800] [MainThread] [I] [toil.worker] Redirecting logging to /home/ddu/Project/Test/001.Merge_Test_V1/Merge-V1_Detail/work/25669f8a9a105c5fa94e0c82e81877df/f0bd/worker_log.txt
[2023-06-29T22:30:02+0800] [MainThread] [I] [toil-rt] Reading HAL file from job store to /home/ddu/Project/Test/001.Merge_Test_V1/Merge-V1_Detail/work/25669f8a9a105c5fa94e0c82e81877df/f0bd/c076/tmpnwx0ah9k/Merge-V1-mc.full.hal
[2023-06-29T22:49:27+0800] [MainThread] [I] [toil.leader] 1 jobs are running, 0 jobs are issued and waiting to run
[2023-06-29T23:30:45+0800] [Thread-1 ] [E] [toil.batchSystems.singleMachine] Got exit code -15 (indicating failure) from job _toil_worker hal2maf_batch file:/home/ddu/Project/Test/001.Merge_Test_V1/Merge-V1_Detail/js kind-hal2maf_batch/instance-88w6p79f.
[2023-06-29T23:30:45+0800] [MainThread] [W] [toil.leader] Job failed with exit value -15: 'hal2maf_batch' kind-hal2maf_batch/instance-88w6p79f v5 Exit reason: None
[2023-06-29T23:30:45+0800] [MainThread] [W] [toil.leader] No log file is present, despite job failing: 'hal2maf_batch' kind-hal2maf_batch/instance-88w6p79f v5
[2023-06-29T23:30:46+0800] [MainThread] [W] [toil.job] Due to failure we are reducing the remaining try count of job 'hal2maf_batch' kind-hal2maf_batch/instance-88w6p79f v5 with ID kind-hal2maf_batch/instance-88w6p79f to 0
[2023-06-29T23:30:46+0800] [MainThread] [W] [toil.leader] Job 'hal2maf_batch' kind-hal2maf_batch/instance-88w6p79f v6 is completely failed
[2023-06-29T23:31:28+0800] [MainThread] [I] [toil.leader] Finished toil run with 3 failed jobs.
[2023-06-29T23:31:28+0800] [MainThread] [I] [toil.leader] Failed jobs at end of the run: 'hal2maf_batch' kind-hal2maf_batch/instance-88w6p79f v6 'hal2maf_workflow' kind-hal2maf_workflow/instance-v3_4s4en v2 'hal2maf_all' kind-hal2maf_ranges/instance-g1d_w2o2 v5
[2023-06-29T23:31:28+0800] [MainThread] [I] [toil.realtimeLogger] Stopping real-time logging server.
[2023-06-29T23:31:29+0800] [MainThread] [I] [toil.realtimeLogger] Joining real-time logging server thread.
Traceback (most recent call last):
  File "/home/ddu/Software/Anaconda/mambaforge-pypy3/envs/MC/bin/cactus-hal2maf", line 8, in <module>
    sys.exit(main())
  File "/home/ddu/Software/Anaconda/mambaforge-pypy3/envs/MC/lib/python3.9/site-packages/cactus/maf/cactus_hal2maf.py", line 173, in main
    maf_id = toil.restart()
  File "/home/ddu/Software/Anaconda/mambaforge-pypy3/envs/MC/lib/python3.9/site-packages/toil/common.py", line 1101, in restart
    return self._runMainLoop(rootJobDescription)
  File "/home/ddu/Software/Anaconda/mambaforge-pypy3/envs/MC/lib/python3.9/site-packages/toil/common.py", line 1511, in _runMainLoop
    return Leader(config=self.config,
  File "/home/ddu/Software/Anaconda/mambaforge-pypy3/envs/MC/lib/python3.9/site-packages/toil/leader.py", line 289, in run
    raise FailedJobsException(self.jobStore, failed_jobs, exit_code=self.recommended_fail_exit_code)
toil.exceptions.FailedJobsException: The job store '/home/ddu/Project/Test/001.Merge_Test_V1/Merge-V1_Detail/js' contains 3 failed jobs: 'hal2maf_batch' kind-hal2maf_batch/instance-88w6p79f v6, 'hal2maf_workflow' kind-hal2maf_workflow/instance-v3_4s4en v2, 'hal2maf_all' kind-hal2maf_ranges/instance-g1d_w2o2 v5
Command exited with non-zero status 1
Command being timed: "bash 002.cmd_test_all_Phylo.sh"
User time (seconds): 8676.49
System time (seconds): 850.47
Percent of CPU this job got: 96%
Elapsed (wall clock) time (h:mm:ss or m:ss): 2:43:51
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 442428960
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 829
Minor (reclaiming a frame) page faults: 242907409
Voluntary context switches: 1087443
Involuntary context switches: 30726
Swaps: 0
File system inputs: 67947858
File system outputs: 106817984
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 1

glennhickey commented 1 year ago

Sorry, I can't really make that out. I've had some problems with --restart in the past, and I'm guessing that's what's at issue here. I will just say that --chunkSize 3 is way too small and will certainly not help. If, say, you are working on a human alignment, that will ask it to make ~1,000,000,000 hal2maf jobs, each covering 3 bp, which will certainly crash your filesystem.
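
To put rough numbers on that (an illustration, not something from the log): a ~3 Gb reference split into 3 bp chunks means on the order of 10^9 hal2maf jobs, whereas --chunkSize 1000000 gives only a few thousand. A rerun along the lines of the command earlier in this thread might look like the line below; pointing at a fresh job store and dropping --restart is my assumption here, not something stated above.

# hypothetical rerun: fresh job store (./js_fresh), --chunkSize raised to 1000000, --restart dropped
cactus-hal2maf ./js_fresh mc.full.hal mc.full.maf --workDir ./work --maxCores 20 --dupeMode single --refGenome Chimpanzee --chunkSize 1000000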

GeorgeBGM commented 1 year ago

Hi, I will rerun the command with --chunkSize 1000000.