Closed (crinfante closed this issue 5 months ago)
Can you try using --halFile <hal file> instead of --chromSizes and tell me if it works? I think the issue is that it's finding a genome name with a "." in it (which screws up the bigMaf summary) and tries to work around it, but that logic only works with a HAL input. If this is what's going on, there needs to be a better error message.
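A quick way to check the genome names for a "." is sketched below. It assumes the names arrive as one whitespace-separated string, which is my understanding of what halStats --genomes prints; the function name and the sample names are hypothetical and only for illustration.

```python
from typing import List


def genomes_with_dots(genomes_output: str) -> List[str]:
    """Return the genome names that contain a '.'.

    `genomes_output` is assumed to be a whitespace-separated list of
    genome names (e.g. captured from `halStats --genomes <hal file>`).
    """
    return [name for name in genomes_output.split() if "." in name]


# Hypothetical sample names, not taken from the actual HAL file.
sample = "mm39 rn7 GCA_000001.1 Anc0"
print(genomes_with_dots(sample))
```

An empty list would suggest the dotted-name workaround is not what is being triggered.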
Now it fails at the MAF conversion step with: exited 255: stderr=reference sequence has to be on positive strand on line 954211. So is it a problem with the original HAL file format?
The command was:
cactus-maf2bigmaf \
--refGenome mm39 \
--halFile group14.hal \
"${SLURM_JOBID}/jobstore" \
group14.mm39.maf.gz \
group14.mm39.bb
And the log:
[2024-04-02T09:54:29-0600] [MainThread] [I] [toil.statsAndLogging] Cactus Command: /home/biology/lab/.miniforge3/envs/cactus_align/bin/cactus-maf2bigmaf --refGenome mm39 --halFile group14-way.hal 10667863/jobstore group14-way.mm39.maf.gz group14-way.mm39.bb
[2024-04-02T09:54:29-0600] [MainThread] [I] [toil.statsAndLogging] Cactus Commit: 7286b49b264896f43cc64aa405b39f914d43f75b
[2024-04-02T09:54:29-0600] [MainThread] [I] [toil.statsAndLogging] Importing group14-way.mm39.maf.gz
[2024-04-02T09:54:35-0600] [MainThread] [I] [toil.statsAndLogging] Importing group14-way.hal
[2024-04-02T09:54:35-0600] [MainThread] [I] [toil] Running Toil version 6.0.0-0e2a07a20818e593bfdfde3cc51ca4ad809fde96 on host math-alderaan-c12.
[2024-04-02T09:54:35-0600] [MainThread] [I] [toil.realtimeLogger] Starting real-time logging.
[2024-04-02T09:54:35-0600] [MainThread] [I] [toil.leader] Issued job 'maf2bigmaf_workflow' kind-maf2bigmaf_workflow/instance-_742whsu v1 with job batch system ID: 1 and disk: 2.0 Gi, memory: 2.0 Gi, cores: 1, accelerators: [], preemptible: False
[2024-04-02T09:54:36-0600] [MainThread] [I] [toil-rt] 2024-04-02 09:54:36.358197: Running the command: "mafToBigMaf"
[2024-04-02T09:54:36-0600] [MainThread] [I] [toil-rt] 2024-04-02 09:54:36.533961: Running the command: "bedToBigBed"
[2024-04-02T09:54:36-0600] [MainThread] [I] [toil-rt] 2024-04-02 09:54:36.758923: Running the command: "hgLoadMafSummary"
[2024-04-02T09:54:37-0600] [MainThread] [I] [toil.leader] 0 jobs are running, 0 jobs are issued and waiting to run
[2024-04-02T09:54:37-0600] [MainThread] [I] [toil.leader] Issued job 'maf2bigmaf_chrom_sizes' kind-maf2bigmaf_chrom_sizes/instance-cq3omh_8 v1 with job batch system ID: 2 and disk: 16.7 Gi, memory: 2.0 Gi, cores: 1, accelerators: [], preemptible: False
[2024-04-02T09:54:37-0600] [MainThread] [I] [toil-rt] Reading HAL file from job store to /tmp/69baea9fc96d57f4ba92bc8ba4d22355/abf5/a62f/tmp8a517twi/group14-way.hal
[2024-04-02T09:54:56-0600] [MainThread] [I] [toil-rt] Computing chromosome sizes
[2024-04-02T09:54:56-0600] [MainThread] [I] [toil-rt] 2024-04-02 09:54:56.040814: Running the command: "halStats /tmp/69baea9fc96d57f4ba92bc8ba4d22355/abf5/a62f/tmp8a517twi/group14-way.hal --chromSizes mm39"
[2024-04-02T09:54:56-0600] [MainThread] [I] [toil-rt] 2024-04-02 09:54:56.411228: Successfully ran: "halStats /tmp/69baea9fc96d57f4ba92bc8ba4d22355/abf5/a62f/tmp8a517twi/group14-way.hal --chromSizes mm39" in 0.2696 seconds
[2024-04-02T09:54:56-0600] [MainThread] [I] [toil-rt] 2024-04-02 09:54:56.411830: Running the command: "halStats --genomes /tmp/69baea9fc96d57f4ba92bc8ba4d22355/abf5/a62f/tmp8a517twi/group14-way.hal"
[2024-04-02T09:54:56-0600] [MainThread] [I] [toil-rt] 2024-04-02 09:54:56.455361: Successfully ran: "halStats --genomes /tmp/69baea9fc96d57f4ba92bc8ba4d22355/abf5/a62f/tmp8a517twi/group14-way.hal" in 0.0152 seconds
[2024-04-02T09:54:56-0600] [MainThread] [I] [toil.leader] Issued job 'maf2bigmaf' kind-maf2bigmaf/instance-o4cr7qph v1 with job batch system ID: 3 and disk: 4.7 Gi, memory: 4.0 Gi, cores: 1, accelerators: [], preemptible: False
[2024-04-02T09:54:56-0600] [MainThread] [I] [toil.leader] Issued job 'maf2bigmaf_summary' kind-maf2bigmaf_summary/instance-ejt_h2f_ v1 with job batch system ID: 4 and disk: 1.4 Gi, memory: 4.0 Gi, cores: 1, accelerators: [], preemptible: False
[2024-04-02T09:54:56-0600] [Thread-4 ] [W] [toil.statsAndLogging] Got message from job at time 04-02-2024 09:54:56: Job used more disk than requested. For CWL, consider increasing the outdirMin requirement, otherwise, consider increasing the disk requirement. Job 'maf2bigmaf_chrom_sizes' kind-maf2bigmaf_chrom_sizes/instance-cq3omh_8 v1 used 100.00% disk (16.7 GiB [17966690304B] used, 16.7 GiB [17966684624B] requested).
[2024-04-02T09:54:57-0600] [MainThread] [I] [toil-rt] Reading MAF file from job store to /tmp/69baea9fc96d57f4ba92bc8ba4d22355/65ef/b252/tmpb02u1rq5/group14-way.mm39.maf.gz
[2024-04-02T09:54:57-0600] [MainThread] [I] [toil-rt] Reading MAF file from job store to /tmp/69baea9fc96d57f4ba92bc8ba4d22355/f918/9ddf/tmp3yglfowq/group14-way.mm39.maf.gz
[2024-04-02T09:54:58-0600] [MainThread] [I] [toil-rt] 2024-04-02 09:54:58.178682: Running the command: "bash -c set -eo pipefail && gzip -dc /tmp/69baea9fc96d57f4ba92bc8ba4d22355/65ef/b252/tmpb02u1rq5/group14-way.mm39.maf.gz | mafDuplicateFilter -km - | hgLoadMafSummary -minSeqSize=1 -test mm39 bigMafSummary stdin"
[2024-04-02T09:55:12-0600] [MainThread] [I] [toil-rt] 2024-04-02 09:55:12.655398: Running the command: "bash -c set -eo pipefail && gzip -dc /tmp/69baea9fc96d57f4ba92bc8ba4d22355/f918/9ddf/tmp3yglfowq/group14-way.mm39.maf.gz | mafDuplicateFilter -km - | mafToBigMaf mm39 stdin stdout | sort -k1,1 -k2,2n"
[2024-04-02T09:55:26-0600] [Thread-1 ] [E] [toil.batchSystems.singleMachine] Got exit code 1 (indicating failure) from job _toil_worker maf2bigmaf file:/data002/scratch/lab/wga/group/10667863/jobstore kind-maf2bigmaf/instance-o4cr7qph.
[2024-04-02T09:55:26-0600] [MainThread] [W] [toil.leader] Job failed with exit value 1: 'maf2bigmaf' kind-maf2bigmaf/instance-o4cr7qph v1
Exit reason: None
[2024-04-02T09:55:26-0600] [MainThread] [W] [toil.leader] The job seems to have left a log file, indicating failure: 'maf2bigmaf' kind-maf2bigmaf/instance-o4cr7qph v2
[2024-04-02T09:55:26-0600] [MainThread] [W] [toil.leader] Log from job "kind-maf2bigmaf/instance-o4cr7qph" follows:
=========>
[2024-04-02T09:54:57-0600] [MainThread] [I] [toil.worker] ---TOIL WORKER OUTPUT LOG---
[2024-04-02T09:54:57-0600] [MainThread] [I] [toil] Running Toil version 6.0.0-0e2a07a20818e593bfdfde3cc51ca4ad809fde96 on host math-alderaan-c12.
[2024-04-02T09:54:57-0600] [MainThread] [I] [toil.worker] Working on job 'maf2bigmaf' kind-maf2bigmaf/instance-o4cr7qph v1
[2024-04-02T09:54:57-0600] [MainThread] [I] [toil.worker] Loaded body Job('maf2bigmaf' kind-maf2bigmaf/instance-o4cr7qph v1) from description 'maf2bigmaf' kind-maf2bigmaf/instance-o4cr7qph v1
[2024-04-02T09:54:57-0600] [MainThread] [I] [toil-rt] Reading MAF file from job store to /tmp/69baea9fc96d57f4ba92bc8ba4d22355/f918/9ddf/tmp3yglfowq/group14-way.mm39.maf.gz
[2024-04-02T09:55:12-0600] [MainThread] [I] [toil-rt] 2024-04-02 09:55:12.655398: Running the command: "bash -c set -eo pipefail && gzip -dc /tmp/69baea9fc96d57f4ba92bc8ba4d22355/f918/9ddf/tmp3yglfowq/group14-way.mm39.maf.gz | mafDuplicateFilter -km - | mafToBigMaf mm39 stdin stdout | sort -k1,1 -k2,2n"
[2024-04-02T09:55:25-0600] [MainThread] [W] [toil.fileStores.abstractFileStore] Failed job accessed files:
[2024-04-02T09:55:25-0600] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/no-job/file-8cec0a3771684393aeb73a9ceea2a54e/group14-way.mm39.maf.gz' to path '/tmp/69baea9fc96d57f4ba92bc8ba4d22355/f918/9ddf/tmp3yglfowq/group14-way.mm39.maf.gz'
[2024-04-02T09:55:25-0600] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-maf2bigmaf_chrom_sizes/instance-cq3omh_8/file-cb3f6e93e1ee4a4e917b9094ffb39964/mm39.chrom_sizes' to path '/tmp/69baea9fc96d57f4ba92bc8ba4d22355/f918/9ddf/tmp3yglfowq/mm39.chrom_sizes'
Traceback (most recent call last):
  File "/home/biology/lab/.miniforge3/envs/cactus_align/lib/python3.8/site-packages/toil/worker.py", line 407, in workerScript
    job._runner(jobGraph=None, jobStore=jobStore, fileStore=fileStore, defer=defer)
  File "/home/biology/lab/.miniforge3/envs/cactus_align/lib/python3.8/site-packages/toil/job.py", line 2829, in _runner
    returnValues = self._run(jobGraph=None, fileStore=fileStore)
  File "/home/biology/lab/.miniforge3/envs/cactus_align/lib/python3.8/site-packages/toil/job.py", line 2746, in _run
    return self.run(fileStore)
  File "/home/biology/lab/.miniforge3/envs/cactus_align/lib/python3.8/site-packages/toil/job.py", line 2974, in run
    rValue = userFunction(*((self,) + tuple(self._args)), **self._kwargs)
  File "/home/biology/lab/.miniforge3/envs/cactus_align/lib/python3.8/site-packages/cactus/maf/cactus_maf2bigmaf.py", line 228, in maf2bigmaf
    cactus_call(parameters=bigmaf_cmd, outfile=bigmaf_bed_path)
  File "/home/biology/lab/.miniforge3/envs/cactus_align/lib/python3.8/site-packages/cactus/shared/common.py", line 906, in cactus_call
    raise RuntimeError("{}Command {} exited {}: {}".format(sigill_msg, call, process.returncode, out))
RuntimeError: Command ['bash', '-c', 'set -eo pipefail && gzip -dc /tmp/69baea9fc96d57f4ba92bc8ba4d22355/f918/9ddf/tmp3yglfowq/group14-way.mm39.maf.gz | mafDuplicateFilter -km - | mafToBigMaf mm39 stdin stdout | sort -k1,1 -k2,2n'] exited 255: stderr=reference sequence has to be on positive strand on line 954211
[2024-04-02T09:55:26-0600] [MainThread] [E] [toil.worker] Exiting the worker because of a failed job on host math-alderaan-c12
<=========
Yes, this is another issue #1320 that I'm trying to figure out now...
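For anyone who wants to locate the offending blocks behind the "reference sequence has to be on positive strand" error, here is a minimal sketch. It assumes the standard MAF layout (an "a" line opens a block, and "s" lines read: s src start size strand srcSize text) and that the reference row is the first "s" line of each block. The function name is hypothetical. The line numbers it reports refer to the input it is given, so they may not match the number in the error message, which comes from the stream after mafDuplicateFilter.

```python
from typing import Iterable, Iterator, Tuple


def negative_strand_reference_rows(
    lines: Iterable[str], ref: str
) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, src) for each block whose first 's' row
    belongs to genome `ref` and is on the '-' strand."""
    first_row_pending = False
    for line_number, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "a":
            # A new alignment block starts; its first 's' row comes next.
            first_row_pending = True
        elif fields[0] == "s" and first_row_pending:
            first_row_pending = False
            src, strand = fields[1], fields[4]
            is_ref = src == ref or src.startswith(ref + ".")
            if is_ref and strand == "-":
                yield line_number, src


# Tiny made-up example; real input would be the lines of the MAF file.
example = [
    "##maf version=1",
    "a score=0",
    "s mm39.chr1 0 4 + 100 ACGT",
    "s rn7.chr1 0 4 - 100 ACGT",
    "",
    "a score=0",
    "s mm39.chr2 10 4 - 100 ACGT",
    "s rn7.chr2 0 4 + 100 ACGT",
]
print(list(negative_strand_reference_rows(example, "mm39")))
```

On a gzipped MAF the same function can be fed a text handle, for example gzip.open(path, "rt") from the standard library, since it only needs an iterable of lines.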
I'm running cactus-maf2bigmaf as follows:

And the job fails at the maf2bigmaf_summary step:

I don't know how to interpret the TypeError: 'NoneType'. Any help would be appreciated. I'd rather not have to resort to converting the MAF stepwise using the old UCSC Genome Browser FAQ. Thanks!