alxsimon opened this issue 3 years ago
It looks like `hal2fasta` is converting the scaffold names to UCSC-style names (`<genome>.<sequence>`). This was the default behavior of `hal2fasta` at one point; however, the default appears to have been changed 10 months ago so that sequence names are kept unchanged. The version in that commit works as expected when I run it, so I'm not sure why the error is still happening. Could you try deleting the working directory and starting CAT from scratch, so that it recreates all the files under the `genome_files` directory?
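One quick way to tell whether a FASTA under `genome_files` still carries UCSC-style names is to scan its headers for the `<genome>.` prefix. The sketch below is a hypothetical diagnostic, not part of CAT or HAL; the helper names and the second example header (`scaffold_2`) are made up for illustration, and it assumes the sequence name is the header text up to the first whitespace.

```python
import io
from typing import Dict, Iterable, List, TextIO


def read_fasta_headers(handle: TextIO) -> List[str]:
    """Return sequence names: the header text after '>' up to the first whitespace."""
    names = []
    for line in handle:
        if line.startswith(">"):
            names.append(line[1:].split()[0])
    return names


def find_prefixed_names(names: Iterable[str], genome: str) -> Dict[str, str]:
    """Map each UCSC-style '<genome>.<sequence>' name to its plain sequence name."""
    prefix = genome + "."
    return {n: n[len(prefix):] for n in names if n.startswith(prefix)}


if __name__ == "__main__":
    # Hypothetical FASTA: one renamed header, one left unchanged.
    fasta = io.StringIO(">GCA017311375.CM029608.1\nACGT\n>scaffold_2\nGGCC\n")
    names = read_fasta_headers(fasta)
    print(find_prefixed_names(names, "GCA017311375"))
    # prints {'GCA017311375.CM029608.1': 'CM029608.1'}
```

A non-empty result means the file was written with prefixed names, so lookups by the original key (e.g. `CM029608.1`) would fail against it.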
I see. Removing the working directory alone did not help, but rebuilding the Docker image did. I think the issue was caused by some cached parts of the Docker image that had not been updated.
Unfortunately, I am now encountering another error, which I don't think is related. It may be an issue with my GFF:
```
ERROR: 2021-05-28 12:51:40,887 - [pid 476296] Worker Worker(salt=100050644, workers=10, host=BEM-S2, username=alexis, pid=474325) failed Task: TranscriptFasta for GCA017311375
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/luigi/worker.py", line 191, in run
    new_deps = self._run_get_new_deps()
  File "/usr/local/lib/python3.8/dist-packages/luigi/worker.py", line 133, in _run_get_new_deps
    task_gen = self.task.run()
  File "/usr/local/lib/python3.8/dist-packages/cat-2.0-py3.8.egg/cat/__init__.py", line 989, in run
    seqs = {tx.name: tx.get_mrna(seq_dict) for tx in tools.transcripts.transcript_iterator(self.transcript_bed)}
  File "/usr/local/lib/python3.8/dist-packages/cat-2.0-py3.8.egg/cat/__init__.py", line 989, in <dictcomp>
    seqs = {tx.name: tx.get_mrna(seq_dict) for tx in tools.transcripts.transcript_iterator(self.transcript_bed)}
  File "/usr/local/lib/python3.8/dist-packages/cat-2.0-py3.8.egg/tools/transcripts.py", line 257, in get_mrna
    assert self.stop <= len(sequence) + 1
AssertionError
```
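The failing assertion in `get_mrna` appears to check that a transcript's end coordinate does not run past the end of the sequence it is extracted from. Assuming that reading, a pre-flight check that compares the transcript BED derived from the GFF against the genome FASTA could point at the offending records. This is a hypothetical sketch, not CAT code: it assumes a BED with at least four tab-separated columns (chrom, chromStart, chromEnd, name), and it is slightly stricter than CAT's assertion because it does not model the `+ 1` tolerance. The scaffold `scaffold_x` and transcript names are made up.

```python
import io
from typing import Dict, List, Optional, TextIO, Tuple


def fasta_lengths(handle: TextIO) -> Dict[str, int]:
    """Return {sequence name: sequence length} for a FASTA stream."""
    lengths: Dict[str, int] = {}
    name: Optional[str] = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def out_of_bounds(bed: TextIO, lengths: Dict[str, int]) -> List[Tuple[str, str, int, int]]:
    """List (name, chrom, end, chrom_length) for BED records that do not fit.

    A record is reported when its chrom is absent from the FASTA (length -1)
    or when its chromEnd (column 3) exceeds the sequence length.
    """
    problems = []
    for line in bed:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, end, name = fields[0], int(fields[2]), fields[3]
        size = lengths.get(chrom, -1)  # -1 marks a name missing from the FASTA
        if end > size:
            problems.append((name, chrom, end, size))
    return problems


if __name__ == "__main__":
    fasta = io.StringIO(">CM029608.1\nACGTACGTAC\n")
    bed = io.StringIO(
        "CM029608.1\t0\t8\ttx1\n"
        "CM029608.1\t5\t12\ttx2\n"
        "scaffold_x\t0\t4\ttx3\n"
    )
    print(out_of_bounds(bed, fasta_lengths(fasta)))
    # prints [('tx2', 'CM029608.1', 12, 10), ('tx3', 'scaffold_x', 4, -1)]
```

Records flagged with length -1 point to a naming mismatch between the annotation and the FASTA; records with a positive length point to coordinates that run off the end of a scaffold.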
I get this error message when running the pipeline:
At first I thought it was similar to issue #197, so I updated the HAL version in `Dockerfile.complete`. I am now using the latest commit `db3d42d849a508cad23168117127b18158a228cd`, and it still gives the same error. In the working directory, I see that the scaffold in question (like all the others) has been renamed to `>GCA017311375.CM029608.1` in the `genome_files/GCA017311375.fa` file, while the error is looking for the original key `CM029608.1`.
I am attaching the new Dockerfile I am using, with updated versions of multiple tools: cat_dockerfile_modif.txt