iaposto closed this issue 2 years ago
Sorry, I don't have an answer for this, as I didn't write the curate.py script, and now we use unitig-caller. I'd suspect this doesn't make a huge difference, or you could probably do something like add an N (or other non-DNA character) to the end of each contig before running to prevent needing to run this script.
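The suggestion of adding an N to the end of each contig could be sketched roughly as below. This is only an illustration, not part of curate.py or unitig-caller: the function name `append_n_to_contigs` and the choice to write the N as its own sequence line are assumptions made for the example. The script reads one or more FASTA files and prints a single merged FASTA to standard output.

```python
import sys


def append_n_to_contigs(lines, pad="N"):
    """Yield FASTA lines with `pad` added after the sequence of every record.

    Hypothetical helper: the pad is emitted as an extra sequence line, which
    is equivalent to appending it to the contig because FASTA sequences may
    span several lines.
    """
    have_record = False
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith(">"):
            # A new header closes the previous record, so pad it first.
            if have_record:
                yield pad
            have_record = True
            yield line
        elif line:
            # Sequence line: pass through unchanged (blank lines are dropped).
            yield line
    # Pad the final record, which has no following header to trigger it.
    if have_record:
        yield pad


if __name__ == "__main__":
    # Usage (file names are placeholders): python3 pad_contigs.py genomes/* > merged.fa
    for path in sys.argv[1:]:
        with open(path) as handle:
            for out in append_n_to_contigs(handle):
                print(out)
```

Because every line is written back with its own newline, this also avoids a header being glued to the previous sequence when an input file lacks a trailing newline, which a plain cat would not guard against.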
However, I'm glad you found a solution with the newer package!
I am having some trouble understanding how to use split_unitigs.py to clean the output of unitig-counter. I tried merging the assemblies into one file with
cat genomes/* > merged.fa
and then running
python3 curate.py merged.fa unitigs.txt 31
but the script stops with
RuntimeError: generator raised StopIteration
I also tried
python3 curate.py genomes/* unitigs.txt 31
but I guess that's not the correct input for the references.fa argument of the script. I am using unitig-counter because I had memory issues with unitig-caller, as reported here. The genomes folder contains ~2000 bacterial assemblies.
Any help would be greatly appreciated! Thanks!
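For context on the error message itself: since Python 3.7 (PEP 479), a StopIteration that escapes inside a generator body is converted into "RuntimeError: generator raised StopIteration". Whether that is what happens inside curate.py is an assumption, since the script's code is not shown in this thread. The minimal sketch below reproduces the message with two hypothetical generators, `first_items_legacy` and `first_items_safe`, which are not taken from any of the packages mentioned here.

```python
def first_items_legacy(iterables):
    """Yield the first item of each iterable (pre-3.7 style, hypothetical).

    A bare next() on an empty iterator raises StopIteration inside the
    generator, which Python 3.7+ turns into a RuntimeError.
    """
    for it in iterables:
        yield next(iter(it))


def first_items_safe(iterables):
    """Same idea, but stop cleanly when an empty iterable is reached."""
    sentinel = object()
    for it in iterables:
        item = next(iter(it), sentinel)  # default value avoids StopIteration
        if item is sentinel:
            return  # end the generator explicitly
        yield item


if __name__ == "__main__":
    try:
        list(first_items_legacy([[1], []]))
    except RuntimeError as err:
        print("legacy version failed:", err)
    print("safe version:", list(first_items_safe([[1], []])))
```

If a script written for an older Python fails this way, typical remedies are running it under the Python version it was written for, or replacing the bare next() call with one that supplies a default or catches StopIteration.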
EDIT: I found a solution for unitig-caller, see here.