bacpop / unitig-counter

Uses cDBG to count unitigs in bacterial populations
GNU Affero General Public License v3.0

Using split_unitigs.py to clean output #16

Closed. iaposto closed this issue 2 years ago

iaposto commented 2 years ago

I am having trouble understanding how to use split_unitigs.py to clean the output of unitig-counter. I first merged the assemblies into one file with cat genomes/* > merged.fa and then ran python3 curate.py merged.fa unitigs.txt 31, but the script stops with RuntimeError: generator raised StopIteration. I also tried python3 curate.py genomes/* unitigs.txt 31, but I guess that is not the correct input for the script's references.fa argument. I am using unitig-counter because I had memory issues with unitig-caller, as reported here. The genomes folder contains ~2000 bacterial assemblies.

Any help would be greatly appreciated! Thanks!

EDIT: I found a solution for unitig-caller, see here.

johnlees commented 2 years ago

Sorry, I don't have an answer for this: I didn't write the curate.py script, and we now use unitig-caller. I suspect that skipping this step doesn't make a huge difference. Alternatively, you could probably add an N (or another non-DNA character) to the end of each contig before running, which would avoid the need to run this script.

However, I'm glad you found a solution with the newer package!
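
For anyone who wants to try the suggestion above of adding an N to the end of each contig, here is a minimal Python sketch. It is not part of unitig-counter, and the names pad_contigs and pad_fasta_file are hypothetical. It assumes plain FASTA input such as the merged.fa file mentioned earlier, and it has not been confirmed in this thread that padding fully replaces the curation script.

```python
def pad_contigs(lines, pad="N"):
    """Yield FASTA lines with `pad` added as a final sequence line of each record.

    Hypothetical helper: a record is a '>' header followed by sequence lines.
    Blank lines are dropped, and records without any sequence are left unpadded.
    """
    has_sequence = False
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if has_sequence:
                yield pad  # close the previous contig with the pad character
            has_sequence = False
            yield line
        else:
            has_sequence = True
            yield line
    if has_sequence:
        yield pad  # close the last contig in the file


def pad_fasta_file(in_path, out_path, pad="N"):
    """Read a FASTA file and write a padded copy (both paths chosen by the caller)."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in pad_contigs(src, pad):
            dst.write(line + "\n")
```

For example, pad_fasta_file("merged.fa", "merged_padded.fa") would write a padded copy of the merged assemblies; the output file name is only an illustration. The pad is written on its own line rather than appended to the last sequence line, so existing line wrapping is left untouched.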