Open rgladstone opened 11 months ago
Hi, would you be able to send across the assembly_paths.txt file, please?
Sure thing, attached now: assembly_paths.txt
Hi Rebecca, could you check the following, please:
As Bifrost can merge k-mers across contig breaks, it is possible that these unitigs are not present in any source genome.
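A minimal sketch of that check, assuming the assemblies are uncompressed FASTA files listed one per line in assembly_paths.txt. The helper names (`reverse_complement`, `read_contigs`, `assemblies_containing`) are made up for illustration and are not part of unitig-caller or Bifrost. It looks for the unitig, or its reverse complement, inside each individual contig, so a unitig that only exists by joining two contigs is reported as absent:

```python
# Hypothetical helper script, not part of unitig-caller or Bifrost.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def read_contigs(fasta_path):
    """Yield (header, sequence) for each record in an uncompressed FASTA file."""
    header, chunks = None, []
    with open(fasta_path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks).upper()
                header, chunks = line[1:], []
            elif line:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks).upper()


def assemblies_containing(unitig, fasta_paths):
    """Return the paths whose contigs contain the unitig on either strand.

    Each contig is searched separately, so a sequence that spans a
    contig break is not counted as present.
    """
    unitig = unitig.upper()
    rc = reverse_complement(unitig)
    hits = []
    for path in fasta_paths:
        if any(unitig in seq or rc in seq for _, seq in read_contigs(path)):
            hits.append(str(path))
    return hits
```

To use it, read the paths from assembly_paths.txt into a list and pass them in along with one of the unitig sequences. An empty result would be consistent with the unitig having been formed across a contig break. Gzipped assemblies would need `gzip.open` in place of `open`.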
Yes they are in the .gfa file and not in the assemblies so that makes sense, thanks for clarifying.
Hi Rebecca, it seems the issue might be with Bifrost, where some k-mers are not annotated with colours (https://github.com/pmelsted/bifrost/issues/73). I would suggest trying an earlier version of Bifrost (e.g. v1.2.1) to see whether it gives the same error.
Sorry for the delay! I have tried a conda env with unitig-caller 1.3.0 and bifrost 1.2.1, and that gave the same results. I noticed there was a fix for #29 so I tried replacing the bifrost.py code within unitig-caller 1.3.0 (bifrost 1.3.1) with the updated code and I still get the same results.
I ran
unitig-caller --call --refs assembly_paths.txt --out ZA_unitigs_v3 --write-graph --threads 32
This was with version 1.3.0. assembly_paths.txt had 1944 assembly paths, and the output had 903111 unitigs.
Two unitigs had no lane details:
CCACCTTCCTCCGGTTTGTCACCGGCAGTCAACTTAGAGTGCCCAACTTAATGATGGCAACTAAGCTTAAGGGTTGCGCTCGTTGCGGGACTTAACC | TTGTTCATAGTTCCATTATAGCAAAAAAAGGGCTCTATAATATTTGTAGTG | 15841_5_29:1
and
TAAAGAAGTCTCCGAAATTCCGCACTGAGCATCTTCTCCGAAAAAGGCCGCTAATGTGGCCTTTTTCTTTACCTGTGGTTCTCCGCCAAAATCCCAGCAAATTGCATCACCAAAGCTAAAAGCTTTCAGGGTTGTCTAAAAAGCACAAGACATAAGAGGAAGTGCGGTATTTTATAATCAAGCCCCCAAGAATTTACCATAACATCCGTTGCCCGCACCGCCTGAGACGCGTTCAGCGCGTTCCTGACGAAACCATGACAAAAACCACAACAAACCACCCCGGAACCCGTCAGAAACGCGCCTGTTAAATTTTAACGGCATGCATGACTATGCACCAGAATGACGCCATGCTCTTTTCACGCAAAAATCATCACCAGACGGGGAAAATCACCAGTGACCAGACAGGAATCCGCCGCCCTCAATATGGCCAAATTTATCCGCGCACAGACACTTCTCCTCCTTGAGCGGCTCGAGCAGATGGATCTGGATGAGGCTGCCGGCTGCTGTGAGCACCTGCACGATCAGGCCGAAGCGCTTTACGCCATGCTGAACGCACAGATAGGCGAGGAAAATGCGTGAAAATCGGTGAACGGGTGCGCAATTCAGTGCGCGGCCGTGAGGCGATGGCGGGGTGTCGGGGCGCAGCCCTGACCAGGGTATTTGTGATGCCGGCGCGTGCGCGGTATTACAAATGCACATCCTGTCCCGGAACGGACACCGGGAAACAGCAAAAAAAACCGGGCGGCACGCCCGGAACTCAATCAAGTTAGATTAGATTACTCTCACTCGTCCATAACAGCATCATGGAACGACGACCACCGTCCGTGACGGCCGCCTCGTTTAAGTATGGACAGAAATACAGAAAATGCTCAGGACGAAATGTAATGAATGCGAACGGATTCAAGAAATTCGAGCATGACAGTCCTTACGGCCGGTTCGGTTTCAGACAAAATCTGCCGGTATGCATCCAGCATCATGGCTCCGGCATCCCCTCCGGCACGCCGTAGCCAGACCGAAACAACGGACACAAGCAGGTGTCGCTCATCATCACTAAGAGTCATCAGGGCTCCGGAAGAAAAACCAAAC | GAGCACTTTTAATTTGGTGACTTGAGTTATGAGCCAGAATATTTGTTTGACTTGAACTT | 15841_2_75:1
I just wanted to understand why this might be, in case lane IDs are missing from some of the other unitigs seen in more than one isolate.
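For anyone wanting to scan the whole output for the same problem, here is a rough sketch. It assumes the call-mode output has one unitig per line in the form `UNITIG | sample1:1 sample2:1`, with the sample list left empty when no colours were assigned. The function name `unitigs_without_samples` is made up for illustration and is not part of unitig-caller:

```python
# Hypothetical helper, not part of unitig-caller.
def unitigs_without_samples(lines):
    """Return unitigs whose line has nothing after the '|' separator.

    Assumes one unitig per line, formatted as
    'UNITIG | sample1:1 sample2:1'.
    """
    missing = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        unitig, _, samples = line.partition("|")
        if not samples.split():
            missing.append(unitig.strip())
    return missing
```

Passing an open file handle for the output written under the `ZA_unitigs_v3` prefix (the exact file name is an assumption here) returns every unitig with an empty sample field, which can then be compared against the two unitigs listed above.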