bacpop / unitig-caller

Methods to determine sequence element (unitig) presence/absence
Apache License 2.0

Unitigs missing details on the lane they are found in #27

Open rgladstone opened 9 months ago

rgladstone commented 9 months ago

I ran

unitig-caller --call --refs assembly_paths.txt --out ZA_unitigs_v3 --write-graph --threads 32

This was with unitig-caller version 1.3.0. The file assembly_paths.txt listed 1944 assembly paths, and the output contained 903111 unitigs.

Two unitigs had no lane details:

CCACCTTCCTCCGGTTTGTCACCGGCAGTCAACTTAGAGTGCCCAACTTAATGATGGCAACTAAGCTTAAGGGTTGCGCTCGTTGCGGGACTTAACC | TTGTTCATAGTTCCATTATAGCAAAAAAAGGGCTCTATAATATTTGTAGTG | 15841_5_29:1

and

TAAAGAAGTCTCCGAAATTCCGCACTGAGCATCTTCTCCGAAAAAGGCCGCTAATGTGGCCTTTTTCTTTACCTGTGGTTCTCCGCCAAAATCCCAGCAAATTGCATCACCAAAGCTAAAAGCTTTCAGGGTTGTCTAAAAAGCACAAGACATAAGAGGAAGTGCGGTATTTTATAATCAAGCCCCCAAGAATTTACCATAACATCCGTTGCCCGCACCGCCTGAGACGCGTTCAGCGCGTTCCTGACGAAACCATGACAAAAACCACAACAAACCACCCCGGAACCCGTCAGAAACGCGCCTGTTAAATTTTAACGGCATGCATGACTATGCACCAGAATGACGCCATGCTCTTTTCACGCAAAAATCATCACCAGACGGGGAAAATCACCAGTGACCAGACAGGAATCCGCCGCCCTCAATATGGCCAAATTTATCCGCGCACAGACACTTCTCCTCCTTGAGCGGCTCGAGCAGATGGATCTGGATGAGGCTGCCGGCTGCTGTGAGCACCTGCACGATCAGGCCGAAGCGCTTTACGCCATGCTGAACGCACAGATAGGCGAGGAAAATGCGTGAAAATCGGTGAACGGGTGCGCAATTCAGTGCGCGGCCGTGAGGCGATGGCGGGGTGTCGGGGCGCAGCCCTGACCAGGGTATTTGTGATGCCGGCGCGTGCGCGGTATTACAAATGCACATCCTGTCCCGGAACGGACACCGGGAAACAGCAAAAAAAACCGGGCGGCACGCCCGGAACTCAATCAAGTTAGATTAGATTACTCTCACTCGTCCATAACAGCATCATGGAACGACGACCACCGTCCGTGACGGCCGCCTCGTTTAAGTATGGACAGAAATACAGAAAATGCTCAGGACGAAATGTAATGAATGCGAACGGATTCAAGAAATTCGAGCATGACAGTCCTTACGGCCGGTTCGGTTTCAGACAAAATCTGCCGGTATGCATCCAGCATCATGGCTCCGGCATCCCCTCCGGCACGCCGTAGCCAGACCGAAACAACGGACACAAGCAGGTGTCGCTCATCATCACTAAGAGTCATCAGGGCTCCGGAAGAAAAACCAAAC | GAGCACTTTTAATTTGGTGACTTGAGTTATGAGCCAGAATATTTGTTTGACTTGAACTT | 15841_2_75:1

I just wanted to understand why this might be, in case lane IDs are missing from some of the other unitigs seen in more than one isolate.
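One way to check whether other unitigs are also missing lane IDs is to scan the whole output file for entries whose sample field is empty. The following is a minimal sketch, not part of unitig-caller. It assumes each output line has the layout `<unitig> | <sample>:1 <sample>:1 ...`, and the function name `unitigs_without_samples` is hypothetical.

```python
def unitigs_without_samples(lines):
    """Yield unitig sequences whose sample (lane) field is empty.

    Assumed line layout (an assumption, not confirmed by the tool's docs):
        <unitig> | <sample>:1 <sample>:1 ...
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the first "|" only: sequence on the left, samples on the right.
        unitig, _, samples = line.partition("|")
        if not samples.strip():
            yield unitig.strip()


# Small in-memory example; real use would pass an open file handle instead.
example = [
    "ACGT | lane_a:1 lane_b:1",
    "GGCC |",
    "TTAA | lane_c:1",
]
missing = list(unitigs_without_samples(example))
print(missing)
```

To run it on real output, open the file written by the command above and pass the handle to `unitigs_without_samples`. The exact output filename depends on the `--out` prefix.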

samhorsfield96 commented 9 months ago

Hi, would you be able to send across the assembly_paths.txt file, please?

rgladstone commented 9 months ago

Sure thing, attached now: assembly_paths.txt

samhorsfield96 commented 9 months ago

Hi Rebecca, would you be able to check the following, please:

As Bifrost can merge k-mers across contig breaks, it is possible that these unitigs are not present in any source genome.

rgladstone commented 9 months ago

Yes, they are in the .gfa file but not in the assemblies, so that makes sense. Thanks for clarifying.
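For anyone wanting to repeat this check, the sketch below searches FASTA contigs for a unitig on either strand. It is an illustration only: the helper names (`reverse_complement`, `read_fasta`, `contigs_containing`) are hypothetical, it uses plain substring matching, and it assumes uncompressed FASTA input. A unitig that spans a contig break will not be found in any single contig, which is the situation described above.

```python
import io


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def contigs_containing(unitig, handle):
    """Return headers of contigs that contain the unitig on either strand."""
    fwd = unitig.upper()
    rev = reverse_complement(fwd)
    return [
        header
        for header, seq in read_fasta(handle)
        if fwd in seq.upper() or rev in seq.upper()
    ]


# In-memory example; real use would open each path listed in assembly_paths.txt.
fasta_text = ">contig1\nAAACG\nTTT\n>contig2\nGGGG\n"
hits_forward = contigs_containing("ACGT", io.StringIO(fasta_text))
hits_reverse = contigs_containing("CCCC", io.StringIO(fasta_text))
hits_none = contigs_containing("TTGCA", io.StringIO(fasta_text))
print(hits_forward, hits_reverse, hits_none)
```

An empty result across every assembly, combined with the sequence being present in the .gfa file, is consistent with the unitig having been assembled across a contig break.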

samhorsfield96 commented 9 months ago

Hi Rebecca, it seems the issue might be with Bifrost, where some k-mers are not annotated with colours (https://github.com/pmelsted/bifrost/issues/73). I would suggest trying an earlier version of Bifrost (e.g. v1.2.1) to see whether this gives the same error.

rgladstone commented 8 months ago

Sorry for the delay! I have tried a conda env with unitig-caller 1.3.0 and bifrost 1.2.1, and that gave the same results. I noticed there was a fix for #29, so I tried replacing the bifrost.py code within unitig-caller 1.3.0 (bifrost 1.3.1) with the updated code, and I still get the same results.