bacpop / unitig-caller

Methods to determine sequence element (unitig) presence/absence
Apache License 2.0

Unitigs missing details on the lane they are found in #27

Open rgladstone opened 11 months ago

rgladstone commented 11 months ago

I ran

unitig-caller --call --refs assembly_paths.txt --out ZA_unitigs_v3 --write-graph --threads 32

This was with version 1.3.0. assembly_paths.txt contained 1944 assembly paths, and the output contained 903111 unitigs.

Two unitigs had no lane details:

CCACCTTCCTCCGGTTTGTCACCGGCAGTCAACTTAGAGTGCCCAACTTAATGATGGCAACTAAGCTTAAGGGTTGCGCTCGTTGCGGGACTTAACC | TTGTTCATAGTTCCATTATAGCAAAAAAAGGGCTCTATAATATTTGTAGTG | 15841_5_29:1

and

TAAAGAAGTCTCCGAAATTCCGCACTGAGCATCTTCTCCGAAAAAGGCCGCTAATGTGGCCTTTTTCTTTACCTGTGGTTCTCCGCCAAAATCCCAGCAAATTGCATCACCAAAGCTAAAAGCTTTCAGGGTTGTCTAAAAAGCACAAGACATAAGAGGAAGTGCGGTATTTTATAATCAAGCCCCCAAGAATTTACCATAACATCCGTTGCCCGCACCGCCTGAGACGCGTTCAGCGCGTTCCTGACGAAACCATGACAAAAACCACAACAAACCACCCCGGAACCCGTCAGAAACGCGCCTGTTAAATTTTAACGGCATGCATGACTATGCACCAGAATGACGCCATGCTCTTTTCACGCAAAAATCATCACCAGACGGGGAAAATCACCAGTGACCAGACAGGAATCCGCCGCCCTCAATATGGCCAAATTTATCCGCGCACAGACACTTCTCCTCCTTGAGCGGCTCGAGCAGATGGATCTGGATGAGGCTGCCGGCTGCTGTGAGCACCTGCACGATCAGGCCGAAGCGCTTTACGCCATGCTGAACGCACAGATAGGCGAGGAAAATGCGTGAAAATCGGTGAACGGGTGCGCAATTCAGTGCGCGGCCGTGAGGCGATGGCGGGGTGTCGGGGCGCAGCCCTGACCAGGGTATTTGTGATGCCGGCGCGTGCGCGGTATTACAAATGCACATCCTGTCCCGGAACGGACACCGGGAAACAGCAAAAAAAACCGGGCGGCACGCCCGGAACTCAATCAAGTTAGATTAGATTACTCTCACTCGTCCATAACAGCATCATGGAACGACGACCACCGTCCGTGACGGCCGCCTCGTTTAAGTATGGACAGAAATACAGAAAATGCTCAGGACGAAATGTAATGAATGCGAACGGATTCAAGAAATTCGAGCATGACAGTCCTTACGGCCGGTTCGGTTTCAGACAAAATCTGCCGGTATGCATCCAGCATCATGGCTCCGGCATCCCCTCCGGCACGCCGTAGCCAGACCGAAACAACGGACACAAGCAGGTGTCGCTCATCATCACTAAGAGTCATCAGGGCTCCGGAAGAAAAACCAAAC | GAGCACTTTTAATTTGGTGACTTGAGTTATGAGCCAGAATATTTGTTTGACTTGAACTT | 15841_2_75:1

I just wanted to understand why this might be, in case lane IDs are missing from some of the other unitigs seen in more than one isolate.
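To see how many unitigs are affected, the output can be scanned for entries with no lane IDs. The following is a minimal sketch, assuming each record is one line of the form `unitig | lane:1 lane:1 ...`; the function name and the command-line usage are hypothetical and not part of unitig-caller.

```python
import sys


def unitigs_without_lanes(lines):
    """Return the unitigs whose record has nothing after the '|' separator.

    Assumes one record per line in the form 'UNITIG | lane_a:1 lane_b:1'.
    """
    missing = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        unitig, _, lanes = line.partition("|")
        if not lanes.strip():
            missing.append(unitig.strip())
    return missing


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python find_missing.py <unitig output file>
    with open(sys.argv[1]) as handle:
        for unitig in unitigs_without_lanes(handle):
            print(unitig)
```

If the two unitigs above are the only ones printed, the problem is limited to them rather than to lanes being dropped from unitigs shared between isolates.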

samhorsfield96 commented 11 months ago

Hi, would you be able to send across the assembly_paths.txt file, please?

rgladstone commented 11 months ago

Sure thing, attached now: assembly_paths.txt

samhorsfield96 commented 11 months ago

Hi Rebecca, if you could, would you be able to check the following, please:

As Bifrost can merge k-mers across contig breaks, it is possible that these unitigs are not present in any source genome.
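As a toy illustration of how this can happen, the sketch below builds unitigs from the k-mers of two short contigs. It is a deliberately simplified, forward-strand-only de Bruijn graph with made-up sequences; it is not Bifrost's algorithm and does not handle reverse complements or isolated cycles.

```python
def kmers(seq, k):
    """All k-mers of a sequence, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def toy_unitigs(contigs, k):
    """Compact non-branching paths of a forward-strand-only k-mer graph."""
    nodes = set()
    for contig in contigs:
        nodes.update(kmers(contig, k))
    # An edge u -> v exists when the (k-1)-suffix of u equals the (k-1)-prefix of v.
    succ = {n: [m for m in nodes if n[1:] == m[:-1]] for n in nodes}
    pred = {n: [m for m in nodes if m[1:] == n[:-1]] for n in nodes}

    def starts_path(n):
        # A node starts a unitig unless it has exactly one predecessor
        # and that predecessor has exactly one successor.
        return len(pred[n]) != 1 or len(succ[pred[n][0]]) != 1

    result = []
    for n in sorted(nodes):
        if not starts_path(n):
            continue
        path, cur = n, n
        # Extend while the path does not branch in either direction.
        while len(succ[cur]) == 1 and len(pred[succ[cur][0]]) == 1:
            nxt = succ[cur][0]
            if nxt == n:
                break  # guard against looping back to the start
            cur = nxt
            path += cur[-1]
        result.append(path)
    return result


# Two made-up contigs whose end and start share the (k-1)-mer "TC".
contigs = ["AAGTC", "TCCGA"]
merged = toy_unitigs(contigs, k=3)
print(merged)
print([any(u in c for c in contigs) for u in merged])
```

Here the last k-mer of the first contig (`GTC`) overlaps the first k-mer of the second (`TCC`), so the graph joins them into one unitig that is not a substring of either input contig.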

rgladstone commented 10 months ago

Yes, they are in the .gfa file but not in the assemblies, so that makes sense. Thanks for clarifying.
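For anyone wanting to repeat this check, a rough sketch follows. It assumes uncompressed FASTA assemblies listed one path per line in a file such as assembly_paths.txt, and it searches each contig separately on both strands; the function names are hypothetical and this is not how unitig-caller itself queries the graph.

```python
import sys

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def read_fasta_contigs(path):
    """Return the contig sequences of an uncompressed FASTA file."""
    contigs, current = [], []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if current:
                    contigs.append("".join(current))
                current = []
            elif line:
                current.append(line.upper())
    if current:
        contigs.append("".join(current))
    return contigs


def found_in_contigs(unitig, contigs):
    """True if the unitig or its reverse complement lies within a single contig."""
    unitig = unitig.upper()
    rc = reverse_complement(unitig)
    return any(unitig in c or rc in c for c in contigs)


if __name__ == "__main__" and len(sys.argv) > 2:
    # Hypothetical usage: python check_unitig.py <paths file> <unitig sequence>
    with open(sys.argv[1]) as paths:
        for path in (p.strip() for p in paths if p.strip()):
            if found_in_contigs(sys.argv[2], read_fasta_contigs(path)):
                print(path)
```

Because each contig is searched on its own, a unitig that only exists by joining k-mers from two different contigs will not be reported for any assembly.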

samhorsfield96 commented 10 months ago

Hi Rebecca, it seems the issue might be with Bifrost, where some k-mers are not annotated with colours (https://github.com/pmelsted/bifrost/issues/73). I would suggest trying an earlier version of Bifrost (e.g. v1.2.1) to see whether it gives the same error.

rgladstone commented 10 months ago

Sorry for the delay! I have tried a conda env with unitig-caller 1.3.0 and bifrost 1.2.1, and that gave the same results. I noticed there was a fix for #29, so I tried replacing the bifrost.py code within unitig-caller 1.3.0 (bifrost 1.3.1) with the updated code, and I still get the same results.