bacpop / unitig-caller

Methods to determine sequence element (unitig) presence/absence
Apache License 2.0

Unitig-caller produces different number of unitigs when compared to DBGWAS #25

Open kristinakordova opened 11 months ago

kristinakordova commented 11 months ago

I am running

unitig-caller --call --reads input_reads.txt --out output_folder --threads 76 --pyseer

and

./DBGWAS -strains input_strains.txt -keepNA -output output_folder -nb-cores 76

The two input files list the same assembled genomes, with NA as the phenotype. I was expecting to get an identical number of nodes in the graph, but I am getting a mismatch of a few million: 2,251,639 (unitig-caller) versus 7,022,727 (DBGWAS). Does unitig-caller have a filtering threshold? Where does the difference come from?

johnlees commented 11 months ago

To make the graph, DBGWAS uses GATB and unitig-caller uses Bifrost, so I would not guarantee that these graphs are identical. I don't know whether the default k-mer length of the two tools is the same. If you wanted to compare more thoroughly, I would suggest running Bifrost and BCALM on your dataset. I don't think that unitig-caller should be doing any additional filtering.
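
One way to make that comparison concrete is to check whether the two unitig sets cover the same k-mers, rather than comparing node counts directly. The following is a minimal sketch, not part of either tool. It assumes the unitigs from each tool have been exported to FASTA, that both graphs were built with the same k (the value below is a placeholder to be set to whatever was actually used), and that the file names are hypothetical. All function names are made up for illustration.

```python
from typing import Dict, Iterable, Iterator, Set

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, reverse_complement(kmer))


def read_fasta(path: str) -> Iterator[str]:
    """Yield sequences from a FASTA file (assumed input format)."""
    chunks = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if chunks:
                    yield "".join(chunks)
                    chunks = []
            elif line:
                chunks.append(line)
    if chunks:
        yield "".join(chunks)


def kmer_set(unitigs: Iterable[str], k: int) -> Set[str]:
    """Decompose unitigs into their set of canonical k-mers."""
    kmers: Set[str] = set()
    for unitig in unitigs:
        unitig = unitig.upper()
        for i in range(len(unitig) - k + 1):
            kmers.add(canonical(unitig[i:i + k]))
    return kmers


def compare_unitigs(a: Iterable[str], b: Iterable[str], k: int) -> Dict[str, int]:
    """Count canonical k-mers shared by, and unique to, two unitig collections."""
    kmers_a, kmers_b = kmer_set(a, k), kmer_set(b, k)
    return {
        "shared": len(kmers_a & kmers_b),
        "only_a": len(kmers_a - kmers_b),
        "only_b": len(kmers_b - kmers_a),
    }


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file names): python compare.py a_unitigs.fasta b_unitigs.fasta 31
    if len(sys.argv) == 4:
        k = int(sys.argv[3])  # assumption: same k for both graphs
        print(compare_unitigs(read_fasta(sys.argv[1]), read_fasta(sys.argv[2]), k))
```

If the shared count dominates and both "only" counts are near zero, the two graphs contain essentially the same k-mers and the node-count difference comes from how each tool splits them into unitigs. Large "only" counts would instead point to different k or different filtering. Holding every k-mer in a Python set is memory-hungry for millions of unitigs, so this is only practical on a subset or a small dataset.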