bacpop / PopPUNK

PopPUNK 👨‍🎤 (POPulation Partitioning Using Nucleotide Kmers)
https://www.bacpop.org/poppunk
Apache License 2.0

NOTE: NTP898 required densification? #163

Closed dineshkumarsrk closed 3 years ago

dineshkumarsrk commented 3 years ago

Versions: PopPUNK 2.0.2, poppunk_sketch 1.7.0

I tried to create a database for viral genomes with the following command:

    poppunk --create-db --threads 10 --output database20 --r-files list.txt --max-a-dist 1 --min-k 16 --sketch-size 10000

While running, it printed an unusual note:

    NOTE: NTP898 required densification

Even so, it created database20, but without the line-fit plot and other files. I never saw this while working on bacterial genomes. I searched for "required densification" but found nothing. Could you explain what this note means, and also help me get the line-fit plot?

johnlees commented 3 years ago

Densification is needed when not all of the sketch bins have observations. You'll see this when the genome length is of a similar order of magnitude to the sketch size. The algorithm is described here: https://dl.acm.org/doi/10.5555/3305890.3306007

It's not a problem, and expected with viral genomes.
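To make the idea concrete, here is a minimal Python sketch of why short genomes leave bins empty and what filling them can look like. It is an illustration only, not the code PopPUNK or its sketching library runs: the function names, the use of BLAKE2 as the hash, the random 300-base "genome", and the simple rehash-until-non-empty fill rule are all assumptions made for this example.

```python
import hashlib
import random


def hash64(text):
    """Hypothetical 64-bit hash for the demo (BLAKE2b truncated to 8 bytes)."""
    digest = hashlib.blake2b(text.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def bin_sketch(seq, k, n_bins):
    """Split the hash range into n_bins equal bins and keep the minimum
    k-mer hash seen in each bin. Bins that no k-mer lands in stay None."""
    bins = [None] * n_bins
    width = 2**64 // n_bins  # size of the hash range covered by one bin
    for kmer in {seq[i:i + k] for i in range(len(seq) - k + 1)}:
        h = hash64(kmer)
        b = min(h // width, n_bins - 1)  # clamp the last partial range
        if bins[b] is None or h < bins[b]:
            bins[b] = h
    return bins


def densify(bins):
    """Fill each empty bin by copying from a pseudo-randomly chosen
    non-empty bin. The choice depends only on the bin index and the
    attempt number, so it is reproducible across genomes."""
    if all(v is None for v in bins):
        raise ValueError("cannot densify a sketch with no observations")
    filled = list(bins)
    n = len(bins)
    for i, value in enumerate(bins):
        attempt = 0
        while value is None:
            j = hash64(f"{i}:{attempt}") % n  # candidate donor bin
            value = bins[j]                   # only originally filled bins donate
            attempt += 1
        filled[i] = value
    return filled


# Demo values (assumptions): a short random sequence and more bins than k-mers.
K, N_BINS = 16, 1000
random.seed(0)
genome = "".join(random.choice("ACGT") for _ in range(300))

sketch = bin_sketch(genome, K, N_BINS)
dense = densify(sketch)

empty_before = sum(v is None for v in sketch)
print(f"{empty_before} of {N_BINS} bins were empty before densification")
print(f"{sum(v is None for v in dense)} bins are empty after densification")
```

A 300-base sequence has at most 285 distinct 16-mers, so most of the 1000 bins cannot receive an observation. A bacterial genome with millions of k-mers fills every bin of a sketch this size, which is consistent with the note only appearing for viral genomes.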

With the plots I would recommend:

johnlees commented 3 years ago

Closing due to no updates