bacpop / PopPUNK

PopPUNK 👨‍🎤 (POPulation Partitioning Using Nucleotide Kmers)
Apache License 2.0

Generating jaccard distances per kmer #284

Closed JLC2141 closed 9 months ago

JLC2141 commented 9 months ago

PopPUNK version: 2.6.0

I am attempting to re-create the poppunk_sketch jaccard distance table as shown in this previous issue:

However, I am unable to use poppunk_sketch in my current version of poppunk. My current workaround is as follows:

sketchlib sketch -l files.txt -o database -s 1000 -k 15,30,3 --cpus 40
sketchlib query jaccard database -o dists --cpus 40 --distances dists --output

The values in the "Core" and "Accessory" columns of the output appear to be the Jaccard distances for the first two k-mer lengths of the k-mer sequence (kseq) specified in the "sketchlib sketch" command.

Is there a simpler approach to output a table of jaccard distances per kmer?
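
To make concrete what a "per k-mer" table means here, the following is a minimal Python sketch that computes an exact Jaccard index between two sequences for each k-mer length in a "min,max,step" string such as 15,30,3. This is not how sketchlib works internally (it estimates from sketches rather than full k-mer sets). The function names are hypothetical, the sequences are toy data, reverse complements are ignored, and treating the maximum as inclusive is an assumption. The values are similarities (1.0 means identical k-mer sets).

```python
import random


def kmer_lengths(kseq):
    """Expand 'min,max,step' (e.g. '15,30,3') into a list of k-mer lengths.

    Assumption: the maximum is treated as inclusive.
    """
    k_min, k_max, k_step = (int(x) for x in kseq.split(","))
    return list(range(k_min, k_max + 1, k_step))


def kmers(seq, k):
    """Return the set of all k-mers in seq (no reverse complements)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Exact Jaccard index |a & b| / |a | b|; 0.0 if both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def jaccard_per_k(seq_a, seq_b, kseq):
    """Map each k-mer length in kseq to the Jaccard index at that length."""
    return {k: jaccard(kmers(seq_a, k), kmers(seq_b, k))
            for k in kmer_lengths(kseq)}


if __name__ == "__main__":
    # Toy data: a random sequence and a copy with a few substitutions.
    rng = random.Random(0)
    seq_a = "".join(rng.choice("ACGT") for _ in range(2000))
    seq_b = list(seq_a)
    for pos in rng.sample(range(len(seq_b)), 20):
        seq_b[pos] = rng.choice("ACGT".replace(seq_b[pos], ""))
    seq_b = "".join(seq_b)

    # One row per k-mer length, tab-separated.
    print("k\tjaccard")
    for k, j in jaccard_per_k(seq_a, seq_b, "15,30,3").items():
        print(f"{k}\t{j:.4f}")
```

With real data the same layout (one value per k-mer length for each sample pair) is what I am hoping to get out of sketchlib directly.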

JLC2141 commented 9 months ago

Here's some additional information: pp-sketchlib v2.1.1

Installations:

PopPUNK install:
conda create --name poppunk
conda activate poppunk
python3 -m pip install poppunk

pp-sketchlib install:
sudo apt install cmake gfortran libarmadillo-dev libeigen3-dev libopenblas-dev
pip3 install pp-sketchlib

johnlees commented 9 months ago

Have you tried just omitting the output option (-o) from the query step:

sketchlib sketch -l files.txt -o database -s 1000 -k 15,30,3 --cpus 40
sketchlib query jaccard database --cpus 40 >

JLC2141 commented 9 months ago

Thank you