MatthewRalston / kmerdb

Python bioinformatics CLI for k-mer counts and de Bruijn graphs
https://matthewralston.github.io/kmerdb
Apache License 2.0

Kmer.shred always returns sparse kmer_id/count array #134

Open MatthewRalston opened 3 months ago

MatthewRalston commented 3 months ago

Needs to be documented. Especially because the edge list is now totally sparse.

MatthewRalston commented 3 months ago

But it doesn't?

MatthewRalston commented 3 months ago

I had it backwards. It totally needs documentation. True.

It being the following.

The k-mer utility does not produce a sparse k-mer count array. Specifically, the nullomers (k-mers with a count of zero) are present in the k-mer array, so its length is 4^k. Whether the output is or could be sparse, and what state it is actually in, needs documentation. Linking #137 to document the k-mer array shape explicitly in the kmerdb profile README section.
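
To make the distinction concrete for the docs, here is a rough sketch of a dense 4^k count array versus a sparse kmer_id/count pair. This is not the kmerdb implementation: `kmer_to_id`, `dense_counts`, and `to_sparse` are hypothetical helpers, and the 2-bit A/C/G/T encoding is an assumption made only for illustration.

```python
import numpy as np

# Assumed 2-bit encoding, for illustration only.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def kmer_to_id(kmer):
    """Map a k-mer string to an integer in [0, 4^k). Hypothetical helper."""
    idx = 0
    for base in kmer:
        idx = idx * 4 + CODE[base]
    return idx


def dense_counts(seq, k):
    """Dense array of length 4^k; nullomers stay in the array as zeros."""
    counts = np.zeros(4 ** k, dtype=np.uint64)
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        # Skip windows containing characters outside A/C/G/T.
        if all(base in CODE for base in kmer):
            counts[kmer_to_id(kmer)] += 1
    return counts


def to_sparse(counts):
    """Sparse view: only the ids with nonzero counts, plus those counts."""
    ids = np.flatnonzero(counts)
    return ids, counts[ids]


counts = dense_counts("ACGTACGT", 2)
ids, nonzero = to_sparse(counts)

print(len(counts))       # 16, i.e. 4^2, nullomers included
print(ids.tolist())      # [1, 6, 11, 12]
print(nonzero.tolist())  # [2, 2, 2, 1]
```

In the dense form the array index is the k-mer id, so the shape is always (4^k,) regardless of the input. In the sparse form the ids have to be carried alongside the counts and the length depends on the data.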