galaxyproject / tpv-shared-database

A shared database of rules for Total Perspective Vortex used by the usegalaxy.* federation.
MIT License

jellyfish kmer counter memory estimation #53

Open bgruening opened 7 months ago

bgruening commented 7 months ago

It would be nice if we could port Jellyfish's memory estimation to Python.

https://github.com/gmarcais/Jellyfish/blob/43b1ab27abdb8c9399c386cc998bb9fd33648412/include/jellyfish/large_hash_array.hpp#L97

jellyfish mem already gives a nice estimate.

jellyfish mem --mer-len 27 --size 100M  '/data/dnb09/galaxy_db/files/c/0/2/dataset_c02f0b7f-79d1-498d-ba17-c39374511657.dat'
149933428608 (139G)
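
Until the estimation is ported, one stopgap is to call the command above from Python and read the byte count off its output. This is only a sketch: the function names and the `inputs` parameter are made up for illustration, it assumes the output always starts with the byte count as in the example, and it uses only the `--mer-len` and `--size` flags shown above.

```python
import re
import subprocess


def parse_jellyfish_mem_output(text: str) -> int:
    """Return the leading byte count from output such as '149933428608 (139G)'."""
    match = re.match(r"\s*(\d+)\b", text)
    if match is None:
        raise ValueError(f"unexpected jellyfish mem output: {text!r}")
    return int(match.group(1))


def jellyfish_mem_bytes(mer_len, size, inputs=(), jellyfish="jellyfish"):
    """Run `jellyfish mem` and return its estimate in bytes.

    `inputs` (hypothetical parameter) holds optional dataset paths that are
    appended to the command line, mirroring the invocation in this issue.
    """
    cmd = [jellyfish, "mem", "--mer-len", str(mer_len), "--size", str(size)]
    cmd.extend(str(path) for path in inputs)
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return parse_jellyfish_mem_output(result.stdout)
```

For the output above, `parse_jellyfish_mem_output("149933428608 (139G)")` gives 149933428608. Dividing that by 1024**3 gives about 139.6, so the figure in parentheses looks like binary gigabytes, rounded down.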

The suffix after --size is a multiplier: M for million, G for billion, k for thousand.
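
A Python port would need to understand the same suffixes. A minimal sketch, assuming only the three suffixes mentioned here, that they are decimal multipliers, and that they are case-sensitive as written (`parse_size` is a made-up name, not something from Jellyfish):

```python
# Decimal multipliers for the suffixes described in this issue.
_SUFFIXES = {"k": 10**3, "M": 10**6, "G": 10**9}


def parse_size(value: str) -> int:
    """Convert a --size style string such as '100M' to an integer."""
    value = value.strip()
    if not value:
        raise ValueError("empty size")
    multiplier = _SUFFIXES.get(value[-1])
    if multiplier is None:
        # No recognised suffix: treat the whole string as a plain integer.
        return int(value)
    return int(value[:-1]) * multiplier
```

With this, `parse_size("100M")` returns 100000000, the number of hash entries requested in the example command.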

mr-c commented 7 months ago

k-h-mer has a Python-accessible implementation of the same algorithm.

bgruening commented 7 months ago

Thanks @mr-c

Are you talking about the mem part? It would be nice to have a simple Python function that gives us an estimate when we provide the k-mer length and the size. Any pointers to where we can find that in the source?