Closed jltsiren closed 1 year ago
Hi @jltsiren,
Yes, the KFF API can be slow. I will look into the performance issue you point out here. My goal is to improve the API's speed as I or others use it in projects that require performance, so that I can run speed tests on real use cases.
For CMake, I will have a look at the different versions between 3.16 and 3.19.
Hope that answers your questions. If you need other features, do not hesitate to contact me.
Thank you! Now that CMake version 3.16 works, I got the CI tests to run.
Branch read_contiguous helped a bit, but the difference was pretty small. I guess the real issue is how the high-level API nudges towards writing strictly sequential code that focuses on the general case. I have some ideas for how to optimize the kmer handling code on our side. I'll return to the topic once I have a better understanding of the bottlenecks.
The optimization is now available on main and dev
I started using this API in vg (vgteam/vg#3844), and I noticed a couple of issues.
First, reading a KFF file with Kff_reader can be quite slow. For example, KMC outputs each kmer as a separate block by default. Even if you read the kmers block-by-block, you apparently end up doing two read() calls for each kmer. Buffering should solve the issue.
Second, CMakeLists.txt says it requires CMake version 3.19, which is fairly new. For example, Ubuntu 20.04 LTS ships with CMake 3.16, which means that this API does not compile with the default tools found on many servers. The file seems fairly basic, so I believe it should work with older versions as well.
I could try doing a PR myself, but I'm not confident I understand how the minimizer sections work, so I could accidentally end up breaking something.