akcorut / kGWASflow

kGWASflow is a Snakemake workflow for performing k-mers-based GWAS.
https://github.com/akcorut/kGWASflow/wiki
MIT License

Increase k-mer size above 31 #31

Open VanOverbeeke opened 6 months ago

VanOverbeeke commented 6 months ago

Hi,

After finding some relevant consecutive 31-mers, we would like to repeat the workflow with k=41 to narrow our search. The template config.yaml discourages this, and downstream error messages (which appear after the kmc k-mer counting steps succeed) show that some steps have hardcoded assumptions about using 32 bits (see image). Would it be possible to raise this limit? What would that require?

Kind regards, Lennert

[image]

akcorut commented 5 months ago

Hi @VanOverbeeke,

Sorry for my late response. Unfortunately, it is not possible to run kGWASflow with k-mer sizes larger than 31. This is due to the underlying method used during the kmersGWAS step, and it is not an easy fix. However, a k-mer size of 31 should be able to capture pretty much everything that a k-mer size of 41 would give you.

I hope this information will help. Let me know if you have any other questions/issues.

Thanks, Kivanc
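
For readers wondering why a limit like k=31 is hard to lift: a common design in k-mer tools is to pack each base into 2 bits and store the whole k-mer in one fixed-width machine word. The sketch below illustrates that idea only. It assumes a 64-bit word and is not code from kGWASflow or kmersGWAS; the names `pack_kmer`, `unpack_kmer`, and `WORD_BITS` are hypothetical.

```python
# Illustration only: 2-bit packing of k-mers into a fixed-width word.
# Assumption: a 64-bit word. This is NOT code from kGWASflow or kmersGWAS.

ENCODE = {"A": 0, "C": 1, "G": 2, "T": 3}
DECODE = "ACGT"
WORD_BITS = 64  # hypothetical fixed storage width per k-mer


def pack_kmer(kmer: str) -> int:
    """Pack a k-mer into an integer using 2 bits per base.

    Raises ValueError if the k-mer would not fit in WORD_BITS bits.
    """
    needed = 2 * len(kmer)
    if needed > WORD_BITS:
        raise ValueError(f"k={len(kmer)} needs {needed} bits, more than {WORD_BITS}")
    value = 0
    for base in kmer:
        value = (value << 2) | ENCODE[base]
    return value


def unpack_kmer(value: int, k: int) -> str:
    """Recover the k-mer string from its packed integer form."""
    bases = []
    for _ in range(k):
        bases.append(DECODE[value & 3])  # lowest 2 bits hold the last base
        value >>= 2
    return "".join(reversed(bases))
```

Under this scheme a 31-mer needs 62 bits and fits in one word, while a 41-mer would need 82 bits. Supporting it would mean changing the storage type everywhere k-mers are read, compared, or written, which is consistent with the limit not being an easy fix.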

VanOverbeeke commented 5 months ago

Hi Kivanc,

Thanks for the information. The question came up when we found 11 consecutive relevant 31-mers, each shifted by one base and together spanning 41 bases, meaning the resulting 41-mer could also be found if we ran the workflow with k=41. But you're right, we found this signal using k=31 and it worked (very well) :) Keep up the good work!

Lennert
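
The arithmetic behind "11 consecutive 31-mers give a 41-mer" can be sketched as follows: each k-mer overlaps the previous one by k-1 bases, so n consecutive k-mers span k + n - 1 bases (31 + 11 - 1 = 41). This is a minimal sketch with hypothetical helper names (`merge_consecutive_kmers`, `kmers_of`). It assumes all k-mers are reported on the same strand and already in order. Real kmersGWAS output may use canonical k-mers, in which case reverse complements would need handling first.

```python
# Illustration only: merge consecutive, same-strand k-mers into one sequence.
# Helper names are hypothetical and not part of kGWASflow.

def merge_consecutive_kmers(kmers):
    """Merge ordered k-mers where each one overlaps the previous by k-1 bases."""
    if not kmers:
        return ""
    merged = kmers[0]
    for prev, cur in zip(kmers, kmers[1:]):
        # The suffix of the previous k-mer must equal the prefix of the next.
        if len(cur) != len(prev) or prev[1:] != cur[:-1]:
            raise ValueError(f"{prev!r} and {cur!r} are not consecutive")
        merged += cur[-1]  # each step contributes exactly one new base
    return merged


def kmers_of(seq, k):
    """All k-mers of seq, in order of position."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


# Made-up 41-base sequence, used only to demonstrate the arithmetic.
seq = ("ACGTTGCAAG" * 5)[:41]
kmers = kmers_of(seq, 31)      # 41 - 31 + 1 = 11 k-mers
merged = merge_consecutive_kmers(kmers)
print(len(kmers), len(merged), merged == seq)
```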

nikostr commented 3 months ago

If anyone else ends up with a similar request, kmdiff (https://github.com/tlemane/kmdiff) is a similar tool capable of handling larger k-mer sizes.