Transipedia / KaMRaT

unicity checking failed #2

Open mrtnb opened 5 months ago

mrtnb commented 5 months ago

Creating an index with KaMRaT for a set of kmer features results in the following error:

kamrat index -intab kmer_features_error.tab -outdir kamrat_output/ -klen 33 -unstrand
[ERROR] unicity checking failed, an equivalent key already existed for feature: CGTCATTAGGAGGGCTGAGAGGGCCCATGTTAG
Aborted (core dumped)

However, the feature is unique in the input table:

grep -n CGTCATTAGGAGGGCTGAGAGGGCCCATGTTAG kmer_features_error.tab
35245:CGTCATTAGGAGGGCTGAGAGGGCCCATGTTAG 5
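
A plain grep only finds the forward sequence. If -unstrand treats a k-mer and its reverse complement as the same feature (an assumption about the flag, not confirmed here), the table can also be checked for strand-equivalent duplicates. The sketch below is a hypothetical diagnostic, not part of KaMRaT; the function names are made up for illustration, and it assumes the k-mer is the first whitespace-separated field on each line.

```python
# Hypothetical diagnostic: report rows whose k-mer is equivalent to an
# earlier row once strand is ignored (same sequence or reverse complement).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer):
    """Return the reverse complement of a DNA sequence."""
    return kmer.translate(COMPLEMENT)[::-1]


def find_strand_duplicates(lines):
    """Return (first_line, duplicate_line, kmer) for strand-equivalent rows."""
    seen = {}
    duplicates = []
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        kmer = fields[0]
        # Canonical form: the lexicographically smaller of the two strands.
        canonical = min(kmer, reverse_complement(kmer))
        if canonical in seen:
            duplicates.append((seen[canonical], lineno, kmer))
        else:
            seen[canonical] = lineno
    return duplicates
```

Passing an open file handle for kmer_features_error.tab to find_strand_duplicates would list any such pairs. An empty result would suggest the duplicate is not in the input itself.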

I am using KaMRaT version 1.1.0 obtained from https://github.com/Transipedia/KaMRaT/archive/refs/tags/v1.1.0.tar.gz built using the provided instructions. Features were extracted by jellyfish count --mer-len=33 --canonical. I am attaching a minimal input file kmer_features_error.tab.gz (gzipped) to reproduce the error here. The original input file has about 18 million features, but the first 35250 features/lines are sufficient to reproduce the problem.

I have also encountered the same "unicity checking failed" error for a different k-mer feature at line 435621 of another input file. Could this be due to hash collisions?

hl-xue commented 5 months ago

Hi!

Thanks for using kamrat. I think the error may come from running with -klen 33. For now, kamrat only supports a maximum k-mer length of 32 (or in practice 31, because we suggest using an odd k-mer length). So there might be an overflow issue that prevents the reverse-complement hash code from being calculated correctly.
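
To illustrate why a length above 32 could produce this kind of error, here is a minimal sketch. It assumes k-mers are packed at 2 bits per base into a 64-bit unsigned integer; that encoding is an assumption for illustration, not taken from the kamrat source. With 33 bases, 66 bits are needed, so the first base is shifted out and two different 33-mers can end up with the same key. The second k-mer below is hypothetical and is not claimed to be in the input file.

```python
# Assumed encoding for illustration only: 2 bits per base in a 64-bit word.
MASK64 = (1 << 64) - 1
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def pack_2bit_u64(kmer):
    """Pack a k-mer at 2 bits per base, truncating to 64 bits like uint64_t."""
    value = 0
    for base in kmer:
        value = ((value << 2) | CODE[base]) & MASK64
    return value


# The 33-mer from the error message.
reported = "CGTCATTAGGAGGGCTGAGAGGGCCCATGTTAG"
# Hypothetical 33-mer that differs only in its first base.
other = "A" + reported[1:]

# 33 bases need 66 bits: the first base is lost, so the keys are equal.
collides_at_33 = pack_2bit_u64(reported) == pack_2bit_u64(other)

# 32 bases fit in 64 bits: the same kind of pair keeps distinct keys.
collides_at_32 = pack_2bit_u64(reported[1:]) == pack_2bit_u64("A" + reported[2:])

print(collides_at_33, collides_at_32)
```

Under this assumption the collision is a truncation effect rather than a random hash collision, which would be consistent with the error appearing only for particular features.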

Please feel free to let me know if this is not the case, and thanks again for using kamrat!