bacpop / ska.rust

Split k-mer analysis – version 2
https://docs.rs/ska/latest/ska/
Apache License 2.0

'Palindrome middle base not W/S: 78' in ska build #48

Closed: jvfe closed this issue 11 months ago

jvfe commented 11 months ago

Hi,

I've installed Gubbins into a conda environment and tried running it on several different datasets. Every one of them fails with the same error when running generate_ska_alignment.py:

SKA: Split K-mer Analysis (the alignment-free aligner)
████████████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 2/12
thread 'main' panicked at 'Palindrome middle base not W/S: 78', src/ska_dict.rs:87:26
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Traceback (most recent call last):
  File "/home/jvfe/miniconda3/envs/recombination/bin/generate_ska_alignment.py", line 97, in <module>
    subprocess.check_output('ska build -o ' + args.out + ' -k ' + str(args.k) + \
  File "/home/jvfe/miniconda3/envs/recombination/lib/python3.10/subprocess.py", line 421, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/home/jvfe/miniconda3/envs/recombination/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'ska build -o out.aln -k 17 -f gubbins_samplesheet.txt --threads 1' returned non-zero exit status 101.

The error is the same when running ska build directly (ska build -o out.aln -k 17 -f gubbins_samplesheet.txt --threads 1):

SKA: Split K-mer Analysis (the alignment-free aligner)
██████████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 2/12
thread 'main' panicked at 'Palindrome middle base not W/S: 78', src/ska_dict.rs:87:26
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Is this something related to the input dataset or something else? The input data consists of open assemblies (.fna) assembled through Unicycler.

Versions: Gubbins v3.3.0, ska v0.3.0

johnlees commented 11 months ago

This might be a bug. Could you please attach/send me the input file causing the issue (which looks like it would be the 2nd one in your input list)?

jvfe commented 11 months ago

Here you go: genome_2.zip

jvfe commented 11 months ago

I can confirm that if I shuffle the order of the genomes in the input list it still errors out on the second one. So it may not be related to that file specifically.

johnlees commented 11 months ago

That's good to know, thanks. I will try to reproduce the error soon.

johnlees commented 11 months ago

Indeed, this is a bug, thanks for reporting!

The problem is that if a palindrome has already appeared twice, the middle base passed into the function can already be an N, which I was not handling. It's an easy fix; I will release 0.3.2 with it soon.
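For readers following the fix: the `78` in the panic message is the byte value of ASCII `N` (`b'N' == 78`), which fits the explanation above. The Rust sketch below illustrates that failure mode under stated assumptions and is not ska's actual code. It assumes the middle base of a palindromic split k-mer is stored as one byte, either `W` (A/T) or `S` (C/G), in a `HashMap<u64, u8>`, and that two occurrences with conflicting classes collapse to `N`. The names `merge_palindrome_middle` and `add_palindrome` and the map layout are hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical merge rule (not ska's source): combine the stored middle-base
/// class of a palindromic split k-mer with a newly observed one.
/// W = A/T, S = C/G; W and S together cover all four bases, i.e. N.
fn merge_palindrome_middle(stored: u8, incoming: u8) -> u8 {
    match (stored, incoming) {
        // Same class seen again: keep it.
        (b'W', b'W') => b'W',
        (b'S', b'S') => b'S',
        // Conflicting classes on the second occurrence: ambiguous, store N.
        (b'W', b'S') | (b'S', b'W') => b'N',
        // The case described in the thread: after two conflicting
        // occurrences the stored value is already N, so a third one
        // arrives here instead of reaching the panic below.
        (b'N', _) => b'N',
        // Formatting a u8 with {} prints its numeric value (e.g. 78 for N).
        _ => panic!("Palindrome middle base not W/S: {}", stored),
    }
}

/// Hypothetical dictionary update: insert on first sight, merge afterwards.
fn add_palindrome(dict: &mut HashMap<u64, u8>, kmer: u64, middle: u8) {
    dict.entry(kmer)
        .and_modify(|b| *b = merge_palindrome_middle(*b, middle))
        .or_insert(middle);
}

fn main() {
    let mut dict: HashMap<u64, u8> = HashMap::new();
    let kmer: u64 = 42; // hypothetical encoded split k-mer
    add_palindrome(&mut dict, kmer, b'W'); // first occurrence: stored as W
    add_palindrome(&mut dict, kmer, b'S'); // second, conflicting: becomes N
    add_palindrome(&mut dict, kmer, b'W'); // third: stored is N, no panic
    assert_eq!(dict[&kmer], b'N');
    assert_eq!(b'N', 78);
    println!("middle base stored as {} ({})", dict[&kmer] as char, dict[&kmer]);
}
```

Running it prints `middle base stored as N (78)`. If the `(b'N', _)` arm is removed, the third call in `main` falls through to the catch-all arm and panics with `Palindrome middle base not W/S: 78`, the same message as in the report.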

jvfe commented 11 months ago

Great, thank you for fixing it!