bacpop / ska.rust

Split k-mer analysis – version 2
https://docs.rs/ska/latest/ska/
Apache License 2.0

'Palindrome middle base not W/S: 78' in ska build #48

Closed jvfe closed 1 year ago

jvfe commented 1 year ago

Hi,

I've downloaded Gubbins into a conda environment and I've tried to run it with some different datasets, but all of them result in the same issue when trying to run generate_ska_alignment.py:

SKA: Split K-mer Analysis (the alignment-free aligner)
████████████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 2/12thread 'main' panicked at 'Palindrome middle base not W/S: 78', src/ska_dict.rs:87:26
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Traceback (most recent call last):
  File "/home/jvfe/miniconda3/envs/recombination/bin/generate_ska_alignment.py", line 97, in <module>
    subprocess.check_output('ska build -o ' + args.out + ' -k ' + str(args.k) + \
  File "/home/jvfe/miniconda3/envs/recombination/lib/python3.10/subprocess.py", line 421, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/home/jvfe/miniconda3/envs/recombination/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'ska build -o out.aln -k 17 -f gubbins_samplesheet.txt --threads 1' returned non-zero exit status 101.

The error is the same when running ska build directly (ska build -o out.aln -k 17 -f gubbins_samplesheet.txt --threads 1):

SKA: Split K-mer Analysis (the alignment-free aligner)
██████████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 2/12thread 'main' panicked at 'Palindrome middle base not W/S: 78', src/ska_dict.rs:87:26
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Is this something related to the input dataset or something else? The input data consists of open assemblies (.fna) assembled through Unicycler.

Gubbins: v3.3.0; ska: v0.3.0

johnlees commented 1 year ago

This might be a bug. Could you please attach/send me the input file causing the issue (which looks like it would be the 2nd one in your input list)?

jvfe commented 1 year ago

Here you go: genome_2.zip

jvfe commented 1 year ago

I can confirm that if I shuffle the order of the genomes in the input list, it still errors out on the second one. So it may not be related to that file specifically.

johnlees commented 1 year ago

That's good to know, thanks. I will try to reproduce the error soon.

johnlees commented 1 year ago

Indeed, this is a bug, thanks for reporting!

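The problem is that if a palindrome has already appeared twice, its middle base can already be an N when it is passed to the function, and I am not handling that case (the 78 in the panic message is the ASCII code for 'N'). Easy to fix; I will release 0.3.2 with this fix soon.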
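
For readers following along, here is a minimal sketch of the kind of handling described above. It is not ska's actual code: the function name `merge_palindrome_middle`, the use of ASCII bytes, and the exact merge rule are assumptions made for illustration. The idea is that for a palindromic split k-mer a middle base and its complement are indistinguishable, so A/T collapse to W and C/G collapse to S, and a stored N must be accepted as input rather than triggering a panic.

```rust
/// Hypothetical helper (not part of ska): merge a newly observed middle base
/// into the middle base already stored for a palindromic split k-mer.
/// Bases are plain ASCII bytes here, which is an assumption of this sketch.
fn merge_palindrome_middle(stored: u8, observed: u8) -> Result<u8, String> {
    // Collapse a base and its complement into one IUPAC code:
    // A/T -> W (weak), C/G -> S (strong). W, S and N pass through unchanged,
    // so a previously stored ambiguity code is valid input.
    let collapse = |b: u8| match b {
        b'A' | b'T' | b'W' => Some(b'W'),
        b'C' | b'G' | b'S' => Some(b'S'),
        b'N' => Some(b'N'),
        _ => None,
    };

    let s = collapse(stored)
        .ok_or_else(|| format!("unexpected stored middle base: {}", stored))?;
    let o = collapse(observed)
        .ok_or_else(|| format!("unexpected observed middle base: {}", observed))?;

    // Same class: keep it. Different classes, or anything involving N: N.
    Ok(if s == o { s } else { b'N' })
}
```

Under these assumptions, `merge_palindrome_middle(b'N', b'C')` returns `Ok(b'N')` instead of failing, which is the case the panic above did not cover. Returning a `Result` rather than panicking is a design choice of this sketch, not something the thread states about the fix.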

jvfe commented 1 year ago

Great, thank you for fixing it!