OLC-Bioinformatics / ConFindr

Intra-species bacterial contamination detection
https://olc-bioinformatics.github.io/ConFindr/
MIT License

bbtools error #38

Closed annacorreia closed 1 year ago

annacorreia commented 2 years ago

Hi, I'm trying to run confindr on some Klebsiella fastq files but am having some problems with bbtools.

I started out using Confindr=0.7.4, mash=2.3 and Python 3.7.12.

Some things I have tried that haven't worked:
- amending line 209 of database_setup.py (this worked)
- downgrading from biopython 1.79 to 1.78
- trimming the fastq files with trimmomatic first
- changing bbmap to 38.91
- changing Klebsiella_db.fasta (details below), but now I am having a problem with indexing

(/projects/js66/software/conda_envs/confindr_0.7.4) [acorreia@m3-dtn4 original_files]$ confindr.py -i /home/acorreia/js66_scratch/anna/fastq_files/test_fastq/original_files -o output_1 --rmlst
2022-06-27 11:03:41 Welcome to ConFindr 0.7.4! Beginning analysis of your samples...
2022-06-27 11:03:41 Beginning analysis of sample NK_H18_033_1...
2022-06-27 11:03:41 Checking for cross-species contamination...
2022-06-27 11:03:47 Extracting conserved core genes...
2022-06-27 11:03:49 Encountered error when attempting to run ConFindr on sample NK_H18_033_1. Skipping...
2022-06-27 11:03:49 Error encounted was:
Traceback (most recent call last):
  File "/projects/js66/software/conda_envs/confindr_0.7.4/lib/python3.7/site-packages/confindr_src/confindr.py", line 1067, in confindr
    fasta=args.fasta)
  File "/projects/js66/software/conda_envs/confindr_0.7.4/lib/python3.7/site-packages/confindr_src/confindr.py", line 647, in find_contamination
    returncmd=True, threads=threads)
  File "/projects/js66/software/conda_envs/confindr_0.7.4/lib/python3.7/site-packages/confindr_src/wrappers/bbtools.py", line 258, in bbduk_bait
    out, err = run_subprocess(cmd)
  File "/projects/js66/software/conda_envs/confindr_0.7.4/lib/python3.7/site-packages/confindr_src/wrappers/bbtools.py", line 16, in run_subprocess
    raise subprocess.CalledProcessError(x.returncode, cmd=command)
subprocess.CalledProcessError: Command 'bbduk.sh in=/home/acorreia/js66_scratch/anna/fastq_files/test_fastq/original_files/NK_H18_033_1.fastq.gz outm=output_1/NK_H18_033_1/rmlst.fastq.gz ref=/home/acorreia/.confindr_db/Klebsiella_db.fasta threads=36' returned non-zero exit status 1.

2022-06-27 11:03:49 Beginning analysis of sample NK_H18_033_2...
2022-06-27 11:03:49 Checking for cross-species contamination...
2022-06-27 11:03:55 Extracting conserved core genes...
2022-06-27 11:03:55 Encountered error when attempting to run ConFindr on sample NK_H18_033_2. Skipping...
2022-06-27 11:03:55 Error encounted was:
Traceback (most recent call last):
  File "/projects/js66/software/conda_envs/confindr_0.7.4/lib/python3.7/site-packages/confindr_src/confindr.py", line 1067, in confindr
    fasta=args.fasta)
  File "/projects/js66/software/conda_envs/confindr_0.7.4/lib/python3.7/site-packages/confindr_src/confindr.py", line 647, in find_contamination
    returncmd=True, threads=threads)
  File "/projects/js66/software/conda_envs/confindr_0.7.4/lib/python3.7/site-packages/confindr_src/wrappers/bbtools.py", line 258, in bbduk_bait
    out, err = run_subprocess(cmd)
  File "/projects/js66/software/conda_envs/confindr_0.7.4/lib/python3.7/site-packages/confindr_src/wrappers/bbtools.py", line 16, in run_subprocess
    raise subprocess.CalledProcessError(x.returncode, cmd=command)
subprocess.CalledProcessError: Command 'bbduk.sh in=/home/acorreia/js66_scratch/anna/fastq_files/test_fastq/original_files/NK_H18_033_2.fastq.gz outm=output_1/NK_H18_033_2/rmlst.fastq.gz ref=/home/acorreia/.confindr_db/Klebsiella_db.fasta threads=36' returned non-zero exit status 1.

2022-06-27 11:03:55 Contamination detection complete!
(/projects/js66/software/conda_envs/confindr_0.7.4) [acorreia@m3-dtn4 original_files]$

(/projects/js66/software/conda_envs/confindr_0.7.4) [acorreia@m3-dtn3 original_files]$ bbduk.sh in=/home/acorreia/js66_scratch/anna/fastq_files/test_fastq/original_files/NK_H18_033_1.fastq.gz in2=/home/acorreia/js66_scratch/anna/fastq_files/test_fastq/original_files/NK_H18_033_2.fastq.gz outm=output_1/NK_H18_033_2/rmlst_R1.fastq.gz outm2=output_1/NK_H18_033_2/rmlst_R2.fastq.gz ref=/home/acorreia/.confindr_db/Klebsiella_db.fasta
java -ea -Xmx29615m -Xms29615m -cp /projects/js66/software/conda_envs/confindr_0.7.4/opt/bbmap-38.96-1/current/ jgi.BBDuk in=/home/acorreia/js66_scratch/anna/fastq_files/test_fastq/original_files/NK_H18_033_1.fastq.gz in2=/home/acorreia/js66_scratch/anna/fastq_files/test_fastq/original_files/NK_H18_033_2.fastq.gz outm=output_1/NK_H18_033_2/rmlst_R1.fastq.gz outm2=output_1/NK_H18_033_2/rmlst_R2.fastq.gz ref=/home/acorreia/.confindr_db/Klebsiella_db.fasta
Executing jgi.BBDuk [in=/home/acorreia/js66_scratch/anna/fastq_files/test_fastq/original_files/NK_H18_033_1.fastq.gz, in2=/home/acorreia/js66_scratch/anna/fastq_files/test_fastq/original_files/NK_H18_033_2.fastq.gz, outm=output_1/NK_H18_033_2/rmlst_R1.fastq.gz, outm2=output_1/NK_H18_033_2/rmlst_R2.fastq.gz, ref=/home/acorreia/.confindr_db/Klebsiella_db.fasta]
Version 38.96

0.288 seconds. Initial: Memory: max=31054m, total=31054m, free=31025m, used=29m

java.lang.Exception: An input file appears to be misformatted: The character with ASCII code 39 appeared where a base was expected: ''' Sequence #0 Sequence ID: 'BACT000001_2176' Sequence: '[98, 39, 65, 84, 71, 65, 67, 84, 71, 65, 65, 84, 67, 84, 84, 84, 84, 71, 67, 84, 67, 65, 65, 67, 84, 71, 84, 84, 84, 71, 65, 65, 71, 65, 65, 84, 67, 67, 84, 84, 65, 65, 65, 65, 71, 65, 65, 65, 84, 67, 71, 65, 65, 65, 67, 67, 67, 71, 67, 67, 67, 71, 71, 71, 84, 84, 67, 67, 65, 84, 67, 71, 84, 84, 67, 71, 84, 71, 71, 84, 71, 84, 84, 71, 84, 84, 71, 84, 84, 71, 67, 84, 65, 84, 67, 71, 65, 67, 65, 65, 65, 71, 65, 67, 71, 84, 65, 71, 84, 65, 67, 84, 71, 71, 84, 84, 71, 65, 67, 71, 67, 67, 71, 71, 84, 67, 84, 71, 65, 65, 65, 84, 67, 84, 71, 65, 71, 84, 67, 67, 71, 67, 67, 65, 84, 67, 67, 67, 71, 71, 67, 84, 71, 65, 71, 67, 65, 71, 84, 84, 67, 65, 65, 65, 65, 65, 67, 71, 67, 67, 67, 65, 71, 71, 71, 67, 71, 65, 71, 67, 84, 71, 71, 65, 65, 65, 84, 67, 67, 65, 71, 71, 84, 84, 71, 71, 84, 71, 65, 67, 71, 65, 65, 71, 84, 84, 71, 65, 67, 71, 84, 84, 71, 67, 84, 67, 84, 71, 71, 65, 84, 71, 67, 65, 71, 84, 65, 71, 65, 65, 71, 65, 67, 71, 71, 67, 84, 84, 67, 71, 71, 84, 71, 65, 65, 65, 67, 84, 67, 84, 71, 67, 84, 71, 84, 67, 67, 67, 71, 84, 71, 65, 71, 65, 65, 65, 71, 67, 84, 65, 65, 65, 67, 71, 84, 67, 65, 67, 71, 65, 65, 71, 67, 84, 84, 71, 71, 65, 84, 67, 65, 67, 71, 67, 84, 71, 71, 65, 65, 65, 65, 65, 71, 67, 84, 84, 65, 67, 71, 65, 65, 71, 65, 67, 71, 67, 84, 71, 65, 65, 65, 67, 84, 71, 84, 84, 65, 67, 67, 71, 71, 84, 71, 84, 84, 65, 84, 67, 65, 65, 67, 71, 71, 67, 65, 65, 65, 71, 84, 84, 65, 65, 65, 71, 71, 84, 71, 71, 67, 84, 84, 67, 65, 67, 84, 71, 84, 84, 71, 65, 71, 67, 84, 71, 65, 65, 67, 71, 71, 84, 65, 84, 84, 67, 71, 84, 71, 67, 71, 84, 84, 67, 67, 84, 71, 67, 67, 71, 71, 71, 84, 84, 67, 67, 67, 84, 71, 71, 84, 65, 71, 65, 67, 71, 84, 84, 67, 71, 84, 67, 67, 71, 71, 84, 71, 67, 71, 67, 71, 65, 67, 65, 67, 71, 67, 84, 71, 67, 65, 67, 67, 84, 71, 71, 65, 65, 71, 71, 67, 65, 65, 65, 
71, 65, 71, 67, 84, 84, 71, 65, 71, 84, 84, 67, 65, 65, 65, 71, 84, 67, 65, 84, 67, 65, 65, 71, 67, 84, 71, 71, 65, 67, 67, 65, 71, 65, 65, 71, 67, 71, 84, 65, 65, 67, 65, 65, 67, 71, 84, 84, 71, 84, 84, 71, 84, 84, 84, 67, 84, 67, 71, 84, 67, 71, 84, 71, 67, 67, 71, 84, 84, 65, 84, 67, 71, 65, 65, 84, 67, 67, 71, 65, 65, 65, 65, 67, 65, 71, 67, 71, 67, 65, 71, 65, 71, 67, 71, 67, 71, 65, 84, 67, 65, 71, 67, 84, 71, 67, 84, 71, 71, 65, 65, 65, 65, 67, 67, 84, 71, 67, 65, 71, 71, 65, 65, 71, 71, 67, 65, 84, 71, 71, 65, 65, 71, 84, 67, 65, 65, 65, 71, 71, 84, 65, 84, 67, 71, 84, 84, 65, 65, 71, 65, 65, 67, 67, 84, 67, 65, 67, 84, 71, 65, 67, 84, 65, 67, 71, 71, 84, 71, 67, 65, 84, 84, 67, 71, 84, 84, 71, 65, 84, 67, 84, 71, 71, 71, 67, 71, 71, 67, 71, 84, 84, 71, 65, 67, 71, 71, 67, 67, 84, 71, 67, 84, 71, 67, 65, 67, 65, 84, 67, 65, 67, 67, 71, 65, 84, 65, 84, 71, 71, 67, 67, 84, 71, 71, 65, 65, 65, 67, 71, 67, 71, 84, 84, 65, 65, 71, 67, 65, 84, 67, 67, 71, 65, 71, 67, 71, 65, 65, 65, 84, 67, 71, 84, 65, 65, 65, 67, 71, 84, 84, 71, 71, 67, 71, 65, 67, 71, 65, 65, 65, 84, 67, 65, 67, 84, 71, 84, 84, 65, 65, 65, 71, 84, 71, 67, 84, 71, 65, 65, 71, 84, 84, 67, 71, 65, 67, 67, 71, 67, 71, 65, 71, 67, 71, 84, 65, 67, 67, 67, 71, 84, 71, 84, 65, 84, 67, 67, 67, 84, 71, 71, 71, 67, 67, 84, 71, 65, 65, 65, 67, 65, 71, 67, 84, 71, 71, 71, 67, 71, 65, 65, 71, 65, 84, 67, 67, 65, 84, 71, 71, 71, 84, 65, 71, 67, 84, 65, 84, 67, 71, 67, 84, 65, 65, 65, 67, 71, 84, 84, 65, 84, 67, 67, 71, 71, 65, 65, 71, 71, 84, 65, 67, 67, 65, 65, 65, 67, 84, 71, 65, 67, 67, 71, 71, 84, 67, 71, 67, 71, 84, 71, 65, 67, 67, 65, 65, 67, 67, 84, 71, 65, 67, 67, 71, 65, 67, 84, 65, 67, 71, 71, 67, 84, 71, 67, 84, 84, 67, 71, 84, 84, 71, 65, 65, 65, 84, 67, 71, 65, 65, 71, 65, 65, 71, 71, 67, 71, 84, 84, 71, 65, 65, 71, 71, 67, 67, 84, 71, 71, 84, 84, 67, 65, 67, 71, 84, 84, 84, 67, 67, 71, 65, 65, 65, 84, 71, 71, 65, 84, 84, 71, 71, 65, 67, 67, 65, 65, 67, 65, 65, 65, 65, 65, 67, 65, 84, 67, 67, 65, 
67, 67, 67, 71, 84, 67, 67, 65, 65, 65, 71, 84, 84, 71, 84, 84, 65, 65, 67, 71, 84, 84, 71, 71, 67, 71, 65, 67, 71, 84, 84, 71, 84, 71, 71, 65, 65, 71, 84, 71, 65, 84, 71, 71, 84, 84, 67, 84, 71, 71, 65, 84, 65, 84, 67, 71, 65, 67, 71, 65, 65, 71, 65, 71, 67, 71, 84, 67, 71, 84, 67, 71, 84, 65, 84, 67, 84, 67, 67, 67, 84, 71, 71, 71, 84, 67, 84, 71, 65, 65, 71, 67, 65, 71, 84, 71, 67, 65, 65, 65, 84, 67, 84, 65, 65, 67, 67, 67, 65, 84, 71, 71, 67, 65, 71, 67, 65, 71, 84, 84, 67, 71, 67, 71, 71, 65, 65, 65, 67, 67, 67, 65, 67, 65, 65, 67, 65, 65, 71, 71, 71, 67, 71, 65, 67, 67, 71, 84, 71, 84, 84, 71, 65, 65, 71, 71, 84, 65, 65, 65, 65, 84, 67, 65, 65, 71, 84, 67, 84, 65, 84, 67, 65, 67, 84, 71, 65, 67, 84, 84, 67, 71, 71, 84, 65, 84, 67, 84, 84, 67, 65, 84, 67, 71, 71, 67, 67, 84, 71, 71, 65, 67, 71, 71, 67, 71, 71, 67, 65, 84, 67, 71, 65, 67, 71, 71, 67, 67, 84, 71, 71, 84, 84, 67, 65, 67, 67, 84, 71, 84, 67, 84, 71, 65, 67, 65, 84, 67, 84, 67, 67, 84, 71, 71, 65, 65, 67, 71, 84, 84, 71, 67, 65, 71, 71, 67, 71, 65, 65, 71, 65, 65, 71, 67, 65, 71, 84, 84, 67, 71, 84, 71, 65, 65, 84, 65, 67, 65, 65, 65, 65, 65, 65, 71, 71, 67, 71, 65, 67, 71, 65, 65, 65, 84, 67, 71, 67, 65, 71, 67, 65, 71, 84, 84, 71, 84, 84, 67, 84, 71, 67, 65, 71, 71, 84, 84, 71, 65, 67, 71, 67, 65, 71, 65, 71, 67, 71, 84, 71, 65, 71, 67, 71, 84, 65, 84, 67, 84, 67, 67, 67, 84, 71, 71, 71, 67, 71, 84, 84, 65, 65, 65, 67, 65, 71, 67, 84, 67, 71, 67, 71, 71, 65, 65, 71, 65, 84, 67, 67, 71, 84, 84, 67, 65, 65, 67, 65, 65, 67, 84, 65, 67, 71, 84, 84, 71, 67, 84, 67, 84, 71, 65, 65, 67, 65, 65, 71, 65, 65, 65, 71, 71, 67, 71, 67, 84, 65, 84, 67, 71, 84, 84, 71, 84, 84, 71, 71, 84, 65, 65, 65, 71, 84, 67, 65, 67, 84, 71, 67, 65, 71, 84, 84, 71, 65, 67, 71, 67, 84, 65, 65, 65, 71, 71, 67, 71, 67, 65, 65, 67, 67, 71, 84, 65, 71, 65, 65, 67, 84, 71, 71, 67, 84, 71, 65, 67, 71, 71, 67, 71, 84, 65, 71, 65, 65, 71, 71, 84, 84, 65, 67, 67, 84, 71, 67, 71, 84, 71, 67, 84, 84, 67, 84, 71, 65, 65, 71, 67, 65, 84, 
67, 67, 67, 71, 84, 71, 65, 67, 67, 71, 67, 71, 84, 84, 71, 65, 65, 71, 65, 67, 71, 67, 65, 65, 67, 84, 67, 84, 71, 71, 84, 84, 67, 84, 71, 65, 71, 67, 71, 84, 84, 71, 71, 67, 71, 65, 67, 71, 65, 65, 71, 84, 84, 71, 65, 65, 71, 67, 71, 65, 65, 65, 84, 84, 67, 65, 67, 67, 71, 71, 67, 71, 84, 71, 71, 65, 84, 67, 71, 84, 65, 65, 71, 65, 65, 67, 67, 71, 67, 71, 84, 65, 71, 84, 71, 65, 71, 67, 67, 84, 71, 84, 67, 84, 71, 84, 65, 67, 71, 84, 71, 67, 71, 65, 65, 65, 71, 65, 67, 71, 65, 65, 71, 67, 71, 71, 65, 65, 71, 65, 65, 65, 65, 65, 71, 65, 67, 71, 67, 84, 65, 84, 67, 71, 67, 84, 65, 67, 67, 71, 84, 71, 65, 65, 67, 65, 65, 71, 67, 65, 71, 71, 65, 65, 71, 65, 67, 71, 67, 71, 65, 65, 67, 84, 84, 67, 84, 67, 67, 65, 65, 67, 65, 65, 67, 71, 67, 84, 65, 84, 71, 71, 67, 84, 71, 65, 65, 71, 67, 71, 84, 84, 67, 65, 65, 65, 71, 67, 65, 71, 67, 71, 65, 65, 65, 71, 71, 67, 71, 65, 71, 84, 65, 65, 39] b'ATGACTGAATCTTTTGCTCAACTGTTTGAAGAATCCTTAAAAGAAATCGAAACCCGCCCGGGTTCCATCGTTCGTGGTGTTGTTGTTGCTATCGACAAAGACGTAGTACTGGTTGACGCCGGTCTGAAATCTGAGTCCGCCATCCCGGCTGAGCAGTTCAAAAACGCCCAGGGCGAGCTGGAAATCCAGGTTGGTGACGAAGTTGACGTTGCTCTGGATGCAGTAGAAGACGGCTTCGGTGAAACTCTGCTGTCCCGTGAGAAAGCTAAACGTCACGAAGCTTGGATCACGCTGGAAAAAGCTTACGAAGACGCTGAAACTGTTACCGGTGTTATCAACGGCAAAGTTAAAGGTGGCTTCACTGTTGAGCTGAACGGTATTCGTGCGTTCCTGCCGGGTTCCCTGGTAGACGTTCGTCCGGTGCGCGACACGCTGCACCTGGAAGGCAAAGAGCTTGAGTTCAAAGTCATCAAGCTGGACCAGAAGCGTAACAACGTTGTTGTTTCTCGTCGTGCCGTTATCGAATCCGAAAACAGCGCAGAGCGCGATCAGCTGCTGGAAAACCTGCAGGAAGGCATGGAAGTCAAAGGTATCGTTAAGAACCTCACTGACTACGGTGCATTCGTTGATCTGGGCGGCGTTGACGGCCTGCTGCACATCACCGATATGGCCTGGAAACGCGTTAAGCATCCGAGCGAAATCGTAAACGTTGGCGACGAAATCACTGTTAAAGTGCTGAAGTTCGACCGCGAGCGTACCCGTGTATCCCTGGGCCTGAAACAGCTGGGCGAAGATCCATGGGTAGCTATCGCTAAACGTTATCCGGAAGGTACCAAACTGACCGGTCGCGTGACCAACCTGACCGACTACGGCTGCTTCGTTGAAATCGAAGAAGGCGTTGAAGGCCTGGTTCACGTTTCCGAAATGGATTGGACCAACAAAAACATCCACCCGTCCAAAGTTGTTAACGTTGGCGACGTTGTGGAAGTGATGGTTCTGGATATCGACGAAGAGCGTCGTCGTATCTCCCTGGGTCTGAAGCAGTGCAAATCTAACCCATGGCAGCAGTTCGCGGAAACCCACAACAAGGGCGACCGT
GTTGAAGGTAAAATCAAGTCTATCACTGACTTCGGTATCTTCATCGGCCTGGACGGCGGCATCGACGGCCTGGTTCACCTGTCTGACATCTCCTGGAACGTTGCAGGCGAAGAAGCAGTTCGTGAATACAAAAAAGGCGACGAAATCGCAGCAGTTGTTCTGCAGGTTGACGCAGAGCGTGAGCGTATCTCCCTGGGCGTTAAACAGCTCGCGGAAGATCCGTTCAACAACTACGTTGCTCTGAACAAGAAAGGCGCTATCGTTGTTGGTAAAGTCACTGCAGTTGACGCTAAAGGCGCAACCGTAGAACTGGCTGACGGCGTAGAAGGTTACCTGCGTGCTTCTGAAGCATCCCGTGACCGCGTTGAAGACGCAACTCTGGTTCTGAGCGTTGGCGACGAAGTTGAAGCGAAATTCACCGGCGTGGATCGTAAGAACCGCGTAGTGAGCCTGTCTGTACGTGCGAAAGACGAAGCGGAAGAAAAAGACGCTATCGCTACCGTGAACAAGCAGGAAGACGCGAACTTCTCCAACAACGCTATGGCTGAAGCGTTCAAAGCAGCGAAAGGCGAGTAA''

This can be bypassed with the flag 'tossjunk', 'fixjunk', or 'ignorejunk'
    at shared.KillSwitch.kill(KillSwitch.java:96)
    at stream.Read.validateCommonCase_branchless(Read.java:412)
    at stream.Read.validate(Read.java:115)
    at stream.Read.<init>(Read.java:77)
    at stream.Read.<init>(Read.java:50)
    at stream.FastaReadInputStream.generateRead(FastaReadInputStream.java:270)
    at stream.FastaReadInputStream.fillList(FastaReadInputStream.java:184)
    at stream.FastaReadInputStream.hasMore(FastaReadInputStream.java:109)
    at stream.ConcurrentGenericReadInputStream$ReadThread.readLists(ConcurrentGenericReadInputStream.java:668)
    at stream.ConcurrentGenericReadInputStream$ReadThread.run(ConcurrentGenericReadInputStream.java:657)
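The long list of numbers in the BBDuk message is the offending sequence printed as ASCII codes. As a minimal sketch, the first few values can be decoded to see what BBDuk actually read; `decode_ascii_codes` is a made-up helper name used only for illustration.

```python
def decode_ascii_codes(codes):
    """Turn a list of ASCII code points, as printed in the error, back into text."""
    return bytes(codes).decode("ascii")


# The first five codes copied from the "Sequence:" dump in the error message.
prefix = decode_ascii_codes([98, 39, 65, 84, 71])
print(prefix)
```

Codes 98 and 39 are the characters `b` and `'`, so the sequence begins with a literal `b'` before the first base. Code 39 is the character BBDuk reports as appearing "where a base was expected".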

(/projects/js66/software/conda_envs/confindr_0.7.4) [acorreia@m3-dtn4 original_files]$ cd /home/acorreia/.confindr_db/
(/projects/js66/software/conda_envs/confindr_0.7.4) [acorreia@m3-dtn4 .confindr_db]$ ls
download_date.txt gene_allele.txt Listeria_db_cgderived.fasta refseq.msh Salmonella_db_cgderived.fasta Escherichia_db_cgderived.fasta Klebsiella_db.fasta profiles.txt rMLST_combined.fasta
(/projects/js66/software/conda_envs/confindr_0.7.4) [acorreia@m3-dtn4 .confindr_db]$ cat Klebsiella_db.fasta | head -30

>BACT000001_2176
b'ATGACTGAATCTTTTGCTCAACTGTTTGAAGAATCCTTAAAAGAAATCGAAACCCGCC
CGGGTTCCATCGTTCGTGGTGTTGTTGTTGCTATCGACAAAGACGTAGTACTGGTTGACG
CCGGTCTGAAATCTGAGTCCGCCATCCCGGCTGAGCAGTTCAAAAACGCCCAGGGCGAGC
TGGAAATCCAGGTTGGTGACGAAGTTGACGTTGCTCTGGATGCAGTAGAAGACGGCTTCG
GTGAAACTCTGCTGTCCCGTGAGAAAGCTAAACGTCACGAAGCTTGGATCACGCTGGAAA
AAGCTTACGAAGACGCTGAAACTGTTACCGGTGTTATCAACGGCAAAGTTAAAGGTGGCT
TCACTGTTGAGCTGAACGGTATTCGTGCGTTCCTGCCGGGTTCCCTGGTAGACGTTCGTC
CGGTGCGCGACACGCTGCACCTGGAAGGCAAAGAGCTTGAGTTCAAAGTCATCAAGCTGG
ACCAGAAGCGTAACAACGTTGTTGTTTCTCGTCGTGCCGTTATCGAATCCGAAAACAGCG
CAGAGCGCGATCAGCTGCTGGAAAACCTGCAGGAAGGCATGGAAGTCAAAGGTATCGTTA
AGAACCTCACTGACTACGGTGCATTCGTTGATCTGGGCGGCGTTGACGGCCTGCTGCACA
TCACCGATATGGCCTGGAAACGCGTTAAGCATCCGAGCGAAATCGTAAACGTTGGCGACG
AAATCACTGTTAAAGTGCTGAAGTTCGACCGCGAGCGTACCCGTGTATCCCTGGGCCTGA
AACAGCTGGGCGAAGATCCATGGGTAGCTATCGCTAAACGTTATCCGGAAGGTACCAAAC
TGACCGGTCGCGTGACCAACCTGACCGACTACGGCTGCTTCGTTGAAATCGAAGAAGGCG
TTGAAGGCCTGGTTCACGTTTCCGAAATGGATTGGACCAACAAAAACATCCACCCGTCCA
AAGTTGTTAACGTTGGCGACGTTGTGGAAGTGATGGTTCTGGATATCGACGAAGAGCGTC
GTCGTATCTCCCTGGGTCTGAAGCAGTGCAAATCTAACCCATGGCAGCAGTTCGCGGAAA
CCCACAACAAGGGCGACCGTGTTGAAGGTAAAATCAAGTCTATCACTGACTTCGGTATCT
TCATCGGCCTGGACGGCGGCATCGACGGCCTGGTTCACCTGTCTGACATCTCCTGGAACG
TTGCAGGCGAAGAAGCAGTTCGTGAATACAAAAAAGGCGACGAAATCGCAGCAGTTGTTC
TGCAGGTTGACGCAGAGCGTGAGCGTATCTCCCTGGGCGTTAAACAGCTCGCGGAAGATC
CGTTCAACAACTACGTTGCTCTGAACAAGAAAGGCGCTATCGTTGTTGGTAAAGTCACTG
CAGTTGACGCTAAAGGCGCAACCGTAGAACTGGCTGACGGCGTAGAAGGTTACCTGCGTG
CTTCTGAAGCATCCCGTGACCGCGTTGAAGACGCAACTCTGGTTCTGAGCGTTGGCGACG
AAGTTGAAGCGAAATTCACCGGCGTGGATCGTAAGAACCGCGTAGTGAGCCTGTCTGTAC
GTGCGAAAGACGAAGCGGAAGAAAAAGACGCTATCGCTACCGTGAACAAGCAGGAAGACG
CGAACTTCTCCAACAACGCTATGGCTGAAGCGTTCAAAGCAGCGAAAGGCGAGTAA'
>BACT000002_86
(/projects/js66/software/conda_envs/confindr_0.7.4) [acorreia@m3-dtn4 .confindr_db]$

I then tried to remove the b' and ' at the beginning and end of each sequence, but then the error below happened:

confindr.py -i /home/acorreia/js66_scratch/anna/fastq_files/test_fastq/original_files -o example-out --rmlst
2022-06-28 03:40:45 Welcome to ConFindr 0.7.4! Beginning analysis of your samples...
2022-06-28 03:40:45 Beginning analysis of sample NK_H18_033_1...
2022-06-28 03:40:45 Checking for cross-species contamination...
2022-06-28 03:40:51 Extracting conserved core genes...
2022-06-28 03:40:58 Quality trimming...
2022-06-28 03:40:59 Detecting contamination...
[E::fai_build_core] Different line length in sequence 'BACT000001_2176'
Traceback (most recent call last):
  File "/projects/js66/software/conda_envs/confindr_0.7.4/bin/confindr.py", line 10, in <module>
    sys.exit(main())
  File "/projects/js66/software/conda_envs/confindr_0.7.4/lib/python3.7/site-packages/confindr_src/confindr.py", line 1214, in main
    confindr(args)
  File "/projects/js66/software/conda_envs/confindr_0.7.4/lib/python3.7/site-packages/confindr_src/confindr.py", line 1067, in confindr
    fasta=args.fasta)
  File "/projects/js66/software/conda_envs/confindr_0.7.4/lib/python3.7/site-packages/confindr_src/confindr.py", line 691, in find_contamination
    pysam.faidx(sample_database)
  File "/projects/js66/software/conda_envs/confindr_0.7.4/lib/python3.7/site-packages/pysam/utils.py", line 75, in __call__
    stderr))
pysam.utils.SamtoolsError: 'samtools returned with error 1: stdout=, stderr=[faidx] Could not build fai index /home/acorreia/.confindr_db/Klebsiella_db.fasta.fai\n'
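A likely reason for the "Different line length" message: in the `head -30` output, the first sequence line is 60 characters including the `b'` prefix, so deleting those two characters leaves a 58-base first line followed by 60-base lines, and faidx expects every line of a record except the last to have the same length. Below is a minimal sketch of a manual repair that strips the wrapper and then rewraps each record to a fixed width. `clean_fasta` and the default width of 60 are assumptions for illustration, not part of ConFindr.

```python
def clean_fasta(text, width=60):
    """Strip b'...' wrappers from each record and rewrap sequences to a fixed width."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line, []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))

    out = []
    for header, seq in records:
        # Drop a leading b' (or b") and any trailing quote characters.
        if seq[:2] in ("b'", 'b"'):
            seq = seq[2:]
        seq = seq.rstrip("'\"")
        out.append(header)
        # Rewrap so that every line except the last has exactly `width` bases.
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"
```

The function takes the file contents as a string and returns the cleaned text, so the result can be inspected before anything in the database directory is overwritten. Re-downloading the database is the simpler route if that is an option.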

If you have any advice please, that would be great.

pcrxn commented 1 year ago

Hi @annacorreia—sorry for the delayed response! I've tested some Klebsiella genomes using ConFindr v0.7.4, with mash=2.3 and python=3.7.12, and the rMLST alleles for Klebsiella seem to be downloaded correctly and ConFindr runs without error.

Based upon the logs that you've included, it seems that the Klebsiella rMLST alleles file (Klebsiella_db.fasta) was written incorrectly: each sequence is wrapped in b'...'. If you delete all of the Klebsiella_db* files within the path provided to -d/--databases and then re-run ConFindr, the Klebsiella alleles will be downloaded again and automatically re-indexed.
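A minimal sketch of that cleanup step, assuming the database directory is `~/.confindr_db` as shown in the log above. `remove_genus_database` is a hypothetical helper and not part of ConFindr.

```python
from pathlib import Path


def remove_genus_database(db_dir, genus="Klebsiella"):
    """Delete <genus>_db* files (FASTA plus any index files) and return their names."""
    db_dir = Path(db_dir).expanduser()
    removed = []
    for path in sorted(db_dir.glob(f"{genus}_db*")):
        if path.is_file():
            path.unlink()
            removed.append(path.name)
    return removed


# Example call (commented out so nothing is deleted by accident):
# remove_genus_database("~/.confindr_db")
```

Only files whose names start with `Klebsiella_db` are touched, so the other genus databases and `rMLST_combined.fasta` stay in place.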

pcrxn commented 1 year ago

This is related to #30, an issue with the biopython version. If your biopython version has been downgraded to 1.78, the above instructions should work!
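For context, the b'...' wrapper seen in Klebsiella_db.fasta is what Python 3 produces when a bytes object is converted with `str()` instead of being decoded. The sketch below only illustrates that general behaviour. Whether this is exactly what happens inside database_setup.py with biopython 1.79 is an assumption that has not been checked here, and `format_record` is a made-up helper.

```python
def format_record(name, sequence):
    """Build a FASTA record, decoding bytes first so no b'...' wrapper is written."""
    if isinstance(sequence, (bytes, bytearray)):
        sequence = bytes(sequence).decode("ascii")
    return f">{name}\n{sequence}\n"


raw = b"ATGACT"

# Formatting bytes directly keeps the literal b'...' wrapper in the output text.
broken = f">BACT000001_2176\n{raw}\n"

# Decoding first gives plain bases.
fixed = format_record("BACT000001_2176", raw)
```

Here `broken` contains `b'ATGACT'` on the sequence line, while `fixed` contains `ATGACT`.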

pcrxn commented 1 year ago

Closing as completed since a solution was provided.