OLC-Bioinformatics / ConFindr

Intra-species bacterial contamination detection
https://olc-bioinformatics.github.io/ConFindr/
MIT License

Issues setting up DB #41

Closed by juliofdiaz 2 years ago

juliofdiaz commented 2 years ago

Dear Confindr team,

I'm trying to install the database for ConFindr (v0.7.4, Python 3.6), but I am running into problems. Specifically, I am getting the following error:

...
  2022-09-12 10:20:56  Downloading BACT000065...
  2022-09-12 10:21:05  Downloading rMLST profiles...
  2022-09-12 10:21:05  Combining rMLST files...
Traceback (most recent call last):
  File "/well/aanensen/users/afk289/conda/skylake/envs/confindr2/bin/confindr_database_setup", line 10, in <module>
    sys.exit(main())
  File "/well/aanensen/users/afk289/conda/skylake/envs/confindr2/lib/python3.6/site-packages/confindr_src/database_setup.py", line 270, in main
    args.secret_file)
  File "/well/aanensen/users/afk289/conda/skylake/envs/confindr2/lib/python3.6/site-packages/confindr_src/database_setup.py", line 209, in setup_confindr_database
    record.seq._data = record.seq._data.replace('-', '').replace('N', '')
TypeError: a bytes-like object is required, not 'str'

The command I'm using to set up the database is: confindr_database_setup -s key -o confindr_db

Have you seen this error before, or do you have any clues as to how to solve it? Thanks in advance.

pcrxn commented 2 years ago

Hi @juliofdiaz,

This issue is the same as #27. You can either change your BioPython version to 1.78 or perform the manual fix described in https://github.com/OLC-Bioinformatics/ConFindr/issues/27#issuecomment-952268919.

Edit: Previously referenced the wrong issue.
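
For readers who want to see why the traceback above ends in a TypeError, the failure can be reproduced without ConFindr or BioPython. This is a minimal sketch, assuming that the installed BioPython stores `record.seq._data` as `bytes` rather than `str`; the `strip_gaps` helper is hypothetical and is not necessarily the manual fix described in #27.

```python
# Assumption: on the failing install, record.seq._data is bytes; on the
# working install it is str. These literals stand in for that attribute.
seq_as_bytes = b"ATG-CCN-GA"
seq_as_str = "ATG-CCN-GA"


def strip_gaps(data):
    """Remove '-' and 'N' whether data is str or bytes (hypothetical helper)."""
    if isinstance(data, bytes):
        return data.replace(b"-", b"").replace(b"N", b"")
    return data.replace("-", "").replace("N", "")


# Calling bytes.replace with str arguments raises the error from the traceback.
try:
    seq_as_bytes.replace("-", "")
    error_message = None
except TypeError as exc:
    error_message = str(exc)

cleaned_bytes = strip_gaps(seq_as_bytes)
cleaned_str = strip_gaps(seq_as_str)
print(error_message)
print(cleaned_bytes, cleaned_str)
```

The point of the sketch is only that the argument types passed to `replace` must match the type of the sequence data; which of the two options in the comment above is preferable depends on the environment.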

juliofdiaz commented 2 years ago

Thank you, @pcrxn. It seems the manual fix worked, but I was only able to download the Escherichia, Salmonella, and Listeria databases. Here is how I am running the setup:

$ confindr_database_setup -s key -o confindr_db
  2022-09-21 02:19:23  Downloading cgMLST-derived data for Escherichia, Salmonella, and Listeria... 
Visit this URL in your browser: http://pubmlst.org/cgi-bin/bigsdb/bigsdb.pl?db=pubmlst_rmlst_seqdef&page=authorizeClient&oauth_token=QwTxHjRaQ3xC6wvPjA5TqdCtzA0FHK8H
Enter oauth_verifier from browser: QoauJved
  2022-09-21 02:19:41  Downloading BACT000001... 
  2022-09-21 02:19:56  Downloading BACT000002... 
  2022-09-21 02:20:04  Downloading BACT000003... 
  2022-09-21 02:20:12  Downloading BACT000004... 
  2022-09-21 02:20:20  Downloading BACT000005... 
  2022-09-21 02:20:26  Downloading BACT000006... 
  2022-09-21 02:20:29  Downloading BACT000007... 
  2022-09-21 02:20:34  Downloading BACT000008... 
  2022-09-21 02:20:39  Downloading BACT000009... 
  2022-09-21 02:20:44  Downloading BACT000010... 
  2022-09-21 02:20:47  Downloading BACT000011... 
  2022-09-21 02:20:51  Downloading BACT000012... 
  2022-09-21 02:20:55  Downloading BACT000013... 
  2022-09-21 02:20:59  Downloading BACT000014... 
  2022-09-21 02:21:02  Downloading BACT000015... 
  2022-09-21 02:21:05  Downloading BACT000016... 
  2022-09-21 02:21:09  Downloading BACT000017... 
  2022-09-21 02:21:11  Downloading BACT000018... 
  2022-09-21 02:21:14  Downloading BACT000019... 
  2022-09-21 02:21:17  Downloading BACT000020... 
  2022-09-21 02:21:20  Downloading BACT000021... 
  2022-09-21 02:21:22  Downloading BACT000030... 
  2022-09-21 02:21:30  Downloading BACT000031... 
  2022-09-21 02:21:40  Downloading BACT000032... 
  2022-09-21 02:21:47  Downloading BACT000033... 
  2022-09-21 02:21:53  Downloading BACT000034... 
  2022-09-21 02:21:59  Downloading BACT000035... 
  2022-09-21 02:22:06  Downloading BACT000036... 
  2022-09-21 02:22:09  Downloading BACT000038... 
  2022-09-21 02:22:15  Downloading BACT000039... 
  2022-09-21 02:22:20  Downloading BACT000040... 
  2022-09-21 02:22:25  Downloading BACT000042... 
  2022-09-21 02:22:30  Downloading BACT000043... 
  2022-09-21 02:22:34  Downloading BACT000044... 
  2022-09-21 02:22:39  Downloading BACT000045... 
  2022-09-21 02:22:43  Downloading BACT000046... 
  2022-09-21 02:22:47  Downloading BACT000047... 
  2022-09-21 02:22:51  Downloading BACT000048... 
  2022-09-21 02:22:55  Downloading BACT000049... 
  2022-09-21 02:23:00  Downloading BACT000050... 
  2022-09-21 02:23:03  Downloading BACT000051... 
  2022-09-21 02:23:06  Downloading BACT000052... 
  2022-09-21 02:23:10  Downloading BACT000053... 
  2022-09-21 02:23:13  Downloading BACT000056... 
  2022-09-21 02:23:16  Downloading BACT000057... 
  2022-09-21 02:23:18  Downloading BACT000058... 
  2022-09-21 02:23:20  Downloading BACT000059... 
  2022-09-21 02:23:22  Downloading BACT000060... 
  2022-09-21 02:23:25  Downloading BACT000061... 
  2022-09-21 02:23:27  Downloading BACT000062... 
  2022-09-21 02:23:29  Downloading BACT000063... 
  2022-09-21 02:23:30  Downloading BACT000064... 
  2022-09-21 02:23:32  Downloading BACT000065... 
  2022-09-21 02:23:42  Downloading rMLST profiles... 
  2022-09-21 02:23:42  Combining rMLST files... 
  2022-09-21 02:25:34  Assigning alleles to genera... 
  2022-09-21 02:29:42  Downloading mash refseq sketch... 
  2022-09-21 02:29:43  Done downloading ConFindr databases! 

$ ls confindr_db
Escherichia_db_cgderived.fasta  Listeria_db_cgderived.fasta  Salmonella_db_cgderived.fasta  download_date.txt  gene_allele.txt  profiles.txt  rMLST_combined.fasta  refseq.msh

It seems I have only downloaded the Escherichia, Listeria, and Salmonella dbs. I am interested in the Mycobacterium one, and it seems I am registered for it:

[Screenshot taken 2022-09-21; image not included]

I couldn't find additional information on downloading additional dbs, so I assume I'm missing something (probably obvious).

pcrxn commented 2 years ago

Hi @juliofdiaz,

Only alleles for Escherichia, Listeria, and Salmonella are downloaded by default. If you use ConFindr to analyze a sample of a different genus, alleles will be automatically downloaded for that other genus and saved in your database path, including for Mycobacterium.
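
To check which genus-specific files are present in the database folder at any point, something like the following may help. It is a minimal sketch: `list_genus_databases` is a hypothetical helper, not part of ConFindr, and the `<Genus>_db*.fasta` naming pattern is inferred from the directory listing earlier in this thread.

```python
import tempfile
from pathlib import Path


def list_genus_databases(db_dir):
    """Return sorted genus names for files named like '<Genus>_db*.fasta'."""
    genera = set()
    for fasta in Path(db_dir).glob("*_db*.fasta"):
        # 'Escherichia_db_cgderived.fasta' -> 'Escherichia'
        genera.add(fasta.name.split("_db")[0])
    return sorted(genera)


# Demo on a temporary folder that mimics the listing shown in this thread.
with tempfile.TemporaryDirectory() as tmp:
    for name in (
        "Escherichia_db_cgderived.fasta",
        "Listeria_db_cgderived.fasta",
        "Salmonella_db_cgderived.fasta",
        "rMLST_combined.fasta",
        "refseq.msh",
    ):
        (Path(tmp) / name).touch()
    found = list_genus_databases(tmp)

print(found)
```

Running the same function on the real database path before and after analyzing a sample from another genus should show whether a new genus file was added.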

juliofdiaz commented 2 years ago

Thank you, Liam. I reran ConFindr, and it did download the Mycobacterium database (Mycobacterium_db.fasta). However, ConFindr ran into a problem when trying to run bbduk.sh. If this is not related to the original question, I can raise a separate issue.

  2022-09-22 01:20:36  Welcome to ConFindr 0.7.4! Beginning analysis of your samples... 
  2022-09-22 01:20:36  Beginning analysis of sample DRR019435... 
  2022-09-22 01:20:36  Checking for cross-species contamination... 
  2022-09-22 01:22:41  Extracting conserved core genes... 
  2022-09-22 01:22:46  Encountered error when attempting to run ConFindr on sample DRR019435. Skipping... 
  2022-09-22 01:22:46  Error encounted was:
Traceback (most recent call last):
  File "/well/aanensen/users/afk289/conda/skylake/envs/confindr3/lib/python3.5/site-packages/confindr_src/confindr.py", line 1067, in confindr
    fasta=args.fasta)
  File "/well/aanensen/users/afk289/conda/skylake/envs/confindr3/lib/python3.5/site-packages/confindr_src/confindr.py", line 638, in find_contamination
    returncmd=True)
  File "/well/aanensen/users/afk289/conda/skylake/envs/confindr3/lib/python3.5/site-packages/confindr_src/wrappers/bbtools.py", line 258, in bbduk_bait
    out, err = run_subprocess(cmd)
  File "/well/aanensen/users/afk289/conda/skylake/envs/confindr3/lib/python3.5/site-packages/confindr_src/wrappers/bbtools.py", line 16, in run_subprocess
    raise subprocess.CalledProcessError(x.returncode, cmd=command)
subprocess.CalledProcessError: Command 'bbduk.sh in=/well/aanensen/projects/amr-landscape/confindr/mtuberculosis/DRR019435/DRR019435_1.fastq.gz in2=/well/aanensen/projects/amr-landscape/confindr/mtuberculosis/DRR019435/DRR019435_2.fastq.gz outm=/well/aanensen/projects/amr-landscape/confindr/mtuberculosis/DRR019435/DRR019435/rmlst_R1.fastq.gz outm2=/well/aanensen/projects/amr-landscape/confindr/mtuberculosis/DRR019435/DRR019435/rmlst_R2.fastq.gz ref=/well/aanensen/projects/amr-landscape/confindr/confindr_db/Mycobacterium_db.fasta Xmx=1500m threads=2' returned non-zero exit status 1

  2022-09-22 01:22:46  Contamination detection complete! 

pcrxn commented 2 years ago

No problem, @juliofdiaz!

After you receive the bbduk.sh error, could you please run the following command and share the output?

bbduk.sh in=/well/aanensen/projects/amr-landscape/confindr/mtuberculosis/DRR019435/DRR019435_1.fastq.gz in2=/well/aanensen/projects/amr-landscape/confindr/mtuberculosis/DRR019435/DRR019435_2.fastq.gz outm=/well/aanensen/projects/amr-landscape/confindr/mtuberculosis/DRR019435/DRR019435/rmlst_R1.fastq.gz outm2=/well/aanensen/projects/amr-landscape/confindr/mtuberculosis/DRR019435/DRR019435/rmlst_R2.fastq.gz ref=/well/aanensen/projects/amr-landscape/confindr/confindr_db/Mycobacterium_db.fasta Xmx=1500m threads=2

juliofdiaz commented 2 years ago

Here is my output:

(confindr3) [afk289@rescomp1 scripts]$ bbduk.sh in=/well/aanensen/projects/amr-landscape/confindr/mtuberculosis/DRR019435/DRR019435_1.fastq.gz in2=/well/aanensen/projects/amr-landscape/confindr/mtuberculosis/DRR019435/DRR019435_2.fastq.gz outm=/well/aanensen/projects/amr-landscape/confindr/mtuberculosis/DRR019435/DRR019435/rmlst_R1.fastq.gz outm2=/well/aanensen/projects/amr-landscape/confindr/mtuberculosis/DRR019435/DRR019435/rmlst_R2.fastq.gz ref=/well/aanensen/projects/amr-landscape/confindr/confindr_db/Mycobacterium_db.fasta Xmx=1500m threads=2
java -ea -Xmx1500m -Xms1500m -cp /well/aanensen/users/afk289/conda/skylake/envs/confindr3/opt/bbmap-39.00-0/current/ jgi.BBDuk in=/well/aanensen/projects/amr-landscape/confindr/mtuberculosis/DRR019435/DRR019435_1.fastq.gz in2=/well/aanensen/projects/amr-landscape/confindr/mtuberculosis/DRR019435/DRR019435_2.fastq.gz outm=/well/aanensen/projects/amr-landscape/confindr/mtuberculosis/DRR019435/DRR019435/rmlst_R1.fastq.gz outm2=/well/aanensen/projects/amr-landscape/confindr/mtuberculosis/DRR019435/DRR019435/rmlst_R2.fastq.gz ref=/well/aanensen/projects/amr-landscape/confindr/confindr_db/Mycobacterium_db.fasta Xmx=1500m threads=2
Executing jgi.BBDuk [in=/well/aanensen/projects/amr-landscape/confindr/mtuberculosis/DRR019435/DRR019435_1.fastq.gz, in2=/well/aanensen/projects/amr-landscape/confindr/mtuberculosis/DRR019435/DRR019435_2.fastq.gz, outm=/well/aanensen/projects/amr-landscape/confindr/mtuberculosis/DRR019435/DRR019435/rmlst_R1.fastq.gz, outm2=/well/aanensen/projects/amr-landscape/confindr/mtuberculosis/DRR019435/DRR019435/rmlst_R2.fastq.gz, ref=/well/aanensen/projects/amr-landscape/confindr/confindr_db/Mycobacterium_db.fasta, Xmx=1500m, threads=2]
Version 39.00

Set threads to 2
0.187 seconds.
Initial:
Memory: max=1572m, total=1572m, free=1543m, used=29m

java.lang.Exception: 
An input file appears to be misformatted:
The character with ASCII code 39 appeared where a base was expected: '''
Sequence #0
Sequence ID: 'BACT000001_159'
Sequence: '[65, 84, 71, 67, 67, 71, 65, 71, 84, 67, 67, 67, 65, 67, 67, 71, 84, 67, 65, 67, 67, 84, 67, 71, 67, 67, 71, 67, 65, 65, 71, 84, 65, 71, 67, 67, 71, 84, 67, 65, 65, 67, 71, 65, 67, 65, 84, 65, 71, 71, 67, 84, 67, 84, 65, 71, 67, 71, 65, 71, 71, 65, 67, 84, 84, 84, 67, 84, 67, 71, 67, 67, 71, 67, 65, 65, 84, 65, 71, 65, 67, 65, 65, 65, 65, 67, 71, 65, 84, 67, 65, 65, 71, 84, 65, 67, 84, 84, 67, 65, 65, 67, 71, 65, 84, 71, 71, 67, 71, 65, 67, 65, 84, 67, 71, 84, 67, 71, 65, 65, 71, 71, 67, 65, 67, 67, 65, 84, 67, 71, 84, 67, 65, 65, 65, 71, 84, 71, 71, 65, 67, 67, 71, 71, 71, 65, 67, 71, 65, 71, 71, 84, 71, 67, 84, 67, 67, 84, 67, 71, 65, 67, 65, 84, 67, 71, 71, 67, 84, 65, 67, 65, 65, 71, 65, 67, 67, 71, 65, 65, 71, 71, 67, 71, 84, 71, 65, 84, 67, 67, 67, 67, 71, 67, 67, 67, 71, 67, 71, 65, 65, 67, 84, 71, 84, 67, 67, 65, 84, 67, 65, 65, 71, 67, 65, 67, 71, 65, 67, 71, 84, 67, 71, 65, 67, 67, 67, 67, 65, 65, 67, 71, 65, 71, 71, 84, 67, 71, 84, 84, 84, 67, 67, 71, 84, 67, 71, 71, 84, 71, 65, 67, 71, 65, 71, 71, 84, 67, 71, 65, 65, 71, 67, 67, 67, 84, 71, 71, 84, 71, 67, 84, 67, 65, 67, 67, 65, 65, 71, 71, 65, 71, 71, 65, 67, 65, 65, 65, 71, 65, 71, 71, 71, 67, 67, 71, 71, 67, 84, 67, 65, 84, 67, 67, 84, 67, 84, 67, 67, 65, 65, 71, 65, 65, 65, 67, 71, 67, 71, 67, 71, 67, 65, 71, 84, 65, 67, 71, 65, 71, 67, 71, 84, 71, 67, 67, 84, 71, 71, 71, 71, 67, 65, 67, 67, 65, 84, 67, 71, 65, 71, 71, 67, 71, 67, 84, 67, 65, 65, 71, 71, 65, 71, 65, 65, 71, 71, 65, 67, 71, 65, 71, 71, 67, 67, 71, 84, 67, 65, 65, 71, 71, 71, 67, 65, 67, 71, 71, 84, 67, 65, 84, 67, 71, 65, 71, 71, 84, 67, 71, 84, 67, 65, 65, 71, 71, 71, 84, 71, 71, 67, 67, 84, 71, 65, 84, 67, 67, 84, 67, 71, 65, 67, 65, 84, 67, 71, 71, 71, 67, 84, 71, 67, 71, 67, 71, 71, 84, 84, 84, 67, 67, 84, 71, 67, 67, 67, 71, 67, 67, 84, 67, 71, 67, 84, 71, 71, 84, 71, 71, 65, 71, 65, 84, 71, 67, 71, 67, 67, 71, 71, 71, 84, 71, 67, 71, 67, 71, 65, 67, 67, 84, 71, 67, 65, 71, 67, 67, 67, 84, 65, 67, 65, 84, 67, 71, 71, 
67, 65, 65, 71, 71, 65, 71, 65, 84, 67, 71, 65, 71, 71, 67, 67, 65, 65, 71, 65, 84, 67, 65, 84, 67, 71, 65, 71, 67, 84, 71, 71, 65, 67, 65, 65, 71, 65, 65, 67, 67, 71, 67, 65, 65, 67, 65, 65, 67, 71, 84, 71, 71, 84, 71, 67, 84, 71, 84, 67, 67, 67, 71, 84, 67, 71, 67, 71, 67, 67, 84, 71, 71, 67, 84, 71, 71, 65, 71, 67, 65, 71, 65, 67, 67, 67, 65, 71, 84, 67, 67, 71, 65, 71, 71, 84, 71, 67, 71, 67, 65, 71, 67, 71, 65, 71, 84, 84, 67, 67, 84, 71, 65, 65, 84, 65, 65, 67, 84, 84, 71, 67, 65, 65, 65, 65, 65, 71, 71, 67, 65, 67, 67, 65, 84, 67, 67, 71, 65, 65, 65, 71, 71, 71, 84, 71, 84, 67, 71, 84, 71, 84, 67, 67, 84, 67, 71, 65, 84, 67, 71, 84, 67, 65, 65, 67, 84, 84, 67, 71, 71, 67, 71, 67, 71, 84, 84, 67, 71, 84, 67, 71, 65, 84, 67, 84, 67, 71, 71, 67, 71, 71, 84, 71, 84, 71, 71, 65, 67, 71, 71, 84, 67, 84, 71, 71, 84, 71, 67, 65, 84, 71, 84, 67, 84, 67, 67, 71, 65, 71, 67, 84, 65, 84, 67, 71, 84, 71, 71, 65, 65, 71, 67, 65, 67, 65, 84, 67, 71, 65, 67, 67, 65, 67, 67, 67, 71, 84, 67, 67, 71, 65, 71, 71, 84, 71, 71, 84, 67, 67, 65, 71, 71, 84, 84, 71, 71, 84, 71, 65, 67, 71, 65, 71, 71, 84, 67, 65, 67, 67, 71, 84, 67, 71, 65, 71, 71, 84, 71, 67, 84, 67, 71, 65, 67, 71, 84, 67, 71, 65, 67, 65, 84, 71, 71, 65, 67, 67, 71, 84, 71, 65, 71, 67, 71, 71, 71, 84, 84, 84, 67, 71, 84, 84, 71, 84, 67, 65, 67, 84, 67, 65, 65, 71, 71, 67, 71, 65, 67, 84, 67, 65, 71, 71, 65, 65, 71, 65, 67, 67, 67, 71, 84, 71, 71, 67, 71, 71, 67, 65, 67, 84, 84, 67, 71, 67, 67, 67, 71, 67, 65, 67, 84, 67, 65, 67, 71, 67, 71, 65, 84, 67, 71, 71, 71, 67, 65, 71, 65, 84, 67, 71, 84, 71, 67, 67, 71, 71, 71, 67, 65, 65, 71, 71, 84, 67, 65, 67, 67, 65, 65, 71, 84, 84, 71, 71, 84, 84, 67, 67, 71, 84, 84, 67, 71, 71, 84, 71, 67, 65, 84, 84, 67, 71, 84, 67, 67, 71, 67, 71, 84, 67, 71, 65, 71, 71, 65, 71, 71, 71, 84, 65, 84, 67, 71, 65, 71, 71, 71, 67, 67, 84, 71, 71, 84, 71, 67, 65, 67, 65, 84, 67, 84, 67, 67, 71, 65, 71, 67, 84, 71, 71, 67, 67, 71, 65, 71, 67, 71, 84, 67, 65, 67, 71, 84, 67, 71, 65, 71, 71, 
84, 71, 67, 67, 67, 71, 65, 84, 67, 65, 71, 71, 84, 71, 71, 84, 84, 71, 67, 67, 71, 84, 67, 71, 71, 67, 71, 65, 67, 71, 65, 67, 71, 67, 71, 65, 84, 71, 71, 84, 67, 65, 65, 71, 71, 84, 67, 65, 84, 67, 71, 65, 67, 65, 84, 67, 71, 65, 67, 67, 84, 71, 71, 65, 71, 67, 71, 67, 67, 71, 84, 67, 71, 71, 65, 84, 67, 84, 67, 71, 84, 84, 71, 84, 67, 71, 67, 84, 67, 65, 65, 71, 67, 65, 65, 71, 67, 67, 65, 65, 84, 71, 65, 71, 71, 65, 67, 84, 65, 67, 65, 67, 67, 71, 65, 71, 71, 65, 71, 84, 84, 67, 71, 65, 67, 67, 67, 71, 71, 67, 71, 65, 65, 71, 84, 65, 67, 71, 71, 67, 65, 84, 71, 71, 67, 67, 71, 65, 67, 65, 71, 84, 84, 65, 67, 71, 65, 67, 71, 65, 71, 67, 65, 71, 71, 71, 67, 65, 65, 67, 84, 65, 67, 65, 84, 67, 84, 84, 67, 67, 67, 67, 71, 65, 71, 71, 71, 67, 84, 84, 67, 71, 65, 84, 71, 67, 67, 71, 65, 65, 65, 67, 67, 65, 65, 67, 71, 65, 65, 84, 71, 71, 67, 84, 84, 71, 65, 71, 71, 71, 65, 84, 84, 67, 71, 65, 65, 65, 65, 71, 67, 65, 71, 67, 71, 67, 71, 67, 67, 71, 65, 65, 84, 71, 71, 71, 65, 65, 71, 67, 84, 67, 71, 71, 84, 65, 67, 71, 67, 67, 71, 65, 71, 71, 67, 67, 71, 65, 71, 67, 71, 67, 67, 71, 71, 67, 65, 67, 65, 65, 71, 65, 84, 71, 67, 65, 67, 65, 67, 67, 71, 67, 71, 67, 65, 71, 65, 84, 71, 71, 65, 71, 65, 65, 71, 84, 84, 67, 71, 67, 67, 71, 67, 67, 71, 67, 67, 71, 65, 71, 71, 67, 71, 71, 67, 84, 71, 71, 65, 67, 71, 67, 71, 71, 67, 71, 67, 71, 71, 65, 67, 71, 65, 84, 67, 65, 71, 84, 67, 71, 84, 67, 71, 71, 67, 67, 65, 71, 84, 65, 71, 67, 71, 67, 65, 67, 67, 71, 84, 67, 71, 71, 65, 65, 65, 65, 71, 65, 67, 67, 71, 67, 71, 71, 71, 84, 71, 71, 65, 84, 67, 65, 67, 84, 71, 71, 67, 67, 65, 71, 67, 71, 65, 67, 71, 67, 67, 67, 65, 71, 67, 84, 71, 71, 67, 71, 71, 67, 67, 67, 84, 71, 67, 71, 71, 71, 65, 65, 65, 65, 65, 67, 84, 67, 71, 67, 67, 71, 71, 67, 65, 71, 67, 71, 67, 84, 84, 71, 65, 39]
ATGCCGAGTCCCACCGTCACCTCGCCGCAAGTAGCCGTCAACGACATAGGCTCTAGCGAGGACTTTCTCGCCGCAATAGACAAAACGATCAAGTACTTCAACGATGGCGACATCGTCGAAGGCACCATCGTCAAAGTGGACCGGGACGAGGTGCTCCTCGACATCGGCTACAAGACCGAAGGCGTGATCCCCGCCCGCGAACTGTCCATCAAGCACGACGTCGACCCCAACGAGGTCGTTTCCGTCGGTGACGAGGTCGAAGCCCTGGTGCTCACCAAGGAGGACAAAGAGGGCCGGCTCATCCTCTCCAAGAAACGCGCGCAGTACGAGCGTGCCTGGGGCACCATCGAGGCGCTCAAGGAGAAGGACGAGGCCGTCAAGGGCACGGTCATCGAGGTCGTCAAGGGTGGCCTGATCCTCGACATCGGGCTGCGCGGTTTCCTGCCCGCCTCGCTGGTGGAGATGCGCCGGGTGCGCGACCTGCAGCCCTACATCGGCAAGGAGATCGAGGCCAAGATCATCGAGCTGGACAAGAACCGCAACAACGTGGTGCTGTCCCGTCGCGCCTGGCTGGAGCAGACCCAGTCCGAGGTGCGCAGCGAGTTCCTGAATAACTTGCAAAAAGGCACCATCCGAAAGGGTGTCGTGTCCTCGATCGTCAACTTCGGCGCGTTCGTCGATCTCGGCGGTGTGGACGGTCTGGTGCATGTCTCCGAGCTATCGTGGAAGCACATCGACCACCCGTCCGAGGTGGTCCAGGTTGGTGACGAGGTCACCGTCGAGGTGCTCGACGTCGACATGGACCGTGAGCGGGTTTCGTTGTCACTCAAGGCGACTCAGGAAGACCCGTGGCGGCACTTCGCCCGCACTCACGCGATCGGGCAGATCGTGCCGGGCAAGGTCACCAAGTTGGTTCCGTTCGGTGCATTCGTCCGCGTCGAGGAGGGTATCGAGGGCCTGGTGCACATCTCCGAGCTGGCCGAGCGTCACGTCGAGGTGCCCGATCAGGTGGTTGCCGTCGGCGACGACGCGATGGTCAAGGTCATCGACATCGACCTGGAGCGCCGTCGGATCTCGTTGTCGCTCAAGCAAGCCAATGAGGACTACACCGAGGAGTTCGACCCGGCGAAGTACGGCATGGCCGACAGTTACGACGAGCAGGGCAACTACATCTTCCCCGAGGGCTTCGATGCCGAAACCAACGAATGGCTTGAGGGATTCGAAAAGCAGCGCGCCGAATGGGAAGCTCGGTACGCCGAGGCCGAGCGCCGGCACAAGATGCACACCGCGCAGATGGAGAAGTTCGCCGCCGCCGAGGCGGCTGGACGCGGCGCGGACGATCAGTCGTCGGCCAGTAGCGCACCGTCGGAAAAGACCGCGGGTGGATCACTGGCCAGCGACGCCCAGCTGGCGGCCCTGCGGGAAAAACTCGCCGGCAGCGCTTGA''

This can be bypassed with the flag 'tossjunk', 'fixjunk', or 'ignorejunk'
    at shared.KillSwitch.kill(KillSwitch.java:96)
    at stream.Read.validateCommonCase_branchless(Read.java:412)
    at stream.Read.validate(Read.java:115)
    at stream.Read.<init>(Read.java:77)
    at stream.Read.<init>(Read.java:50)
    at stream.FastaReadInputStream.generateRead(FastaReadInputStream.java:270)
    at stream.FastaReadInputStream.fillList(FastaReadInputStream.java:184)
    at stream.FastaReadInputStream.hasMore(FastaReadInputStream.java:109)
    at stream.ConcurrentGenericReadInputStream$ReadThread.readLists(ConcurrentGenericReadInputStream.java:668)
    at stream.ConcurrentGenericReadInputStream$ReadThread.run(ConcurrentGenericReadInputStream.java:657)

For some reason, sequence BACT000001_159 includes a b' at the beginning and a ' at the end:


b'ATGCCGAGTCCCACCGTCACCTCGCCGCAAGTAGCCGTCAACGACATAGGCTCTAGCG
AGGACTTTCTCGCCGCAATAGACAAAACGATCAAGTACTTCAACGATGGCGACATCGTCG
AAGGCACCATCGTCAAAGTGGACCGGGACGAGGTGCTCCTCGACATCGGCTACAAGACCG
AAGGCGTGATCCCCGCCCGCGAACTGTCCATCAAGCACGACGTCGACCCCAACGAGGTCG
TTTCCGTCGGTGACGAGGTCGAAGCCCTGGTGCTCACCAAGGAGGACAAAGAGGGCCGGC
TCATCCTCTCCAAGAAACGCGCGCAGTACGAGCGTGCCTGGGGCACCATCGAGGCGCTCA
AGGAGAAGGACGAGGCCGTCAAGGGCACGGTCATCGAGGTCGTCAAGGGTGGCCTGATCC
TCGACATCGGGCTGCGCGGTTTCCTGCCCGCCTCGCTGGTGGAGATGCGCCGGGTGCGCG
ACCTGCAGCCCTACATCGGCAAGGAGATCGAGGCCAAGATCATCGAGCTGGACAAGAACC
GCAACAACGTGGTGCTGTCCCGTCGCGCCTGGCTGGAGCAGACCCAGTCCGAGGTGCGCA
GCGAGTTCCTGAATAACTTGCAAAAAGGCACCATCCGAAAGGGTGTCGTGTCCTCGATCG
TCAACTTCGGCGCGTTCGTCGATCTCGGCGGTGTGGACGGTCTGGTGCATGTCTCCGAGC
TATCGTGGAAGCACATCGACCACCCGTCCGAGGTGGTCCAGGTTGGTGACGAGGTCACCG
TCGAGGTGCTCGACGTCGACATGGACCGTGAGCGGGTTTCGTTGTCACTCAAGGCGACTC
AGGAAGACCCGTGGCGGCACTTCGCCCGCACTCACGCGATCGGGCAGATCGTGCCGGGCA
AGGTCACCAAGTTGGTTCCGTTCGGTGCATTCGTCCGCGTCGAGGAGGGTATCGAGGGCC
TGGTGCACATCTCCGAGCTGGCCGAGCGTCACGTCGAGGTGCCCGATCAGGTGGTTGCCG
TCGGCGACGACGCGATGGTCAAGGTCATCGACATCGACCTGGAGCGCCGTCGGATCTCGT
TGTCGCTCAAGCAAGCCAATGAGGACTACACCGAGGAGTTCGACCCGGCGAAGTACGGCA
TGGCCGACAGTTACGACGAGCAGGGCAACTACATCTTCCCCGAGGGCTTCGATGCCGAAA
CCAACGAATGGCTTGAGGGATTCGAAAAGCAGCGCGCCGAATGGGAAGCTCGGTACGCCG
AGGCCGAGCGCCGGCACAAGATGCACACCGCGCAGATGGAGAAGTTCGCCGCCGCCGAGG
CGGCTGGACGCGGCGCGGACGATCAGTCGTCGGCCAGTAGCGCACCGTCGGAAAAGACCG
CGGGTGGATCACTGGCCAGCGACGCCCAGCTGGCGGCCCTGCGGGAAAAACTCGCCGGCA
GCGCTTGA'
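
For anyone hitting the same symptom, the `b'...'` wrapper is what Python produces when a `bytes` object is converted with `str()`, and the trailing ASCII code 39 in the BBDuk message is the closing quote. The sketch below shows both and scans FASTA text for records containing unexpected characters. The helper name `find_malformed_records`, the allowed alphabet `ACGTN`, and the second record ID in the demo are assumptions made for illustration; real allele files may legitimately contain other IUPAC codes.

```python
import io


def find_malformed_records(handle, allowed="ACGTN"):
    """Return IDs of FASTA records with characters outside `allowed`."""
    allowed_set = set(allowed)
    bad, current_id, is_bad = [], None, False
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if current_id is not None and is_bad:
                bad.append(current_id)
            parts = line[1:].split()
            current_id, is_bad = (parts[0] if parts else ""), False
        elif line and current_id is not None:
            if set(line.upper()) - allowed_set:
                is_bad = True
    if current_id is not None and is_bad:
        bad.append(current_id)
    return bad


# str() on bytes yields the b'...' wrapper seen in the malformed record.
wrapped = str(b"ATGC")

# The last codes in the BBDuk message decode to the sequence tail plus a quote.
tail = bytes([84, 84, 71, 65, 39]).decode("ascii")

# Demo input: one wrapped record and one clean record (second ID is made up).
demo = io.StringIO(">BACT000001_159\n" + wrapped + "\n>BACT000001_160\nATGC\n")
bad_ids = find_malformed_records(demo)
print(wrapped, tail, bad_ids)
```

To scan a real file, open it in text mode and pass the handle, for example `find_malformed_records(open("Mycobacterium_db.fasta"))`.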

juliofdiaz commented 2 years ago

I tried these settings, as mentioned in post #30, but the outcome was the same as above:

(confindr3) [afk289@rescomp1 confindr_db]$ python
Python 3.7.12 | packaged by conda-forge | (default, Oct 26 2021, 06:08:53) 
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import Bio
>>> print(Bio.__version__)
1.78

pcrxn commented 2 years ago

Hi @juliofdiaz, after you changed your BioPython version to 1.78, did you delete the old rMLST files and re-download?

juliofdiaz commented 2 years ago

Deleting the old rMLST files and re-downloading did the trick. Here is how I set up the conda environment:

conda create -n confindr python=3.7.12
conda activate confindr
conda install -c conda-forge biopython=1.78
conda install confindr=0.7.4
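
Before re-downloading the database, it may be worth confirming that the active environment really has the BioPython version used above. This is a small sketch: the helper names are hypothetical, and 1.78 is simply the version reported as working in this thread, not a documented requirement.

```python
def major_minor(version):
    """Turn '1.78' (or '1.78.1') into (1, 78) for comparison."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def is_version_from_thread(version, reported_working="1.78"):
    """True if `version` has the same major.minor as the one reported working."""
    return major_minor(version) == major_minor(reported_working)


# Bio.__version__ is the same attribute printed earlier in this thread.
try:
    import Bio
    installed = Bio.__version__
except ImportError:
    installed = None

if installed is None:
    status = "Biopython is not installed in this environment"
elif is_version_from_thread(installed):
    status = "Biopython matches the version reported working in this thread"
else:
    status = "Biopython differs from the version reported working in this thread"
print(status)
```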