MDU-PHL / bohra

A pipeline for bioinformatics analysis of bacterial genomes
GNU General Public License v3.0

There is something wrong with your reference file. Valid file types are .fasta, .gbk, .fasta.gz, .gbk.gz. Please check your inputs and try again. #70

Open Vikash84 opened 1 year ago

Vikash84 commented 1 year ago

(bohra) [vsingh@vdl]$ bohra run -p full -i isolates.tab -r ./NC_012925.1.fasta.gz

[INFO:06/21/2023 11:52:12 AM] Bohra is being run in /home/vsingh/vdl/Ssuis/ by vsingh on 2023-06-21.
[INFO:06/21/2023 11:52:12 AM] You are running bohra in full mode.
[INFO:06/21/2023 11:52:12 AM] Job ID is set Bohra microbial genomics pipeline
[INFO:06/21/2023 11:52:12 AM] Tyring to find your profile.
[INFO:06/21/2023 11:52:12 AM] You are running bohra with the lcl profile.
[INFO:06/21/2023 11:52:12 AM] You are running with conda - wise decision!! Will now ensure that kraken DB is configured properly.
[INFO:06/21/2023 11:52:12 AM] Searching for kraken2 DB: $KRAKEN2_DEFAULT_DB
[INFO:06/21/2023 11:52:12 AM] You are using the default kraken2 database at : /home/vsingh/softwares/minikrake2_db/minikraken_8GB_20200312/
[INFO:06/21/2023 11:52:12 AM] Checking that /home/vsingh/softwares/minikrake2_db/minikraken_8GB_20200312 is a directory, checking that files are not empty
[INFO:06/21/2023 11:52:12 AM] Found /home/vsingh/softwares/minikrake2_db/minikraken_8GB_20200312, checking that files are not empty
[INFO:06/21/2023 11:52:12 AM] Congratulations your kraken database is present and all files are present.
[INFO:06/21/2023 11:52:12 AM] Now looking for MLST setup
[INFO:06/21/2023 11:52:12 AM] Checking mlst setup.
[WARNING:06/21/2023 11:52:12 AM] You do not have mlst databases pre-configured the default DB with your installation of mlst will be used.
[INFO:06/21/2023 11:52:12 AM] Found isolates.tab.
[INFO:06/21/2023 11:52:12 AM] File isolates.tab is in correct format.
[INFO:06/21/2023 11:52:12 AM] No valid contigs file has been supplied. Assemblies will be generated.
[INFO:06/21/2023 11:52:12 AM] Found NC_012925.1.fasta.gz.
[INFO:06/21/2023 11:52:12 AM] Reference ./NC_012925.1.fasta.gz has been found. Will now copy to running directory.
[INFO:06/21/2023 11:52:12 AM] The file : NC_012925.1.fasta.gz already exists in the current directory
[INFO:06/21/2023 11:52:12 AM] Checking if reference is a valid reference file.
[CRITICAL:06/21/2023 11:52:12 AM] There is something wrong with your reference file. Valid file types are .fasta, .gbk, .fasta.gz, .gbk.gz. Please check your inputs and try again.

tetedange13 commented 1 year ago

Hi @Vikash84 ,

First, please note that I am not an author of bohra (only a new user).

Have you tried re-downloading your reference FASTA and gzipping it again? I fetched NC_012925.1 myself (from the GenBank entry via "Send to" > FASTA, then gzipped it), and bohra run accepted it, returning "Reference is in a valid format."
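Before re-downloading, it can help to check what the file actually contains, since a name ending in .fasta.gz does not guarantee gzip-compressed FASTA. The sketch below is a hypothetical helper (sniff_reference is not part of bohra or any2fasta): it checks the gzip magic bytes (0x1f 0x8b) and then the first line, where FASTA starts with ">" and GenBank with "LOCUS". It only looks at the first line, so it cannot catch problems further into the file.

```python
import gzip
from pathlib import Path

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def sniff_reference(path: str) -> str:
    """Guess what a reference file really contains, regardless of its name.

    Hypothetical diagnostic helper: returns "fasta", "genbank", or a short
    description of what looks wrong with the file.
    """
    p = Path(path)

    # Compare the file's leading bytes with the gzip magic number.
    with p.open("rb") as fh:
        is_gzipped = fh.read(2) == GZIP_MAGIC

    if p.suffix == ".gz" and not is_gzipped:
        return "named .gz but not gzip-compressed"

    # Read only the first line, decompressing transparently if needed.
    opener = gzip.open if is_gzipped else open
    try:
        with opener(p, "rt", errors="replace") as fh:
            first_line = fh.readline().lstrip()
    except (OSError, EOFError):
        # gzip.BadGzipFile is an OSError; a truncated stream raises EOFError.
        return "gzip stream is corrupt or truncated"

    if first_line.startswith(">"):
        return "fasta"
    if first_line.startswith("LOCUS"):
        return "genbank"
    # Anything else (e.g. an HTML error page or a doubly gzipped file).
    return f"unrecognised first line: {first_line[:40]!r}"
```

For example, sniff_reference("NC_012925.1.fasta.gz") should return "fasta" for a correctly gzipped FASTA download; any other result points at the file rather than at bohra.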

Otherwise, the code responsible for your error is here: https://github.com/MDU-PHL/bohra/blob/5ff3a46782fc338a6205b8f388d4922cdec78a60/bohra/SnpDetection.py#L232-L239

So you get this error because the any2fasta {ref} command fails (exits with a status other than 0). Try running any2fasta on your reference file yourself to see why it fails.
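To reproduce the check outside bohra, here is a minimal Python sketch of an exit-code check. It is not the code at the linked lines, only an approximation that assumes bohra treats any non-zero exit status from any2fasta {ref} as an invalid reference. The function name reference_is_valid is hypothetical, and the sketch assumes the any2fasta executable is on PATH (if it is not found, the reference is reported as invalid).

```python
import shutil
import subprocess


def reference_is_valid(ref: str) -> bool:
    """Return True if any2fasta exits with status 0 on `ref`.

    Hypothetical approximation of an exit-code check; not bohra's own code.
    """
    # Without any2fasta on PATH the reference cannot be validated at all.
    if shutil.which("any2fasta") is None:
        print("any2fasta not found on PATH")
        return False

    result = subprocess.run(
        ["any2fasta", ref],
        stdout=subprocess.DEVNULL,  # the converted FASTA is not needed here
        stderr=subprocess.PIPE,     # keep any2fasta's diagnostics
        text=True,
    )
    if result.returncode != 0:
        # any2fasta's own message usually explains why the file was rejected.
        print(result.stderr.strip())
    return result.returncode == 0
```

Calling reference_is_valid("NC_012925.1.fasta.gz") prints whatever any2fasta wrote to stderr when it fails, which is the message the CRITICAL log line above does not show.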

Hope this helps! Have a nice day, Felix.

kristyhoran commented 1 year ago

@Vikash84, @tetedange13 has found the spot where the error is affecting you. any2fasta is used to check that the file is in a format acceptable to the pipeline's dependencies, so if it fails, there is likely an issue with the file. Please let me know if you have any other issues.

Cheers Kristy

tseemann commented 3 months ago

@Vikash84 did you get it working in the end?

If you think any2fasta is the problem, file an issue here: https://github.com/tseemann/any2fasta