B-UMMI / chewBBACA

BSR-Based Allele Calling Algorithm
GNU General Public License v3.0

AlleleCall issue - Determining BLASTp self-score for each representative... #195

Open daraneda96 opened 8 months ago

daraneda96 commented 8 months ago

Hi everyone, I am having trouble executing the AlleleCall command. Specifically, I am running the command:

chewBBACA.py AlleleCall -i /home/daniel.araneda/analisis_vibrios/genomas_mlst -g /home/daniel.araneda/analisis_vibrios/mlst_ok/mlst_schema/schema_vibrio -o /home/daniel.araneda/analisis_vibrios/mlst_ok/allelecall --cpu 10

And I get the following output in the .out file:
"chewBBACA version: 3.3.3
Authors: Rafael Mamede, Pedro Cerqueira, Mickael Silva, João Carriço, Mário Ramirez
Github: https://github.com/B-UMMI/chewBBACA
Documentation: https://chewbbaca.readthedocs.io/en/latest/index.html
Contacts: imm-bioinfo@medicina.ulisboa.pt

========================== chewBBACA - AlleleCall

Started at: 2024-03-06T01:17:41

Configuration values

Minimum sequence length: 0
Size threshold: 0.2
Translation table: 11
BLAST Score Ratio: 0.6
Word size: 5
Window size: 5
Clustering similarity: 0.2
Prodigal training file: /home/daniel.araneda/analisis_vibrios/mlst_ok/mlst_schema/schema_vibrio/vibrio_trainingfile.trn
CPU cores: 10
BLAST path: /home/daniel.araneda/miniconda3/envs/chewie/bin
CDS input: False
Prodigal mode: single
Mode: 4
Number of inputs: 104
Number of loci: 60797
Intermediate files will be stored in /home/daniel.araneda/analisis_vibrios/mlst_ok/allelecall/temp

Pre-computed data

Loci allele size mode values stored in /home/daniel.araneda/analisis_vibrios/mlst_ok/mlst_schema/schema_vibrio/loci_modes
Hash tables stored in /home/daniel.araneda/analisis_vibrios/mlst_ok/mlst_schema/schema_vibrio/pre_computed

CDS prediction

Predicting CDSs for 104 inputs...
[====================] 100%
Extracted a total of 460028 CDSs from 104 inputs.

CDS deduplication

Identifying distinct CDSs... Identified 403038 distinct CDSs.

CDS exact matching

Searching for CDS exact matches...
Found 68364 exact matches (60797 distinct schema alleles).
Unclassified CDSs: 342241

CDS translation

Translating 342241 CDSs...
[====================] 100%
204 CDSs could not be translated.
Unclassified CDSs: 342037

Protein deduplication

Identifying distinct proteins... Identified 302610 distinct proteins.

Protein exact matching

Searching for Protein exact matches...
Found 1301 exact matches (2264 distinct CDSs, 2592 total CDSs).
Unclassified proteins: 301309

Protein clustering

Translating schema representative alleles...
Determining BLASTp self-score for each representative..."

And I get the following output in the .err file: "[66, 76, 65, 83, 84, 32, 68, 97, 116, 97, 98, 97, 115, 101, 32, 101, 114, 114, 111, 114, 58, 32, 78, 111, 32, 97, 108, 105, 97, 115, 32, 111, 114, 32, 105, 110, 100, 101, 120, 32, 102, 105, 108, 101, 32, 102, 111, 117, 110, 100, 32, 102, 111, 114, 32, 112, 114, 111, 116, 101, 105, 110, 32, 100, 97, 116, 97, 98, 97, 115, 101, 32, 91, 47, 104, 111, 109, 101, 47, 100, 97, 110, 105, 101, 108, 46, 97, 114, 97, 110, 101, 100, 97, 47, 97, 110, 97, 108, 105, 115, 105, 115, 95, 118, 105, 98, 114, 105, 111, 115, 47, 109, 108, 115, 116, 95, 111, 107, 47, 97, 108, 108, 101, 108, 101, 99, 97, 108, 108, 47, 116, 101, 109, 112, 47, 51, 95, 116, 114, 97, 110, 115, 108, 97, 116, 101, 100, 95, 114, 101, 112, 114, 101, 115, 101, 110, 116, 97, 116, 105, 118, 101, 115, 47, 115, 101, 108, 102, 95, 115, 99, 111, 114, 101, 115, 47, 66, 76, 65, 83, 84, 112, 95, 100, 98, 47, 108, 111, 99, 105, 95, 116, 111, 95, 99, 97, 108, 108, 95, 116, 114, 97, 110, 115, 108, 97, 116, 101, 100, 95, 114, 101, 112, 114, 101, 115, 101, 110, 116, 97, 116, 105, 118, 101, 115, 93, 32, ............, 108, 105, 115, 105, 115, 95, 118, 105, 98, 114, 105, 111, 115, 47, 109, 108, 115, 116, 95, 111, 107, 47, 97, 108, 108, 101, 108, 101, 99, 97, 108, 108, 47, 116, 101, 109, 112, 47, 51, 95, 116, 114, 97, 110, 115, 108, 97, 116, 101, 100, 95, 114, 101, 112, 114, 101, 115, 101, 110, 116, 97, 116, 105, 118, 101, 115, 47, 115, 101, 108, 102, 95, 115, 99, 111, 114, 101, 115, 47, 66, 76, 65, 83, 84, 112, 95, 100, 98, 47, 108, 111, 99, 105, 95, 116, 111, 95, 99, 97, 108, 108, 95, 116, 114, 97, 110, 115, 108, 97, 116, 101, 100, 95, 114, 101, 112, 114, 101, 115, 101, 110, 116, 97, 116, 105, 118, 101, 115, 93, 32, 105, 110, 32, 115, 101, 97, 114, 99, 104, 32, 112, 97, 116, 104, 32, 91, 47, 104, 111, 109, 101, 47, 100, 97, 110, 105, 101, 108, 46, 97, 114, 97, 110, 101, 100, 97, 47, 97, 110, 97, 108, 105, 115, 105, 115, 95, 118, 105, 98, 114, 105, 111, 115, 47, 109, 108, 115, 116, 
95, 111, 107, 58, 58, 93, 10, 66, 76, 65, 83, 84, 32, 68, 97, 116, 97, 98, 97, 115, 101, 32, 101, 114, 114, 111, 114, 58, 32, 78, 111, 32, 97, 108, 105, 97, 115, 32, 111, 114, 32, 105, 110, 100, 101, 120, 32, 102, 105, 108, 101, 32, 102, 111, 117, 110, 100, 32, 102, 111, 114, 32, 112, 114, 111, 116, 101, 105, 110, 32, 100, 97, 116, 97, 98, 97, 115, 101, 32, 91, 47, 104, 111, 109, 101, 47, 100, 97, 110, 105, 101, 108, 46, 97, 114, 97, 110, 101, 100, 97, 47, 97, 110, 97, 108, 105, 115, 105, 115, 95, 118, 105, 98, 114, 105, 111, 115, 47, 109, 108, 115, 116, 95, 111, 107, 47, 97, 108, 108, 101, 108, 101, 99, 97, 108, 108, 47, 116, 101, 109, 112, 47, 51, 95, 116, 114, 97, 110, 115, 108, 97, 116, 101, 100, 95, 114, 101, 112, 114, 101, 115, 101, 110, 116, 97, 116, 105, 118, 101, 115, 47, 115, 101, 108, 102, 95, 115, 99, 111, 114, 101, 115, 47, 66, 76, 65, 83, 84, 112, 95, 100, 98, 47, 108, 111, 99, 105, 95, 116, 111, 95, 99, 97, 108, 108, 95, 116, 114, 97, 110, 115, 108, 97, 116, 101, 100, 95, 114, 101, 112, 114, 101, 115, 101, 110, 116, 97, 116, 105, 118, 101, 115, 93, 32, 105, 110, 32, 115, 101, 97, 114, 99, 104, 32, 112, 97, 116, 104, 32, 91, 47, 104, 111, 109, 101, 47, 100, 97, 110, 105, 101, 108, 46, 97, 114, 97, 110, 101, 100, 97, 47, 97, 110, 97, 108, 105, 115, 105, 115, 95, 118, 105, 98, 114, 105, 111, 115, 47, 109, 108, 115, 116, 95, 111, 107, 58, 58, 93, 10]
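The .err content above is a list of integer byte values, not readable text. Decoded as ASCII, the visible parts read "BLAST Database error: No alias or index file found for protein database [/home/daniel.araneda/analisis_vibrios/mlst_ok/allelecall/temp/3_translated_representatives/self_scores/BLASTp_db/loci_to_call_translated_representatives] ... in search path [/home/daniel.araneda/analisis_vibrios/mlst_ok::]", repeated. A minimal sketch for decoding such a paste follows; the function name `decode_err_bytes` is hypothetical and not part of chewBBACA.

```python
import re


def decode_err_bytes(text):
    """Decode a pasted list of integer byte values into readable text.

    Hypothetical helper, not part of chewBBACA. Anything that is not a
    run of digits (brackets, commas, an elision such as '............')
    is ignored, so text on either side of an elision is simply joined.
    """
    values = [int(token) for token in re.findall(r"\d+", text)]
    # bytes() raises ValueError if a value falls outside 0-255.
    return bytes(values).decode("utf-8", errors="replace")


if __name__ == "__main__":
    sample = "[66, 76, 65, 83, 84, 32, 68, 97, 116, 97, 98, 97, 115, 101]"
    print(decode_err_bytes(sample))
```

To decode a whole file, pass its contents to the function, for example `decode_err_bytes(open("job.err").read())`, where `job.err` stands in for the actual .err file name.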

What could be happening? Sorry if there's another issue explaining this. I looked to see if anyone else asked about it but couldn't find anything.

Greetings and thank you very much in advance.

Daniel

rfm-targa commented 8 months ago

Greetings @daraneda96,

Sorry for the delayed response. It looks like the error occurs after running BLASTp to determine the self-score for the schema loci representatives (the sequences inside the FASTA files in the short directory). I think the process exits when it detects that some BLASTp processes failed to run. This might happen if a sequence header is longer than 50 characters. Can you please check whether any sequence headers in the FASTA files in the short directory have more than 50 characters? BLAST takes the sequence header to be everything up to the first blank space. You can also check whether any locus in your schema has an identifier longer than 50 characters. The loci identifiers are based on the unique identifiers determined for the input genomes during schema creation (performed by the CreateSchema module). If any input genome used to create the schema had a unique identifier (everything in the basename up to the first .) longer than 50 characters, it might lead to errors with BLAST.
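The header check described above can be sketched in Python. This is an illustration under stated assumptions, not chewBBACA code: the function names are hypothetical, the 50-character limit is the one mentioned in this reply, and the representative files are assumed to match `*.fasta` inside the schema's short directory.

```python
from pathlib import Path

MAX_ID_LENGTH = 50  # limit mentioned in the reply above


def long_identifiers(fasta_text, max_length=MAX_ID_LENGTH):
    """Return the sequence identifiers longer than max_length.

    The identifier is everything after '>' up to the first whitespace,
    which is the part of the header the reply says BLAST considers.
    """
    found = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            fields = line[1:].split()
            identifier = fields[0] if fields else ""
            if len(identifier) > max_length:
                found.append(identifier)
    return found


def scan_directory(directory, pattern="*.fasta"):
    """Map each FASTA file name in directory to its over-long identifiers.

    Files without any over-long identifier are left out of the result.
    """
    results = {}
    for path in sorted(Path(directory).glob(pattern)):
        hits = long_identifiers(path.read_text())
        if hits:
            results[path.name] = hits
    return results
```

Calling `scan_directory` on the short directory of the schema returns an empty dictionary when no identifier exceeds the limit.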

Kind regards,

Rafael

daraneda96 commented 8 months ago

Greetings Rafael, I checked the sequence headers with the "grep -E '^>.{50,}' *.fasta" command and I didn't find anything. The longest sequence header has 37 characters. I was able to run AlleleCall once, but it has not worked again since, even though the file names were the same as they are now. Sorry for the vague information.

Kind regards,

Daniel

rfm-targa commented 8 months ago

Greetings @daraneda96,

Can you share some data to reproduce the issue? The data can include the schema and a set of genomes or just a minimal test case with part of the schema and a genome that allows us to get the same error and pinpoint the cause.

Kind regards,

Rafael

rfm-targa commented 7 months ago

Greetings @daraneda96,

We have updated chewBBACA to v3.3.4. This version includes some bug fixes. While the bug fixes do not target the issue you reported, it might be worth retrying with the new version.

Kind regards,

Rafael