Open daraneda96 opened 8 months ago
Greetings @daraneda96,
Sorry for the delayed response. It looks like the error occurs after running BLASTp to determine the self-score for the schema loci representatives (the sequences inside the FASTA files in the short directory). I think it exits when it detects that some BLASTp processes failed to run. This might happen if a sequence header exceeds 50 characters. Can you please check whether any sequence headers in the FASTA files in the short directory have more than 50 characters? The sequence header size detected by BLAST is based on everything up to the first blank space. You can also tell by checking whether any locus in your schema has an identifier longer than 50 characters. The loci identifiers are based on the unique identifiers determined for the input genomes during schema creation (performed by the CreateSchema module). If any input genome used to create the schema had a unique identifier (everything in the basename up to the first ".") longer than 50 characters, it might lead to errors with BLAST.
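One way to run the header check described above is a short Python script. This is only a minimal sketch: the function names, the "*.fasta" pattern, and the example path are hypothetical and not part of chewBBACA. It measures each header from the character after ">" up to the first whitespace, which is how the identifier length is described above.

```python
from pathlib import Path

MAX_ID_LENGTH = 50  # limit mentioned above; adjust if needed


def long_sequence_ids(fasta_path, max_length=MAX_ID_LENGTH):
    """Return (identifier, length) pairs for identifiers longer than max_length."""
    offenders = []
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                # The identifier is everything after '>' up to the first whitespace.
                fields = line[1:].split()
                seqid = fields[0] if fields else ""
                if len(seqid) > max_length:
                    offenders.append((seqid, len(seqid)))
    return offenders


def scan_directory(directory, pattern="*.fasta", max_length=MAX_ID_LENGTH):
    """Map each FASTA file name in directory to its list of overlong identifiers."""
    results = {}
    for fasta in sorted(Path(directory).glob(pattern)):
        offenders = long_sequence_ids(fasta, max_length)
        if offenders:
            results[fasta.name] = offenders
    return results
```

Calling scan_directory("/path/to/schema/short") (a placeholder path) returns an empty dictionary when no identifier is longer than the limit.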
Kind regards,
Rafael
Greetings Rafael, I checked the sequence headers with the "grep -E '^>.{50,}' *.fasta" command and I didn't find anything. The longest sequence header has 37 characters. I was able to run AlleleCall once, but it did not work when I tried again. However, the file name was the same as it is now. Sorry for the vague information.
Kind regards,
Daniel
Greetings @daraneda96,
Can you share some data to reproduce the issue? The data can include the schema and a set of genomes or just a minimal test case with part of the schema and a genome that allows us to get the same error and pinpoint the cause.
Kind regards,
Rafael
Greetings @daraneda96,
We have updated chewBBACA to v3.3.4. This version includes some bug fixes. While the bug fixes do not target the issue you reported, it might be worth retrying with the new version.
Kind regards,
Rafael
Hi everyone, I am having trouble executing the AlleleCall command. Specifically, I am running the command:
chewBBACA.py AlleleCall -i /home/daniel.araneda/analisis_vibrios/genomas_mlst -g /home/daniel.araneda/analisis_vibrios/mlst_ok/mlst_schema/schema_vibrio -o /home/daniel.araneda/analisis_vibrios/mlst_ok/allelecall --cpu 10
And I get the following output in the .out file:
"chewBBACA version: 3.3.3
Authors: Rafael Mamede, Pedro Cerqueira, Mickael Silva, João Carriço, Mário Ramirez
Github: https://github.com/B-UMMI/chewBBACA
Documentation: https://chewbbaca.readthedocs.io/en/latest/index.html
Contacts: imm-bioinfo@medicina.ulisboa.pt
========================== chewBBACA - AlleleCall
Started at: 2024-03-06T01:17:41
Configuration values
Minimum sequence length: 0
Size threshold: 0.2
Translation table: 11
BLAST Score Ratio: 0.6
Word size: 5
Window size: 5
Clustering similarity: 0.2
Prodigal training file: /home/daniel.araneda/analisis_vibrios/mlst_ok/mlst_schema/schema_vibrio/vibrio_trainingfile.trn
CPU cores: 10
BLAST path: /home/daniel.araneda/miniconda3/envs/chewie/bin
CDS input: False
Prodigal mode: single
Mode: 4
Number of inputs: 104
Number of loci: 60797
Intermediate files will be stored in /home/daniel.araneda/analisis_vibrios/mlst_ok/allelecall/temp
Pre-computed data
Loci allele size mode values stored in /home/daniel.araneda/analisis_vibrios/mlst_ok/mlst_schema/schema_vibrio/loci_modes
Hash tables stored in /home/daniel.araneda/analisis_vibrios/mlst_ok/mlst_schema/schema_vibrio/pre_computed
CDS prediction
Predicting CDSs for 104 inputs... [====================] 100% Extracted a total of 460028 CDSs from 104 inputs.
CDS deduplication
Identifying distinct CDSs... Identified 403038 distinct CDSs.
CDS exact matching
Searching for CDS exact matches... Found 68364 exact matches (60797 distinct schema alleles). Unclassified CDSs: 342241
CDS translation
Translating 342241 CDSs... [====================] 100% 204 CDSs could not be translated. Unclassified CDSs: 342037
Protein deduplication
Identifying distinct proteins... Identified 302610 distinct proteins.
Protein exact matching
Searching for Protein exact matches... Found 1301 exact matches (2264 distinct CDSs, 2592 total CDSs). Unclassified proteins: 301309
Protein clustering
Translating schema representative alleles... Determining BLASTp self-score for each representative..."
And I get the following output in the .err file: "[66, 76, 65, 83, 84, 32, 68, 97, 116, 97, 98, 97, 115, 101, 32, 101, 114, 114, 111, 114, 58, 32, 78, 111, 32, 97, 108, 105, 97, 115, 32, 111, 114, 32, 105, 110, 100, 101, 120, 32, 102, 105, 108, 101, 32, 102, 111, 117, 110, 100, 32, 102, 111, 114, 32, 112, 114, 111, 116, 101, 105, 110, 32, 100, 97, 116, 97, 98, 97, 115, 101, 32, 91, 47, 104, 111, 109, 101, 47, 100, 97, 110, 105, 101, 108, 46, 97, 114, 97, 110, 101, 100, 97, 47, 97, 110, 97, 108, 105, 115, 105, 115, 95, 118, 105, 98, 114, 105, 111, 115, 47, 109, 108, 115, 116, 95, 111, 107, 47, 97, 108, 108, 101, 108, 101, 99, 97, 108, 108, 47, 116, 101, 109, 112, 47, 51, 95, 116, 114, 97, 110, 115, 108, 97, 116, 101, 100, 95, 114, 101, 112, 114, 101, 115, 101, 110, 116, 97, 116, 105, 118, 101, 115, 47, 115, 101, 108, 102, 95, 115, 99, 111, 114, 101, 115, 47, 66, 76, 65, 83, 84, 112, 95, 100, 98, 47, 108, 111, 99, 105, 95, 116, 111, 95, 99, 97, 108, 108, 95, 116, 114, 97, 110, 115, 108, 97, 116, 101, 100, 95, 114, 101, 112, 114, 101, 115, 101, 110, 116, 97, 116, 105, 118, 101, 115, 93, 32, ............, 108, 105, 115, 105, 115, 95, 118, 105, 98, 114, 105, 111, 115, 47, 109, 108, 115, 116, 95, 111, 107, 47, 97, 108, 108, 101, 108, 101, 99, 97, 108, 108, 47, 116, 101, 109, 112, 47, 51, 95, 116, 114, 97, 110, 115, 108, 97, 116, 101, 100, 95, 114, 101, 112, 114, 101, 115, 101, 110, 116, 97, 116, 105, 118, 101, 115, 47, 115, 101, 108, 102, 95, 115, 99, 111, 114, 101, 115, 47, 66, 76, 65, 83, 84, 112, 95, 100, 98, 47, 108, 111, 99, 105, 95, 116, 111, 95, 99, 97, 108, 108, 95, 116, 114, 97, 110, 115, 108, 97, 116, 101, 100, 95, 114, 101, 112, 114, 101, 115, 101, 110, 116, 97, 116, 105, 118, 101, 115, 93, 32, 105, 110, 32, 115, 101, 97, 114, 99, 104, 32, 112, 97, 116, 104, 32, 91, 47, 104, 111, 109, 101, 47, 100, 97, 110, 105, 101, 108, 46, 97, 114, 97, 110, 101, 100, 97, 47, 97, 110, 97, 108, 105, 115, 105, 115, 95, 118, 105, 98, 114, 105, 111, 115, 47, 109, 108, 115, 116, 
95, 111, 107, 58, 58, 93, 10, 66, 76, 65, 83, 84, 32, 68, 97, 116, 97, 98, 97, 115, 101, 32, 101, 114, 114, 111, 114, 58, 32, 78, 111, 32, 97, 108, 105, 97, 115, 32, 111, 114, 32, 105, 110, 100, 101, 120, 32, 102, 105, 108, 101, 32, 102, 111, 117, 110, 100, 32, 102, 111, 114, 32, 112, 114, 111, 116, 101, 105, 110, 32, 100, 97, 116, 97, 98, 97, 115, 101, 32, 91, 47, 104, 111, 109, 101, 47, 100, 97, 110, 105, 101, 108, 46, 97, 114, 97, 110, 101, 100, 97, 47, 97, 110, 97, 108, 105, 115, 105, 115, 95, 118, 105, 98, 114, 105, 111, 115, 47, 109, 108, 115, 116, 95, 111, 107, 47, 97, 108, 108, 101, 108, 101, 99, 97, 108, 108, 47, 116, 101, 109, 112, 47, 51, 95, 116, 114, 97, 110, 115, 108, 97, 116, 101, 100, 95, 114, 101, 112, 114, 101, 115, 101, 110, 116, 97, 116, 105, 118, 101, 115, 47, 115, 101, 108, 102, 95, 115, 99, 111, 114, 101, 115, 47, 66, 76, 65, 83, 84, 112, 95, 100, 98, 47, 108, 111, 99, 105, 95, 116, 111, 95, 99, 97, 108, 108, 95, 116, 114, 97, 110, 115, 108, 97, 116, 101, 100, 95, 114, 101, 112, 114, 101, 115, 101, 110, 116, 97, 116, 105, 118, 101, 115, 93, 32, 105, 110, 32, 115, 101, 97, 114, 99, 104, 32, 112, 97, 116, 104, 32, 91, 47, 104, 111, 109, 101, 47, 100, 97, 110, 105, 101, 108, 46, 97, 114, 97, 110, 101, 100, 97, 47, 97, 110, 97, 108, 105, 115, 105, 115, 95, 118, 105, 98, 114, 105, 111, 115, 47, 109, 108, 115, 116, 95, 111, 107, 58, 58, 93, 10]
What could be happening? Sorry if there's another issue explaining this. I looked to see if anyone else asked about it but couldn't find anything.
Greetings and thank you very much in advance.
Daniel
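The content of the .err file above looks like a byte string printed as a list of integers, one ASCII code per character. The following is a minimal sketch for turning such a list back into text. The helper name decode_byte_list is hypothetical, and the sample holds only the first values of the posted list, because the full list contains an elided stretch ("............") that cannot be parsed.

```python
import ast


def decode_byte_list(text):
    """Parse a printed list of integers and decode it as UTF-8 text."""
    values = ast.literal_eval(text)  # safely parses "[66, 76, ...]" into a list of ints
    return bytes(values).decode("utf-8", errors="replace")


# First values of the list posted in the .err file.
sample = "[66, 76, 65, 83, 84, 32, 68, 97, 116, 97, 98, 97, 115, 101, 32, 101, 114, 114, 111, 114, 58]"
print(decode_byte_list(sample))  # prints "BLAST Database error:"
```

Decoded this way, the readable part of the list is the message "BLAST Database error: No alias or index file found for protein database [/home/daniel.araneda/analisis_vibrios/mlst_ok/allelecall/temp/3_translated_representatives/self_scores/BLASTp_db/loci_to_call_translated_representatives]", repeated. That would suggest BLASTp could not find the database files it expected in the temp directory, although the message alone does not show why they are missing.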