Closed claudiacln87 closed 5 years ago
Hi, I believe there must be something in your input fasta files that breaks the database creation step. For example here someone reported a similar problem, seemingly due to having empty records/blank lines in the fasta file. Note that sequences of length < 3 will also result in 'empty' sequences since the CDS file is translated in wgd
. In general, I would recommend to filter the fasta file so that you only retain valid coding sequences before starting analyses using wgd
.
Let me know if this helps, Arthur
Hi Arthur, I tried with your suggestion. Now the .log file says that, I am not sure if it is working (unless that step requires a while):
2019-05-06 09:51:53: INFO makeblastdb: 2.2.30+
Package: blast 2.2.30, build Oct 27 2014 16:58:06
2019-05-06 09:51:54: INFO blastp: 2.2.30+
Package: blast 2.2.30, build Oct 27 2014 16:58:06
2019-05-06 09:51:54: INFO mcl 14-137
Copyright (c) 1999-2014, Stijn van Dongen. mcl comes with NO WARRANTY
to the extent permitted by law. You may redistribute copies of mcl under
the terms of the GNU General Public License.
2019-05-06 09:51:54: INFO Output directory: wgd__mcl_blast_output_Frie_2 does not exist, will make it.
2019-05-06 09:51:54: INFO Writing blastdb sequences to db.fasta.
2019-05-06 09:51:54: INFO Writing query sequences to query.fasta.
2019-05-06 09:51:54: INFO Performing all-vs.-all Blastp (this might take a while)
2019-05-06 09:51:54: INFO Making Blastdb
Building a new DB, current time: 05/06/2019 09:51:54
New DB name: wgd__mcl_blast_output_Frie_2/37524d2d3fc116.db.fasta
New DB title: wgd__mcl_blast_output_Frie_2/37524d2d3fc116.db.fasta
Sequence type: Protein
Keep Linkouts: T
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 18027 sequences in 1.24904 seconds.
2019-05-06 09:51:55: INFO Running Blastp
2019-05-06 09:51:55: INFO blastp -db wgd__mcl_blast_output_Frie_2/37524d2d3fc116.db.fasta -query wgd__mcl_blast_output_Frie_2/37524d2d46ce94.query.fasta -evalue 1e-10 -outfmt 6 -num_threads 4 -out wgd__mcl_blast_output_Frie_2/F.endolithicus.cds.fasta.blast.tsv
Thanks, Claudia
Hi Claudia, yes this step can take a while (it is an all-versus.-all blastp search, blasting each gene to the full set of all genes). Hopely the analyses run without further troubles.
Best, Arthur
Hi Arthur, I am trying to use wgd tool to highly detect whole genome duplication in a black fungus, but I got an error during the step 1 (blast+ and mcl analysis) and I cannot get a .mcl file to proceed with analysis (I also tried with other genome sequences but got same error) Thank you very much in advance. I am using this command:
This is the error: