aitgon / vtam

MIT License

taxassign bug #36

Open BanaIbrahim opened 8 months ago

BanaIbrahim commented 8 months ago

While running the default taxassign step, I kept getting the following error:

raise ApplicationError(return_code, str(self), stdout_str, stderr_str)
Bio.Application.ApplicationError: Non-zero return code 255 from 'blastn -out /var/tmp/pbs.41724.cr2-pbs/tmpalxa3wez/RunnerBlast.py/blast_output.tsv -outfmt "6 qseqid sacc pident evalue qcovhsp staxids" -query /var/tmp/pbs.41724.cr2-pbs/tmpalxa3wez/RunnerTaxAssign.py/variant.fasta -db nt -evalue 1e-05 -qcov_hsp_perc 80 -num_threads 8 -dust yes', message 'FASTA-Reader: Title ends with at least 20 valid nucleotide characters.  Was the sequence accidentally put in the title line?'

I checked the block in RunnerTaxAssign.py that writes variant.fasta (lines 57-60):

variant_fasta = os.path.join(self.this_temp_dir, 'variant.fasta')
with open(variant_fasta, 'w') as fout:
    for seq in sequence_list:
        fout.write(">{}\n{}\n".format(seq, seq))

It looks like this writes the ASV sequence to both the header line and the sequence line, which triggers the above error when blastn parses the FASTA file.
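One possible workaround, as a minimal sketch only: write a short identifier in the header and keep a mapping back to the sequence, so that the BLAST output (whose qseqid column would then hold the identifier) can still be joined to the variants. The function name `write_variant_fasta` and the `variant_N` naming scheme are hypothetical and not part of vtam; the real fix may need to use the variant IDs the pipeline already has.

```python
def write_variant_fasta(sequence_list, fasta_path):
    """Write one FASTA record per sequence, using a short ID as the header.

    Returns a dict mapping each header ID to its sequence, so BLAST hits
    reported by ID can be matched back to the original variant.
    NOTE: the 'variant_N' ID scheme is an assumption for illustration.
    """
    id_to_seq = {}
    with open(fasta_path, "w") as fout:
        for i, seq in enumerate(sequence_list, start=1):
            record_id = "variant_{}".format(i)
            id_to_seq[record_id] = seq
            # Header holds only the short ID; the sequence goes on its own line.
            fout.write(">{}\n{}\n".format(record_id, seq))
    return id_to_seq
```

With this, the header no longer ends in a long run of nucleotide characters, which is what the FASTA reader complains about in the error above.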

BanaIbrahim commented 8 months ago

This might not be an issue with older versions of the BLAST package, but the BLAST version compatible with the current NCBI nt database enforces a FASTA format restriction and fails if the header contains a sequence.