BLAST (blastn) was run on each of the 1200 genomes against the Clostridium scindens 16S reference sequences using the following command:
for f in *.fa; do blastn -outfmt "6 qseqid sseqid qlen slen qstart qend sstart send length mismatch evalue bitscore pident" -num_threads 15 -perc_identity 97 -mt_mode 0 -query "$f" -db ../Clostridium_scindens_NCBI_reference_genomes_16S_Sequences/Clostridium_scindens_NCBI_reference_genomes_16S_sequences.fasta -out "${f%.*}_vs_Clostridium_scindens_16S_Reference.out"; done &
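The -outfmt string above defines 13 tab-separated columns per hit. As a rough sketch of how one such line could be read back in Python, the snippet below maps each field to its column name and converts the numeric columns. The names BLAST_COLUMNS, INT_COLUMNS, FLOAT_COLUMNS and parse_hit are illustrative and not part of the original workflow.

```python
# Column order copied from the -outfmt string in the blastn command.
BLAST_COLUMNS = [
    "qseqid", "sseqid", "qlen", "slen", "qstart", "qend",
    "sstart", "send", "length", "mismatch", "evalue", "bitscore", "pident",
]

# Assumed column types: positions, lengths and counts are integers;
# e-value, bit score and percent identity are floats.
INT_COLUMNS = {"qlen", "slen", "qstart", "qend", "sstart", "send", "length", "mismatch"}
FLOAT_COLUMNS = {"evalue", "bitscore", "pident"}


def parse_hit(line):
    """Turn one tabular BLAST line into a dict keyed by column name."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(BLAST_COLUMNS):
        raise ValueError(
            f"expected {len(BLAST_COLUMNS)} columns, found {len(fields)}"
        )
    hit = {}
    for name, value in zip(BLAST_COLUMNS, fields):
        if name in INT_COLUMNS:
            hit[name] = int(value)
        elif name in FLOAT_COLUMNS:
            hit[name] = float(value)
        else:
            hit[name] = value
    return hit
```

A parsed hit can then be filtered by name, for example on hit["pident"] or hit["length"], instead of by column position.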
The concatenated output file was then cleaned and given a header row using the following Python script:
Clean_Blast_Output.py
import sys

# Path to the concatenated BLAST output, passed as the first argument.
concatenated_input_file = sys.argv[1]
concatenated_output_file = concatenated_input_file.split(".tsv")[0] + "_cleaned.tsv"

# Header row, in the same column order as the -outfmt string.
first_output_line = "Query Sequence id\tSubject Sequence id\tQuery Sequence Length\tSubject Sequence Length\tStart of Alignment in Query\tEnd of Alignment in Query\tStart of Alignment in Subject\tEnd of Alignment in Subject\tAlignment Length\tNumber of Mismatches\tE-value\tBit Score\tPercent Identity"

with open(concatenated_input_file, "r") as infile, open(concatenated_output_file, "w") as outfile:
    outfile.write(first_output_line + "\n")
    for line in infile:
        # Skip comment lines, which start with "#".
        if line.startswith("#"):
            continue
        outfile.write(line)
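The script above reads a single concatenated file, but the step that merges the per-genome .out files is not shown. One possible way to do it is sketched below, assuming all .out files sit in the current directory. The function name concatenate_blast_outputs and the combined file name in the usage comment are hypothetical.

```python
import glob


def concatenate_blast_outputs(pattern, combined_path):
    """Write every file matching `pattern` into `combined_path`.

    Files are processed in sorted order. Returns the number of files read.
    """
    paths = sorted(glob.glob(pattern))
    with open(combined_path, "w") as combined:
        for path in paths:
            with open(path, "r") as handle:
                for line in handle:
                    # Make sure a file without a trailing newline does not
                    # run into the first line of the next file.
                    combined.write(line if line.endswith("\n") else line + "\n")
    return len(paths)


# Hypothetical usage (the output file name is an assumption):
# concatenate_blast_outputs(
#     "*_vs_Clostridium_scindens_16S_Reference.out",
#     "All_vs_Clostridium_scindens_16S_Reference.tsv",
# )
```

The combined file uses a .tsv extension so that Clean_Blast_Output.py derives its "_cleaned.tsv" output name as intended.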