Run BLAST on each genome from the Wilkinson dataset using the 16S sequences of the 31 reference genomes as the BLAST database

BLAST was run on the 1200 genomes using the following command:

for f in *.fa; do
    blastn -outfmt "6 qseqid sseqid qlen slen qstart qend sstart send length mismatch evalue bitscore pident" \
        -num_threads 15 -perc_identity 97 -mt_mode 0 \
        -query "$f" \
        -db ../Clostridium_scindens_NCBI_reference_genomes_16S_Sequences/Clostridium_scindens_NCBI_reference_genomes_16S_sequences.fasta \
        -out "${f%.*}_vs_Clostridium_scindens_16S_Reference.out"
done &
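The loop leaves one tabular output file per genome, and these have to be merged into a single file before the cleaning step. The exact command used for that is not recorded here, so the following is only a minimal sketch in Python. The function name `concatenate_blast_outputs` and the combined filename in the usage comment are assumptions made for illustration; the glob pattern follows the `-out` names produced by the loop.

```python
import glob
import os


def concatenate_blast_outputs(pattern, combined_path):
    """Append every file matching `pattern` to `combined_path`.

    Returns the number of files merged. (Hypothetical helper, not part of
    the original workflow.)
    """
    # Sort for a reproducible order, and skip the combined file itself
    # in case it also matches the pattern.
    paths = sorted(
        p for p in glob.glob(pattern)
        if os.path.abspath(p) != os.path.abspath(combined_path)
    )
    with open(combined_path, "w") as combined:
        for path in paths:
            with open(path) as handle:
                for line in handle:
                    # Guard against a file that lacks a trailing newline
                    combined.write(line if line.endswith("\n") else line + "\n")
    return len(paths)


# Example call (the combined filename is hypothetical):
# concatenate_blast_outputs(
#     "*_vs_Clostridium_scindens_16S_Reference.out",
#     "all_vs_Clostridium_scindens_16S_Reference.tsv",
# )
```

Writing the combined file with a `.tsv` extension matters only because the cleaning script derives its output name by splitting on `.tsv`.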

The output files were concatenated, and the concatenated file was cleaned (header row added, comment lines removed) using the following Python script: Clean_Blast_Output.py

import sys

# Usage: python Clean_Blast_Output.py <concatenated_blast_output.tsv>
concatenated_input_file = sys.argv[1]
# Name the output after the input by replacing ".tsv" with "_cleaned.tsv"
concatenated_output_file = concatenated_input_file.split(".tsv")[0] + "_cleaned.tsv"

# Column names, in the same order as the fields requested with -outfmt
first_output_line = "Query Sequence id\tSubject Sequence id\tQuery Sequence Length\tSubject Sequence Length\tStart of Alignment in Query\tEnd of Alignment in Query\tStart of Alignment in Subject\tEnd of Alignment in Subject\tAlignment Length\tNumber of Mismatches\tE-value\tBit Score\tPercent Identity"

with open(concatenated_input_file, "r") as infile, open(concatenated_output_file, "w") as outfile:
    outfile.write(first_output_line + "\n")
    for line in infile:
        # Skip comment lines (they start with "#"); copy every hit line unchanged
        if line.startswith("#"):
            continue
        outfile.write(line)
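
As a quick sanity check on the cleaned file, the table can be read back with the standard `csv` module. The sketch below is an illustration rather than part of the recorded workflow: the function name `count_hits_per_subject` is hypothetical. It counts how many alignment rows hit each reference 16S sequence, using the "Subject Sequence id" column written by the script above.

```python
import csv
from collections import Counter


def count_hits_per_subject(cleaned_path):
    """Return a Counter mapping each subject sequence id to its number of hit rows.

    (Hypothetical helper for inspecting the cleaned table.)
    """
    counts = Counter()
    with open(cleaned_path, newline="") as handle:
        # QUOTE_NONE: treat quote characters in sequence ids as plain text
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        # Strip header names so stray spaces do not break the column lookup
        header = [name.strip() for name in next(reader)]
        subject_col = header.index("Subject Sequence id")
        for row in reader:
            if row:  # ignore blank lines
                counts[row[subject_col]] += 1
    return counts
```

Calling `count_hits_per_subject` on the `_cleaned.tsv` file and printing `most_common()` on the result gives a per-reference tally that can be compared against expectations for the 31 reference sequences.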
