breister2 / Clostridium_scindens_mining


Run BLAST on each genome from the Wilkinson dataset using the 31 reference genomes' 16S sequences as the BLAST database #1

Open breister2 opened 1 year ago

breister2 commented 1 year ago

BLAST was run on the 1200 genomes using the following command:

for f in *.fa; do
    blastn -outfmt "6 qseqid sseqid qlen slen qstart qend sstart send length mismatch evalue bitscore pident" \
        -num_threads 15 -perc_identity 97 -mt_mode 0 \
        -query "$f" \
        -db ../Clostridium_scindens_NCBI_reference_genomes_16S_Sequences/Clostridium_scindens_NCBI_reference_genomes_16S_sequences.fasta \
        -out "${f%.*}_vs_Clostridium_scindens_16S_Reference.out"
done &
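For anyone who prefers to drive the same loop from Python, here is a minimal sketch of how the argument list for one genome could be assembled. The function name `build_blastn_command` and its parameters are hypothetical and not part of this repository; the flags and the output naming rule are copied from the command above.

```python
from pathlib import Path

# Column order copied from the -outfmt string in the shell command.
OUTFMT_FIELDS = ("qseqid sseqid qlen slen qstart qend sstart send "
                 "length mismatch evalue bitscore pident")


def build_blastn_command(query, db, threads=15, min_identity=97):
    """Return the blastn argument list for one genome FASTA file (hypothetical helper)."""
    query = Path(query)
    # Same naming rule as ${f%.*}: drop the last extension, then add the suffix.
    out = str(query.with_suffix("")) + "_vs_Clostridium_scindens_16S_Reference.out"
    return [
        "blastn",
        "-outfmt", "6 " + OUTFMT_FIELDS,
        "-num_threads", str(threads),
        "-perc_identity", str(min_identity),
        "-mt_mode", "0",
        "-query", str(query),
        "-db", str(db),
        "-out", out,
    ]
```

Each list could then be passed to `subprocess.run(cmd, check=True)` for every file matched by `Path(".").glob("*.fa")`. Passing a list rather than a shell string avoids quoting problems with unusual file names.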

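The output files were combined into a single .tsv file, which was then cleaned with the following Python script, Clean_Blast_Output.py. The script takes the concatenated file as its only argument, writes a header row, and drops comment lines: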

import sys

concatenated_input_file = sys.argv[1]
concatenated_output_file = concatenated_input_file.split(".tsv")[0] + "_cleaned.tsv"

first_output_line = "Query Sequence id\tSubject Sequence id\tQuery Sequence Length\tSubject Sequence Length\tStart of Alignment in Query\tEnd of Alignment in Query\tStart of Alignment in Subject\tEnd of Alignment in Subject\tAlignment Length\tNumber of Mismatches\tE-value\tBit Score\tPercent Identity"

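# Write the header once, then copy every data row. Only lines that start with
# "#" are treated as comments, so a sequence id containing "#" is not dropped.
with open(concatenated_input_file, "r") as infile, open(concatenated_output_file, "w") as outfile:
    outfile.write(first_output_line + "\n")
    for line in infile:
        if line.startswith("#"):
            continue
        outfile.write(line)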
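The script above expects a file that has already been concatenated, but the concatenation step itself is not shown in this issue. One way it could be done is sketched below. The function name `concatenate_blast_outputs`, its parameters, and the default glob pattern are assumptions for illustration, not the method actually used here.

```python
from pathlib import Path


def concatenate_blast_outputs(input_dir, combined_path,
                              pattern="*_vs_Clostridium_scindens_16S_Reference.out"):
    """Append every matching BLAST output file to one TSV; return the file count.

    Hypothetical helper: the issue does not say how the files were combined.
    """
    files = sorted(Path(input_dir).glob(pattern))
    with open(combined_path, "w") as combined:
        for path in files:
            with open(path) as handle:
                for line in handle:
                    # Guard against a last line that lacks a trailing newline,
                    # which would otherwise fuse two rows together.
                    combined.write(line if line.endswith("\n") else line + "\n")
    return len(files)
```

The resulting file could then be passed to Clean_Blast_Output.py as its single argument. Sorting the file list keeps the row order reproducible between runs.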