Download GFF files of the 120 UHGG C. scindens genomes

The GFF files were downloaded with curl from the FTP links provided.
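The exact curl invocation is not recorded here. As a rough sketch only, the download step could be scripted as below, assuming the FTP links are stored one per line in a text file. The function names `build_curl_command` and `download_all`, the list file, and the output directory are all hypothetical and not part of the original workflow.

```python
import subprocess
from pathlib import Path
from urllib.parse import urlparse


def build_curl_command(url, out_dir):
    """Return the curl argument list that saves `url` into `out_dir`."""
    # Name the local file after the last path component of the link.
    file_name = Path(urlparse(url).path).name
    return ["curl", "--fail", "--silent", "--show-error",
            "--output", str(Path(out_dir) / file_name), url]


def download_all(url_list_file, out_dir):
    """Run curl once for every non-empty line of `url_list_file`."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    with open(url_list_file) as handle:
        for line in handle:
            url = line.strip()
            if url:
                # check=True raises an error if curl exits with a failure code.
                subprocess.run(build_curl_command(url, out_dir), check=True)
```

Calling `download_all("gff_links.txt", "gff_files")` would then fetch every link in the (assumed) list file, stopping at the first failed download so that an incomplete set of genomes is not silently accepted.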

The downloaded GFF files were then converted to FASTA files using the script /storage1/data19/Scripts/python_scripts/Convert_GFF_to_Fasta.py:

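import os
import sys

input_gff_file = sys.argv[1]
# Swap only the trailing extension, so the output path can never equal the input path.
output_fasta_file = os.path.splitext(input_gff_file)[0] + ".fasta"

# Becomes True once the "##FASTA" directive has been seen.
in_fasta_section = False

with open(input_gff_file, "r") as gff, open(output_fasta_file, "w") as fasta:
    for line in gff:
        if in_fasta_section:
            # Everything after the directive is sequence data; copy it unchanged.
            fasta.write(line)
        elif line.startswith("##FASTA"):
            # The directive line itself is not written to the output.
            in_fasta_section = True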

To check the resulting FASTA files, a custom Python script was used to count the number of scaffolds and the number of nucleotides in each file.
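The counting script itself is not shown here. A minimal sketch of such a check might look like the following, assuming each scaffold starts with a `>` header line and all other non-empty lines are sequence. The function name `count_fasta` is hypothetical.

```python
import sys


def count_fasta(path):
    """Return (number of scaffolds, number of nucleotides) for a FASTA file."""
    scaffolds = 0
    nucleotides = 0
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                # Each header line marks the start of one scaffold.
                scaffolds += 1
            else:
                # Sequence line: count its characters, excluding the newline.
                nucleotides += len(line)
    return scaffolds, nucleotides


if __name__ == "__main__":
    # Print one tab-separated row per FASTA file given on the command line.
    for fasta_path in sys.argv[1:]:
        n_scaffolds, n_nucleotides = count_fasta(fasta_path)
        print(f"{fasta_path}\t{n_scaffolds}\t{n_nucleotides}")
```

The two counts can then be compared against the contig and length figures reported for each genome, if such metadata is available.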

breister2 / Clostridium_scindens_mining
