gbouras13 / dnaapler

Reorients assembled microbial sequences
MIT License
102 stars 3 forks source link

Dnaapler rotates non-circularized contigs #62

Closed MostafaYA closed 1 year ago

MostafaYA commented 1 year ago

Hi,

Thanks for this nice tool. I tried to automate Dnaapler in a pipeline to reorientate genomes assembled with flye. In some instances, I got the genome assembly fragmented, and obviously not circular (also from flye info). Despite of this, Dnaapler rotated these fragments and may have falsely merged the contig ends. Could it be possible that Dnaapler takes the flye assembly_info.txt and applies its logic only to the circularized contigs?

Thank you

MostafaYA commented 1 year ago

The --ignore option seems to solve the problem! I close this!!

gbouras13 commented 1 year ago

Hi @MostafaYA ,

No problems mate - thankfully this was suggested by Alex Weisberg.

A bit of a self plug, but I'd also give Hybracter https://github.com/gbouras13/hybracter a try as it should handle these cases for you automatically (I'd love some feedback!) :)

I also know Dnaapler will probably soon be in Dragonflye https://github.com/rpetit3/dragonflye - so that would be another good option to try. Just tagging you in here too @rpetit3 I'm not sure how you have handled non-circularised contigs but this concern is something to think about if you haven't thought of it!

George

MostafaYA commented 1 year ago

Hi, thanks for the great pipelines. Below is my code to auto-ignore the non-circular contigs

rule noncircular_contigs: 
    input:
        rules.copy_flye_data.output[1],
    output:
        results_per_sample_dir + "{sample}/4_genome_rotation/noncircular_contigs.tsv",
    conda:
        envs_folder + "csvtk.yaml"
    shell:
        """
        csvtk filter2 -t -C $ -f '$4== "N" ' {input} -o {output}
        """

rule rotate_genome:
    input:
        rotate_genome_input(),
        rules.noncircular_contigs.output[0],
    output:
        dir = temp(directory(results_per_sample_dir + "{sample}/tmp_rotation")),
        contigs= results_per_sample_dir + "{sample}/4_genome_rotation/{sample}.fasta",
        summary= results_per_sample_dir + "{sample}/4_genome_rotation/{sample}.reorientation_summary.tsv"
    threads: 16
    conda:
        envs_folder + "dnaapler.yaml"
    log: results_per_sample_dir + "{sample}/4_genome_rotation/dnaapler.log"
    params:
        options = "--force "
    shell:
        """
        dnaapler all {params.options} \
        --prefix {wildcards.sample} \
        --output {output.dir} \
        --threads {threads} \
        --ignore {input[1]} \
        --input {input[0]} 2>&1 | tee {log} >/dev/null

        cp {output.dir}/{wildcards.sample}_reoriented.fasta {output.contigs} 
        cp {output.dir}/{wildcards.sample}_all_reorientation_summary.tsv {output.summary}
        """