biocore / empress

A fast and scalable phylogenetic tree viewer for microbiome data analysis
BSD 3-Clause "New" or "Revised" License
48 stars 31 forks source link

Option to overwrite existing directory #558

Open gibsramen opened 2 years ago

gibsramen commented 2 years ago

I'd like to use Empress in a Snakemake pipeline (see below) but I'm running into an annoying issue. If my understanding is correct (which it could very well not be), the way Snakemake works is that it sees that the results/empress directory does not exist and thus creates it before the shell command executes. However, when this occurs Empress complains that Output directory already exists!. I think what would be useful is for some sort of --overwrite option to force the creation of the visualization + support files.

In my case, I can work around this by changing some of the filepaths but I think this might be a useful option.

rule empress:
    input:
        tree="results/diversity/metagenomics/phylo_rpca/labeled-phylogeny.nwk",
        feature_md="results/tmp/feature_metadata.tmp.tsv"
    output:
        "results/empress/empress.html"
    shell:
        """
        empress tree-plot \
            --tree {input.tree} \
            --feature-metadata {input.feature_md} \
            --output-dir results/empress
        """
fedarko commented 2 years ago

Eesh, that's annoying! It should be feasible to add this in. Looks like the lines causing the problem are here:

https://github.com/biocore/empress/blob/e543610c5654399aa1dcbbede09b45b45c4170d0/empress/_plot_utils.py#L86-L101

... So we can probably just add an --overwrite / --no-overwrite flag (default --no-overwrite, i.e. the current behavior) to the standalone script, and then pass the value of this flag directly to check_and_process_files() to tell it to ignore the output directory already existing.

(Maybe we should add some code to check_and_process_files() to, when overwrite is True, just delete the contents of the output directory? This would limit potential complications with stuff already existing in the output directory, although it might be dangerous if people use this willy-nilly. Not sure what the best practice is.)

gibsramen commented 2 years ago

I think deleting the contents of the output directory might be too dangerous (who knows what other stuff people could have in there). If we could only overwrite empress.html and support_files/* I think that would be sufficient.

Also I wrote this code so it's my own fault this is an annoying issue 😅

ElDeveloper commented 2 years ago

Overwriting is definitely the way to go. This used to be the behavior of make_emperor.py (the predecessor to q2-emperor) and this approached worked great for years.

From: Gibs @.> Date: Saturday, May 21, 2022 at 6:46 AM To: biocore/empress @.> Cc: Subscribed @.***> Subject: Re: [biocore/empress] Option to overwrite existing directory (Issue #558)

I think deleting the contents of the output directory might be too dangerous (who knows what other stuff people could have in there). If we could only overwrite empress.html and support_files/* I think that would be sufficient.

Also I wrote this code so it's my own fault this is an annoying issue 😅

— Reply to this email directly, view it on GitHubhttps://github.com/biocore/empress/issues/558#issuecomment-1133637379, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AAC3UC2SO3SINSZ6CV7JOW3VLDSM5ANCNFSM5WP54DJQ. You are receiving this because you are subscribed to this thread.Message ID: @.***>

gibsramen commented 2 years ago

So this specific instance can be circumvented with

output:
    directory("results/empress")

but I think it's still worth adding the --overwrite option. Will try to carve out some time to finish up my PR.