Snakemake-Profiles / lsf

Snakemake profile for running jobs on an LSF cluster
MIT License

Use workflow file name in log directory names #30

Closed bricoletc closed 3 years ago

bricoletc commented 3 years ago

Hi,

I find myself running several workflows that share a rule name, so the log directory for that rule name ends up holding log files from multiple workflows.

Currently I just rename rules to avoid that, but what do you think of prefixing the log dir name with the workflow file name?

mbhall88 commented 3 years ago

So you have multiple workflows outputting logs to the same directory? Something like:

logs
    cluster
        rule_41.log
        rule_56.log
workflows
    a
        Snakefile
    b
        Snakefile

And you're telling both workflows a and b to write cluster logs to ../../logs/?

bricoletc commented 3 years ago

Yes, I have that kind of layout, and my profile is set to put all logs in, say, logs/cluster.

Then if I have a rule called map in both workflows, the logs from both will go to the directory logs/cluster/map. I'm proposing something like logs/cluster/a_map and logs/cluster/b_map instead.

mbhall88 commented 3 years ago

Hmm, pooling log files in that way does not seem like a good idea to me, for the exact reason you mention: if you have the same rule name across workflows, there will no doubt be confusion.
I don't think adding a workflow prefix is ideal either as the log paths are already more bloated than I would like...

bricoletc commented 3 years ago

Sure, I may be missing something. What's a solution to not pool log files, other than having the workflows in different directories?

leoisl commented 3 years ago

I would suggest using the workflow name to create a log dir that holds all the logs of that specific workflow. For example, instead of having

logs/cluster/a_map and logs/cluster/b_map

we have:

logs/cluster/a/map and logs/cluster/b/map

This way, the log filename at least remains the same and is not more bloated (although the path will get more bloated, which is hard to avoid anyway). This could be easy to do if the workflow name is present in the job properties, but I can't check this now...
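To make the two layouts concrete, here is a minimal sketch of how the paths could be built. It is not the profile's actual code: `workflow_name` and `rule_log_dir` are hypothetical helpers, and naming a workflow after the directory that holds its Snakefile is an assumption for illustration (whether the job properties expose a workflow name is not confirmed in this thread).

```python
from pathlib import PurePosixPath


def workflow_name(snakefile: str) -> str:
    """Hypothetical helper: name a workflow after the directory holding its Snakefile."""
    return PurePosixPath(snakefile).parent.name


def rule_log_dir(base: str, snakefile: str, rule: str, nested: bool = True) -> PurePosixPath:
    """Build the log directory for a rule, keeping workflows apart.

    nested=True  gives <base>/<workflow>/<rule>  (e.g. logs/cluster/a/map)
    nested=False gives <base>/<workflow>_<rule>  (e.g. logs/cluster/a_map)
    """
    name = workflow_name(snakefile)
    if nested:
        return PurePosixPath(base) / name / rule
    return PurePosixPath(base) / f"{name}_{rule}"


# Same rule name in two workflows, no collision in either scheme.
print(rule_log_dir("logs/cluster", "workflows/a/Snakefile", "map"))
print(rule_log_dir("logs/cluster", "workflows/b/Snakefile", "map", nested=False))
```

With the nested scheme only the directory part of the path changes, so the rule directory name and the log filename stay as they are today.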

mbhall88 commented 3 years ago

Quoting bricoletc: "Sure, I may be missing something. What's a solution to not pool log files, other than having the workflows in different directories?"

I create a log/cluster dir in each workflow directory, i.e. workflowA/log/cluster and workflowB/log/cluster.

Because those logs relate only to those workflows, it makes sense to keep them within the scope of that workflow.
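As a rough illustration of that layout, the sketch below derives the cluster log directory from the location of each Snakefile. `cluster_log_dir` is a hypothetical helper, not something the profile provides, and the `workflowA/Snakefile` and `workflowB/Snakefile` paths are assumed for the example.

```python
from pathlib import PurePosixPath


def cluster_log_dir(snakefile: str) -> PurePosixPath:
    """Hypothetical helper: keep cluster logs next to the Snakefile they belong to."""
    return PurePosixPath(snakefile).parent / "log" / "cluster"


# A rule called "map" in each workflow gets its own, separate log directory.
for snakefile in ("workflowA/Snakefile", "workflowB/Snakefile"):
    print(cluster_log_dir(snakefile) / "map")
```

Because each path is rooted in its own workflow directory, identical rule names never end up in the same log directory.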

leoisl commented 3 years ago

Oh sorry, I misunderstood this, so my previous comment does not make much sense. I thought Brice had subworkflows, and that when he ran a workflow with subworkflows, everything was being sent to the same logs dir.

bricoletc commented 3 years ago

Ah, I see. I think the problem is that I'm running workflows from a top-level directory: following our example, from the root of

logs
    cluster
        rule_41.log
        rule_56.log
workflows
    a
        Snakefile
    b
        Snakefile

The reason is that the workflows tap into some common config files (in a config/ dir) and scripts (in a scripts/ dir). It's a bit like Leandro's idea of subworkflows, I think. So then I cannot have a log dir per workflow, right?

mbhall88 commented 3 years ago

Your workflow subdirectories make total sense. But I still don't think it's a good idea to pool log files from multiple workflows. Feel free to do it of course, but I'm not keen to try and add support for conflicting/ambiguous rule names in log file names.

bricoletc commented 3 years ago

OK, this clears it up in my head. This can be closed as far as I'm concerned :+1: