Snakemake-Profiles / lsf

Snakemake profile for running jobs on an LSF cluster
MIT License

Snakemake writes the wrong error to the output file, jobs fail for no reason #32

Closed · openpaul closed this 3 years ago

openpaul commented 3 years ago

Hello, thanks for this plugin first and foremost.

I noticed an issue with my workflow but could not reduce it to a minimal failing example. The log files of the jobs and the error messages do not line up:

Take job 5767 for example:

[Mon Sep 28 14:14:25 2020]
rule mash_sketch:
    input: data/species/genomes/GCA_900290415.1.fna
    output: data/species/sketch/GCA_900290415.1.msh
    jobid: 5767
    wildcards: gca=GCA_900290415.1
    resources: mem_mb=4000

Submitted job 5767 with external jobid '9873015 logs/cluster/mash_sketch/gca=GCA_900290415.1/jobid5767_7192d265-413d-4ae5-8d21-1ac836805741.out'.

The log file "logs/cluster/mash_sketch/gca=GCA_900290415.1/jobid5767_7192d265-413d-4ae5-8d21-1ac836805741.err" is for a different rule "predict proteins":

Building DAG of jobs...
Traceback (most recent call last):
  File "[...]/miniconda3/envs/drep_euk/lib/python3.7/site-packages/snakemake/__init__.py", line 709, in snakemake
    keepincomplete=keep_incomplete,
  File "[...]/miniconda3/envs/drep_euk/lib/python3.7/site-packages/snakemake/workflow.py", line 670, in execute
    dag.init()
  File "[...]/miniconda3/envs/drep_euk/lib/python3.7/site-packages/snakemake/dag.py", line 177, in init
    job = self.update(self.file2jobs(file), file=file, progress=progress)
  File "[...]/miniconda3/envs/drep_euk/lib/python3.7/site-packages/snakemake/dag.py", line 715, in update
    progress=progress,
  File "[...]/miniconda3/envs/drep_euk/lib/python3.7/site-packages/snakemake/dag.py", line 792, in update_
    file.inventory()
  File "[...]/miniconda3/envs/drep_euk/lib/python3.7/site-packages/snakemake/io.py", line 210, in inventory
    self._local_inventory(cache)
  File "[...]/miniconda3/envs/drep_euk/lib/python3.7/site-packages/snakemake/io.py", line 224, in _local_inventory
    with os.scandir(path) as scan:
FileNotFoundError: [Errno 2] No such file or directory: 'data/species/gmes/compute/GCA_014235955.1'

The .out log file is correct and shows that there was an issue with the job, but not what it was:


------------------------------------------------------------
Sender: LSF System <lsf@hx-noah-11-07>
Subject: Job 9873015: <mash_sketch.gca=GCA_900290415.1> in cluster <EBI> Exited

Job <mash_sketch.gca=GCA_900290415.1> was submitted from host <hx-noah-39-01> by user <$USER> in cluster <EBI> at Mon Sep 28 14:14:25 2020
Job was executed on host(s) <hx-noah-11-07>, in queue <research-rh74>, as user <$USER> in cluster <EBI> at Mon Sep 28 14:14:26 2020
</homes/$USER> was used as the home directory.
</hps/research/$GROUP/$USER/projects/drep/snakemake> was used as the working directory.
Started at Mon Sep 28 14:14:26 2020
Terminated at Mon Sep 28 14:15:25 2020
Results reported at Mon Sep 28 14:15:25 2020

Your job looked like:

------------------------------------------------------------
# LSBATCH: User input
/hps/research/$GROUP/$USER/projects/drep/snakemake/.snakemake/tmp.hrw102kd/snakejob.mash_sketch.5767.sh
------------------------------------------------------------

Exited with exit code 1.

Resource usage summary:

    CPU time :                                   1.63 sec.
    Max Memory :                                 61 MB
    Average Memory :                             58.50 MB
    Total Requested Memory :                     4000.00 MB
    Delta Memory :                               3939.00 MB
    Max Swap :                                   468 MB
    Max Processes :                              4
    Max Threads :                                5
    Run time :                                   59 sec.
    Turnaround time :                            60 sec.

The output (if any) is above this job summary.

I don't know what is getting mixed up, but the log files (always just the .err file) do not line up with the jobs, so debugging is complicated, and I suspect this interferes with Snakemake's job success/fail evaluation.

openpaul commented 3 years ago

Some jobs actually succeed and others don't, so it's quite inconsistent and looks like a weird bug to me.

openpaul commented 3 years ago

I don't know why, but the issue was solved for me by using an absolute path as my output folder.
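In case it helps others, a minimal sketch of the workaround (the function name is mine, not the profile's): resolving the log/output directory to an absolute path before it is handed to the scheduler, so that jobs started from a different working directory still resolve to the same files.

```python
import os

def absolute_log_dir(log_dir: str) -> str:
    # Hypothetical helper, not part of this profile: expand "~" and
    # resolve the path against the current working directory so every
    # submitted job sees the same absolute location.
    return os.path.abspath(os.path.expanduser(log_dir))

print(absolute_log_dir("logs/cluster"))
```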

mbhall88 commented 3 years ago

Hey Paul,

This sounds odd. Seems even stranger that using an absolute path fixes it.

Could you provide some more info for me?

Given this is on noah, maybe you could point me at the directory and I can take a look? (Not sure whether you'd need to tweak some permissions.)

Also, are you on the latest profile commit? (prior to the one I literally just merged - although that one would be useful for you due to some EBI noah zombie funny business)

openpaul commented 3 years ago

I fetched it only recently (17th Sept 2020), so I should have everything but the latest commit. Sure, I'll Slack you the location. It works now. I also changed my setup so that no separate error log is created anymore; everything is written to the out log.
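For anyone wanting to do the same: LSF's bsub documents that when -o is given without -e, stderr is stored in the same file as stdout, giving one log per job. A hedged sketch of how a submit wrapper could build such a command (function name and flag layout are mine, not this profile's):

```python
import shlex

def build_bsub_command(jobscript: str, out_log: str, memory_mb: int = 4000) -> str:
    # Illustrative only: submit with -o but no -e, so LSF appends
    # stderr to out_log instead of creating a separate .err file.
    cmd = [
        "bsub",
        "-M", str(memory_mb),
        "-o", out_log,  # no -e flag: stderr is merged into out_log
        jobscript,
    ]
    return shlex.join(cmd)

print(build_bsub_command("snakejob.mash_sketch.5767.sh", "logs/jobid5767.out"))
```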

I am creating a minimal version to look at and will generate the log files again (I deleted a bunch of the old ones because I needed to move on).