evoldoers / biomake

GNU-Make-like utility for managing builds and complex workflows
BSD 3-Clause "New" or "Revised" License
103 stars 9 forks source link

biomake -Q slurm prints an error when the subdirectory does not exist #53

Open sjackman opened 6 years ago

sjackman commented 6 years ago

Even though it prints this noisy error message, it seems to complete successfully. It looks like may be failing to create the file foo/.biomake/slurm/job/bar.

foo/bar:
    mkdir -p $(@D)
    touch $@
$ biomake -Q slurm foo/bar
Target foo/bar not materialized - build required
Exception: error(existence_error(directory,/home/sjackman/tmp/foo/.biomake),context(system:make_directory/1,No such file or directory))
  [25] backtrace(99) at /home/sjackman/.linuxbrew/Cellar/swi-prolog/7.6.4/libexec/lib/swipl-7.6.4/library/prolog_stack.pl:444
  [24] prolog_exception_hook('<garbage_collected>',_528,539,501) at /home/sjackman/.linuxbrew/Cellar/biomake/HEAD-5c63168/prolog/biomake/biomake.pl:107
  [23] <meta call>
  [22] make_directory("/home/sjackman/tmp/foo/.biomake") <foreign>
  [21] catch(utils:make_directory("/home/sjackman/tmp/foo/.biomake"),error(existence_error(directory,"/home/sjackman/tmp/foo/.biomake"),context(...,'No such file or directory')),utils:fail) at /home/sjackman/.linuxbrew/Cellar/swi-prolog/7.6.4/libexec/lib/swipl-7.6.4/boot/init.pl:371
  [20] utils:safe_make_directory("/home/sjackman/tmp/foo/.biomake") at /home/sjackman/.linuxbrew/Cellar/biomake/HEAD-5c63168/prolog/biomake/utils.pl:314
  [19] utils:biomake_make_subdir_list('/home/sjackman/tmp/foo','<garbage_collected>') at /home/sjackman/.linuxbrew/Cellar/biomake/HEAD-5c63168/prolog/biomake/utils.pl:306
  [18] utils:biomake_private_filename_dir_exists('foo/bar',[slurm,"job"],_894) at /home/sjackman/.linuxbrew/Cellar/biomake/HEAD-5c63168/prolog/biomake/utils.pl:300
  [17] queue:run_execs_with_qsub(slurm,rb('foo/bar',[],true,["mkdir -p foo"|...],v(_1012,'foo/bar',[],...)),[],[queue(slurm),...|...]) at /home/sjackman/.linuxbrew/Cellar/biomake/HEAD-5c63168/prolog/biomake/queue.pl:59
  [15] biomake:dispatch_run_execs('<garbage_collected>',[],[queue(slurm),...|...]) at /home/sjackman/.linuxbrew/Cellar/biomake/HEAD-5c63168/prolog/biomake/biomake.pl:468
  [14] biomake:run_execs_and_update('<garbage_collected>',[],'<garbage_collected>') at /home/sjackman/.linuxbrew/Cellar/biomake/HEAD-5c63168/prolog/biomake/biomake.pl:451
  [13] biomake:build('foo/bar',[],[queue(slurm),...|...]) at /home/sjackman/.linuxbrew/Cellar/biomake/HEAD-5c63168/prolog/biomake/biomake.pl:164
  [11] '$apply':forall('<garbage_collected>',user:build(_1344,...)) at /home/sjackman/.linuxbrew/Cellar/swi-prolog/7.6.4/libexec/lib/swipl-7.6.4/boot/apply.pl:51
  [10] build_toplevel([queue(slurm),...|...]) at /home/sjackman/.linuxbrew/Cellar/biomake/HEAD-5c63168/prolog/biomake/cli.pl:41
   [9] main at /home/sjackman/.linuxbrew/Cellar/biomake/HEAD-5c63168/prolog/biomake/cli.pl:17
   [8] '<meta-call>'('<garbage_collected>') <foreign>
   [7] catch(user:(...,...),_1568,'$toplevel':true) at /home/sjackman/.linuxbrew/Cellar/swi-prolog/7.6.4/libexec/lib/swipl-7.6.4/boot/init.pl:371

Note: some frames are missing due to last-call optimization.
Re-run your program in debug mode (:- debug.) to get more detail.
Submitting job: sbatch -o /home/sjackman/tmp/foo/.biomake/slurm/out/bar -e /home/sjackman/tmp/foo/.biomake/slurm/err/bar     --parsable /home/sjackman/tmp/foo/.biomake/slurm/script/bar >/home/sjackman/tmp/foo/.biomake/slurm/job/bar
foo/bar queued for rebuild
sjackman commented 6 years ago

If biomake creates that foo/.biomake directory, the Makefile needs to be structured so that it doesn't fail if the directory already exists. For example, removing the -p from mkdir would fail.

foo/bar:
    mkdir $(@D)
    touch $@
❯❯❯ biomake -Q slurm foo/bar
❯❯❯ cat foo/.biomake/slurm/err/bar
mkdir: cannot create directory ‘foo’: File exists
ihh commented 6 years ago

yes, biomake currently leverages the filesystem far too intimately; it assumes that the subdirectory is there, to stash its internal files in the hidden directory .biomake

This is certainly ugly but, as you point out, you can work around it easily enough by structuring the Makefile to ensure the directory exists, e.g. by adding an extra rule converting your Makefile to something like

all: foo/bar

%/bar: %
    echo hello there >$@

foo:
    mkdir -p $@

If that doesn't work for you, I get it, but it seems to me that the only other option is to have biomake anticipate that mkdir -p somehow, which seems wrong to me.

Of course, biomake could also find some way of storing its internal queue-related info that doesn't rely on the directory hierarchy being in place. I accept that this is a bug due to a design flaw (over-reliance on filesystem)...

Open to additional thoughts on this.... incidentally, if bug reports can be reproduced using the internal thread pool queue implementation (-Q poolq) as opposed to things like slurm or sge that require those engines to be present (or simulated) for debugging, that'd be awesome ;)

ihh commented 6 years ago

hmmm, now i am wondering if that example I gave would work... actually it might still run into the same issue. going to think about this some more...

ihh commented 6 years ago

Perhaps the best thing is just to do the mkdir -p implicitly after all.... hmmmmm..... any thoughts @cmungall @sjackman ?

sjackman commented 6 years ago

I'm okay with doing the mkdir -p implicitly. Note that will break the example that I give in https://github.com/evoldoers/biomake/issues/53#issuecomment-358504550 due to mkdir: cannot create directory ‘foo’: File exists.

I do in fact like the design of biomake and how it leverages the file system. It does lead to this issue though.

I avoid whenever possible having a rule depend on a directory, because the directories time stamp is pretty unstable. Any time a file within that directory changes, the time stamp of the directory changes. A simple workaround is to the time stamp issue follows. I think it suffers from the same issue though, of foo/ not existing when Biomake creates its metafile.

all: foo/bar

%/bar: %/stamp
    echo hello there >$@

foo/stamp:
    mkdir -p $(@D)
    touch $@
sjackman commented 6 years ago

I'm troubleshooting a program that appears to misbehave when it's output directory already exists, before it starts running.

Here's a suggestion. The current Biomake metadata directory is

./$(@D)/.biomake/slurm/xxx/$(@F)

How about instead…

./.biomake/slurm/xxx/$(@D)/$(@F)

where xxx is err job out script. It would resolve this issue, and it has the nice side effect of the directory structure of ./.biomake/slurm/out exactly mirroring the real directory structure of output files.

ihh commented 6 years ago

@sjackman The problem with that is that the biomake directory for target X would then depend on which directory biomake was invoked in, as well as the location of X.

In general this seems a bit fragile to me, and it would clearly break some cases (e.g. caching the MD5 checksum of a file)

sjackman commented 6 years ago

Is your concern specifically with recursive Makefile? That is, a Makefile in each subdirectory? I usually use one sole Makefile in the top-level. I don't think your concern affects that situation.

ihh commented 6 years ago

There are a few situations I can imagine where having the target metadata associated with the Makefile location (or the current working directory), rather than the target location, would be a bad idea.

One such situation would be including Makefiles from other Makefiles. Another would be the case (common-ish for me) where a Makefile is refactored/split into one or more other Makefiles. Or indeed any situation where the Makefile is moved.

Another gotcha situation would be if the Makefile uses absolute paths and the user expects to be able to invoke it from other, arbitrary directories.

Make is already fragile enough in such situations - c.f. hacks involving the $(MAKE), $(MAKEFILE_LIST) and $(MAKE_ARGS) variables. I'm not convinced that it's an improvement to go from a requirement that subdirectories be present (which at least is a transparent problem, barfy error message notwithstanding) to a requirement that biomake always be invoked from the same directory.

sjackman commented 6 years ago

I'm suggesting that the .biomake directory be in the current working directory, not relative to the location of the Makefile, which I agree would be a bad idea, so the location of the Makefile shouldn't matter. Absolute paths would be trouble.

I'm okay with closing this issue as a known limitation if that's your preference. It would be nice to avoid the ugly error message though if the directory doesn't already exist. Biomake should probably create the directory in that case I'd say.

ihh commented 6 years ago

I'm still kind of hoping that an elegant solution will present itself... e.g. if @cmungall jumps in...

I can certainly mitigate it by

It may be a few days before i have another chance to do some biomake hacking, so let's leave it open for discussion until then at least

ihh commented 6 years ago

Also, if we allow for switchable behavior, there are other options... for example the location of metadata directories (in CWD vs in target dir) could be a user-changeable option... seems messy but perhaps the only way to cover all cases...

sjackman commented 6 years ago

adding a command-line option (or special target) that directs biomake to create intermediate directories as needed

Oddly, I believe it's already doing this, making the directory. Even though it prints the error message above, it still continues happily on and creates all the necessary directories and files.

sjackman commented 6 years ago

Also, if we allow for switchable behavior, there are other options... for example the location of metadata directories (in CWD vs in target dir) could be a user-changeable option... seems messy but perhaps the only way to cover all cases...

For my case without recursion, I'd find that helpful, .biomake in CWD. I have a workaround in the mean time. I'll think on it too.