Factual / drake

Data workflow tool, like a "Make for data"

transitive timecheck #152

Open potash opened 10 years ago

potash commented 10 years ago

Suppose I have the following Drakefile:

%tag <- input
    echo 1

output <- %tag
    echo 2

When I modify input and run drake, I get "Nothing to do.". I would like Drake to see that there is a path in the tree whose input is newer than its output. Is there a reason why it wouldn't do that? This is related to my other issue #151, because oftentimes I want to do several steps in a database (technically no-output) and then dump to a file.

dirtyvagabond commented 10 years ago

We officially recommend avoiding tags as outputs when you can. Per the spec:

While this could serve as a good transitionary vehicle from a linear workflow, using output files is a highly preferred way to establish step dependencies. Using tags makes the workflow less flexible and more error prone, hardcodes file locations into commands, and skips a variety of features Drake provides (base directory, automatic step selection, data backups and reverts, and others).

If I'm reading you right, your first step writes to a database and you don't technically need to track an output file for that so you're putting a tag there instead. In this case, one simple workaround would be to have that first step, as a final action, create an output file anyway. Perhaps put some logging info in there so it's not totally useless. Then you get the standard benefits of Drake's timestamp checking.
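
As a rough sketch of that workaround, applied to the Drakefile from the original post: the file name db_step.log is made up for illustration, and the echo commands are the same placeholders as above, standing in for the real database work and the real dump.

```
; Step 1: do the database work, then write a marker file as the final action.
; db_step.log is a hypothetical name; any file Drake can timestamp will do.
db_step.log <- input
    echo 1
    date > $OUTPUT

; Step 2: depends on the marker file instead of a tag, so a newer input
; makes step 1 out of date, and a newer db_step.log makes this step out of date.
output <- db_step.log
    echo 2
```

With a file in place of %tag, both steps have timestamps Drake can compare, so modifying input should no longer produce "Nothing to do.".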

potash commented 10 years ago

I see. It occurred to me to use a dummy file to represent the last time the command was run, but that will be problematic in a collaborative environment if that file doesn't get synced (even though the database and raw data files are shared).