Closed btovar closed 10 years ago
@dpandiar
Dinesh, I tried to fix this one, but I get a segfault when calling work_queue_task_specify_directory. Passing it to you.
The code for specify_directory() looks correct to me. What specifically is broken?
Ah..I see now..
Okay, the wq_task_specify_directory() is itself broken not just in Python. It allows the local name to be empty. And its documentation is contradictory saying the local name is optional and but that it fails if local name is null.
Is there a case where the local name can be null? I can't see one.
If you're non-recursively specifying a directory as an input, having a local name should be unnecessary, since all of the necessary information is already present (namely the directory's name on the remote host).
I suggest to not allow the local name to be null, to be consistent with specify_file.
If the user's intent is to create a directory at the remote site that doesn't map to any particular directory at the local site, I think they should be able to express that. Requiring a local name to be specified when when not necessary just seems to invite bugs into the user's code. You'll inevitably get someone who chooses "." or "/tmp" or "/dev/null" as the local directory name, to indicate that they don't have a particular directory they want to copy, then later they forget or someone else doesn't intuit what the call was intending and adds a recursive flag (perhaps to maintain consistency within their code).
I think it should absolutely accept a local name if the user wants to provide one, to maintain consistency with specify_file. I just don't think it should require one when we don't need that information.
Currently we only officially support objects being input or output. This means that when using wq api, we need a local mapping to either copy from/to. Creating a directory at the worker that is not meant to be copied back to the master should not be advised in the current model.
Later, we may want to add 'intermediate files' that only live in the worker (see Lee-Ping's milestone).
If you're not going to copy the contents of a directory (which was the reason specify_directory was implemented), and you can rename the directory on the remote site, what exactly is being copied?
Also, we already support creating files on the worker that don't exist on the master via specify_buffer . On Nov 13, 2013 8:52 AM, "Benjamin Tovar" notifications@github.com wrote:
Currently we only officially support objects being input or output. This means that when using wq api, we need a local mapping to either copy from/to. Creating a directory at the worker that is not meant to be copied back to the master should not be advised in the current model.
Later, we may want to add 'intermediate files' that only live in the worker (see Lee-Ping's milestone).
— Reply to this email directly or view it on GitHubhttps://github.com/cooperative-computing-lab/cctools/issues/307#issuecomment-28400213 .
If nothing is copied, but created at the worker, why is the directory marked as an input? Isn't it an output?
With specify_buffer we do have the local object to copy from.
Its an input because it is needed at the remote site for the task to successfully run.
For instance, if your command is "./mysim.exe >> foo/bar" and "foo/" doesn't exist at the worker, the command will fail.
I see. In your comments above, what happens when the local name is not specified, and the recursive flag is set?
On Wed, Nov 13, 2013 at 10:44 AM, Michael Albrecht <notifications@github.com
wrote:
Its an input because it is needed at the remote site for the task to successfully run.
For instance, if your command is "./mysim.exe >> foo/bar" and "foo/" doesn't exist at the worker, the command will fail.
— Reply to this email directly or view it on GitHubhttps://github.com/cooperative-computing-lab/cctools/issues/307#issuecomment-28405209 .
The call should fail, and an expressive error message should be displayed. eg: "Cannot recursively create "foo/" on the remote server, no local directory specified"
After hearing this discussion, here is what I see as an answer to my question.
Our task_specify_file/url/buffer API allow data dependencies to be expressed and enable us to manage those dependencies at the worker (caching/garbage collection) and migrate failed tasks.
The specify_directory() API allowing local names to be optional is quite different from the above since it only aids in setting up the environment for execution.
In other words, I don't see WQ offering significant benefits to the user when using this API with null local names against something like:
mkdir foo; ./mysim.exe >> foo/bar
(the one advantage I see is the user doesn't have to do this for every task when using specify_directory)
So, I propose keeping the specify_directory() but changing it to always transfer the entire directory recursively.
If that sounds correct and agreeable to everyone, I will work on resolving this bug.
Dinesh, For specify_directory this sounds good to me.
However, should we think about adding api functions which job is to set the worker's environment? Like create directory hierarchies and set environment variables?
@dpandiar If you do that then there's not much purpose to keeping specify_directory as a distinct api call, since specify_file already has that behavior when pointed at a directory.
@malbrec2 Good point. And I now see the main purpose behind specify_directory(). Moving forward, I like the idea of @btovar to rename specify_directory as specify_directory_hierarchy() that does not take a local name. Its sole purpose is to setup the remote directory structure for task execution.
@btovar, I like the idea of also having a specify_env_variables(). Can you please open bug reports so we have a record of them?
@btovar, I am going to leave the work on specify_directory() for the next release (#310). For 4.0.3, I will have a workaround for the segfault. How is that?
Dinesh, That sounds great, thanks.
Ben, I pushed the commit (853e209e3) for this directly to mainline since it is just a temporary fix and it passed my simple tests.
Ben, it looks like commit 853e209 addressed the segfault. We can close this report and think about the specify_directory_hierarchy in #310?
Sounds good.
self.specify_directory just calls itself. We may want to add specify_input_directory, specify_output_directory as for specifying files.