cooperative-computing-lab / cctools

The Cooperative Computing Tools (cctools) enable large scale distributed computations to harness hundreds to thousands of machines from clusters, clouds, and grids.
http://ccl.cse.nd.edu
Other
134 stars 119 forks source link

wq python specify directory is broken #307

Closed btovar closed 10 years ago

btovar commented 10 years ago

self.specify_directory just calls itself. We may want to add specify_input_directory, specify_output_directory as for specifying files.

btovar commented 10 years ago

@dpandiar

btovar commented 10 years ago

Dinesh, I tried to fix this one, but I get a segfault when calling work_queue_task_specify_directory. Passing it to you.

dpandiar commented 10 years ago

The code for specify_directory() looks correct to me. What specifically is broken?

dpandiar commented 10 years ago

Ah..I see now..

dpandiar commented 10 years ago

Okay, the wq_task_specify_directory() is itself broken not just in Python. It allows the local name to be empty. And its documentation is contradictory saying the local name is optional and but that it fails if local name is null.

Is there a case where the local name can be null? I can't see one.

malbrec2 commented 10 years ago

If you're non-recursively specifying a directory as an input, having a local name should be unnecessary, since all of the necessary information is already present (namely the directory's name on the remote host).

btovar commented 10 years ago

I suggest to not allow the local name to be null, to be consistent with specify_file.

malbrec2 commented 10 years ago

If the user's intent is to create a directory at the remote site that doesn't map to any particular directory at the local site, I think they should be able to express that. Requiring a local name to be specified when when not necessary just seems to invite bugs into the user's code. You'll inevitably get someone who chooses "." or "/tmp" or "/dev/null" as the local directory name, to indicate that they don't have a particular directory they want to copy, then later they forget or someone else doesn't intuit what the call was intending and adds a recursive flag (perhaps to maintain consistency within their code).

I think it should absolutely accept a local name if the user wants to provide one, to maintain consistency with specify_file. I just don't think it should require one when we don't need that information.

btovar commented 10 years ago

Currently we only officially support objects being input or output. This means that when using wq api, we need a local mapping to either copy from/to. Creating a directory at the worker that is not meant to be copied back to the master should not be advised in the current model.

Later, we may want to add 'intermediate files' that only live in the worker (see Lee-Ping's milestone).

malbrec2 commented 10 years ago

If you're not going to copy the contents of a directory (which was the reason specify_directory was implemented), and you can rename the directory on the remote site, what exactly is being copied?

Also, we already support creating files on the worker that don't exist on the master via specify_buffer . On Nov 13, 2013 8:52 AM, "Benjamin Tovar" notifications@github.com wrote:

Currently we only officially support objects being input or output. This means that when using wq api, we need a local mapping to either copy from/to. Creating a directory at the worker that is not meant to be copied back to the master should not be advised in the current model.

Later, we may want to add 'intermediate files' that only live in the worker (see Lee-Ping's milestone).

— Reply to this email directly or view it on GitHubhttps://github.com/cooperative-computing-lab/cctools/issues/307#issuecomment-28400213 .

btovar commented 10 years ago

If nothing is copied, but created at the worker, why is the directory marked as an input? Isn't it an output?

With specify_buffer we do have the local object to copy from.

malbrec2 commented 10 years ago

Its an input because it is needed at the remote site for the task to successfully run.

For instance, if your command is "./mysim.exe >> foo/bar" and "foo/" doesn't exist at the worker, the command will fail.

btovar commented 10 years ago

I see. In your comments above, what happens when the local name is not specified, and the recursive flag is set?

On Wed, Nov 13, 2013 at 10:44 AM, Michael Albrecht <notifications@github.com

wrote:

Its an input because it is needed at the remote site for the task to successfully run.

For instance, if your command is "./mysim.exe >> foo/bar" and "foo/" doesn't exist at the worker, the command will fail.

— Reply to this email directly or view it on GitHubhttps://github.com/cooperative-computing-lab/cctools/issues/307#issuecomment-28405209 .

malbrec2 commented 10 years ago

The call should fail, and an expressive error message should be displayed. eg: "Cannot recursively create "foo/" on the remote server, no local directory specified"

dpandiar commented 10 years ago

After hearing this discussion, here is what I see as an answer to my question.

Our task_specify_file/url/buffer API allow data dependencies to be expressed and enable us to manage those dependencies at the worker (caching/garbage collection) and migrate failed tasks.

The specify_directory() API allowing local names to be optional is quite different from the above since it only aids in setting up the environment for execution.

In other words, I don't see WQ offering significant benefits to the user when using this API with null local names against something like: mkdir foo; ./mysim.exe >> foo/bar (the one advantage I see is the user doesn't have to do this for every task when using specify_directory)

So, I propose keeping the specify_directory() but changing it to always transfer the entire directory recursively.

If that sounds correct and agreeable to everyone, I will work on resolving this bug.

btovar commented 10 years ago

Dinesh, For specify_directory this sounds good to me.

However, should we think about adding api functions which job is to set the worker's environment? Like create directory hierarchies and set environment variables?

malbrec2 commented 10 years ago

@dpandiar If you do that then there's not much purpose to keeping specify_directory as a distinct api call, since specify_file already has that behavior when pointed at a directory.

dpandiar commented 10 years ago

@malbrec2 Good point. And I now see the main purpose behind specify_directory(). Moving forward, I like the idea of @btovar to rename specify_directory as specify_directory_hierarchy() that does not take a local name. Its sole purpose is to setup the remote directory structure for task execution.

@btovar, I like the idea of also having a specify_env_variables(). Can you please open bug reports so we have a record of them?

dpandiar commented 10 years ago

@btovar, I am going to leave the work on specify_directory() for the next release (#310). For 4.0.3, I will have a workaround for the segfault. How is that?

btovar commented 10 years ago

Dinesh, That sounds great, thanks.

dpandiar commented 10 years ago

Ben, I pushed the commit (853e209e3) for this directly to mainline since it is just a temporary fix and it passed my simple tests.

dpandiar commented 10 years ago

Ben, it looks like commit 853e209 addressed the segfault. We can close this report and think about the specify_directory_hierarchy in #310?

btovar commented 10 years ago

Sounds good.