tools/flow: various expressions not seen as dependency

apstndb commented 3 years ago

What version of CUE are you using (`cue version`)?

$ cue version
cue version 0.4.0 darwin/amd64

What did you do?

package main

import (
    "encoding/json"
    "tool/file"
    "tool/cli"
)

command: example1: {
    task: find_files: file.Glob & {
        glob: "/etc/passwd"
    }
    _files: [
        if len(task.find_files.files) > 0 {task.find_files.files[0]},
        "/dev/null",
    ]
    task: print_files: cli.Print & {
        text: "print_files: " + json.Marshal(_files)
    }
    task: print_first_file: cli.Print & {
        // only differ
        $before: [task.print_files]
        text: "print_first_file: " + _files[0]
    }
}

command: example2: {
    task: find_files: file.Glob & {
        glob: "/etc/passwd"
    }
    _files: [
        if len(task.find_files.files) > 0 {task.find_files.files[0]},
        "/dev/null",
    ]
    task: print_files: cli.Print & {
        // only differ
        $before: [task.print_first_file]
        text: "print_files: " + json.Marshal(_files)
    }
    task: print_first_file: cli.Print & {
        text: "print_first_file: " + _files[0]
    }
}

What did you expect to see?

The result of print_files should be the first element of the result of print_files and the results of example1 and example2 should be the same except ordering.

$ cue example1
print_files: ["/etc/passwd","/dev/null"]
print_first_file: /etc/passwd
$ cue example2
print_first_file: /etc/passwd
print_files: ["/etc/passwd","/dev/null"]

What did you see instead?

The results are different.

$ cue example1
print_files: ["/etc/passwd","/dev/null"]
print_first_file: /etc/passwd
$ cue example2
print_first_file: /dev/null
print_files: ["/etc/passwd","/dev/null"]

If only evaluate print_first_file, it is also the unexpected result.

package main

import (
    "encoding/json"
    "tool/file"
    "tool/cli"
)

command: example3: {
    task: find_files: file.Glob & {
        glob: "/etc/passwd"
    }
    task: print_first_file: cli.Print & {
        text: "print_first_file: " + [
            if len(task.find_files.files) > 0 {task.find_files.files[0]},
            "/dev/null",
        ][0]
    }
}

$ cue example3
print_first_file: /dev/null

verdverm commented 3 years ago

You are hitting concurrency races from the DAG solver. You need to add $after: [task.find_files] in the print task to specify the dependency. cue does not seem to be able to infer it through the guard

Here is a working example:

package main

import (
        "tool/file"
        "tool/cli"
)

command: example: {
        find_files: file.Glob & {
                glob: "/etc/passwd"
        }
        print_first_file: cli.Print & {
                $after: [find_files]
                text: "print_first_file: " + [
                        if len(find_files.files) > 0 {find_files.files[0]},
                        "/dev/null",
                ][0]
        }
}

verdverm commented 3 years ago

@mpvl should the tooling layer be able to infer potential dependencies across guards?

myitcv commented 3 years ago

Analysis: it actually looks like the problem here specifically is that with the if clause in _files, the reference _files[0] in print_first_file is not seen as a dependency on _files, and hence the dependency on find_files is missed:

exec cue cmd example1
stdout 'print_files: \["hello\.txt","/dev/null"\]'
stdout 'print_first_file: hello\.txt'

-- x_tool.cue --
package main

import (
    "encoding/json"
    "tool/file"
    "tool/cli"
)

command: example1: {
    task: find_files: file.Glob & {
        glob: "hello.txt"
    }
    _files: [
        if len(task.find_files.files) > 0 {task.find_files.files[0]},
        "/dev/null",
    ]
    task: print_files: cli.Print & {
        text: "print_files: " + json.Marshal(_files)
    }
    task: print_first_file: cli.Print & {
        text: "print_first_file: " + _files[0]
    }
}
-- hello.txt --

Drop the if clause and the dependency does get detected:

package main

import (
    "encoding/json"
    "tool/file"
    "tool/cli"
)

command: example1: {
    task: find_files: file.Glob & {
        glob: "hello.txt"
    }
    _files: [
        {task.find_files.files[0]},
        "/dev/null",
    ]
    task: print_files: cli.Print & {
        text: "print_files: " + json.Marshal(_files)
    }
    task: print_first_file: cli.Print & {
        text: "print_first_file: " + _files[0]
    }
}

(but clearly this changes the sense of the example).

verdverm commented 3 years ago

Yea, generally a dependency may only exists under certain conditions with guards and comprehensions.

The questions might be 1) how complicated can this become? and 2) how much of that complexity can/should cue handle before having a user be explicit?

iirc, dependency detection happens up front and then may be extended [1]. What happens when an if is against some data that is read from a file though? There are some helpful comments around [2]

[1] https://github.com/cue-lang/cue/blob/master/tools/flow/tasks.go#L47 [2] https://github.com/cue-lang/cue/blob/master/tools/flow/tasks.go#L206

myitcv commented 3 years ago

The questions might be 1) how complicated can this become? and 2) how much of that complexity can/should cue handle before having a user be explicit?

I'm not clear, at least not with this example, that we are at that point yet. So for now I consider this a straightforward bug.

verdverm commented 3 years ago

This comment seems relevant, I suspect this is more of a todo than a bug based on various comments around the code.

https://github.com/cue-lang/cue/blob/master/internal/core/dep/mixed.go#L26

addreas commented 2 years ago

Might be worth another issue, but related to the DAG solver none the less:

command: example3: {
    task: find_files: file.Glob & { glob: "./*.cue" }

    task: {
        for file in task.find_files.files {
            "echo \(file)": exec.Run & {
                cmd: "echo \(file)"
            }
        }
    }
}

I guess this would require that the entire command: [string]: task object would have to be run through another round of unification after each task finishes, until there are no non-concrete values. It felt natural to create tasks dynamically like this, but I didn't think about if it would be hard to implement.

Not failing silently would probably be the most important improvement, perhaps cue cmd needs a -c flag as well?

eonpatapon commented 2 years ago

tool/flow already support incremental dependency detection. I think the problem is that the comprehension is "masking" the reference between exec.Run tasks and file.Glob task.

By defining a reference in the cli.Print task to the file.Glob task more explicitely, it works properly:

import (
    "tool/file"
    "tool/cli"
)

command: example3: {
    task: find_files: file.Glob & {
        glob: "./*.cue"
    }
    for idx, _ in task.find_files.files {
        "print_\(idx)": cli.Print & {
            text: task.find_files.files[idx]
        }
    }
}

eonpatapon commented 2 years ago

Oops didn't read the whole issue, seems it was already descussed.

myitcv commented 2 years ago

@addreas

Not failing silently would probably be the most important improvement

I think it's probably the case that if we can detect such a situation to fail non-silently we might as well just make it work for dependency analysis 😄

eonpatapon commented 2 years ago

file.Glob defines files: [...string], and because the dependency is not found, I guess it's resolved immediately to [] in the comprehension and no tasks are run, hence no error is reported.

If we define the task like this we can see an error:

import (
    "tool/file"
    "tool/cli"
)

command: example3: {
    task: find_files: file.Glob & {
        glob: "./*.cue"
        files: [string, ...string]
    }

    for f in task.find_files.files {
        "print_\(f)": cli.Print & {
            text: f
        }
    }
}

 cue example3
command.example3: invalid interpolation: non-concrete value string (type string):
    ./test_tool.cue:15:3

addreas commented 2 years ago

I didn't even consider the possibility of trying to force an error out of it, will definitely have to try to remember that somehow.

myitcv commented 2 years ago

@eonpatapon yes that's another interesting case. Until task.find_files actually runs, there is only one task defined, namely the glob. When that completes, the comprehension reevaluates to further tasks (the original evaluation being to no further tasks, because find_files.files is empty list as you point out).

So whilst it is true to say that the workflow package does not "see" the dependency from the print tasks to the glob (highlighted by the error you've highlighted), the example minus your change "works" because the former only exists as a result of the latter having run already. But it's clearly a fragile situation and one that we should fix up as part of the other issues described here.

mpvl commented 2 years ago

@verdverm

@mpvl should the tooling layer be able to infer potential dependencies across guards?

yes

mpvl commented 2 years ago

The problem is actually quite subtle. It is indeed related to the if, but not how people expect, probably.

The way dependency analysis works is that a dependency is considered "blocking" if the value it points to is incomplete. The flow package uses "dynamic dependencies", meaning it tracks values dependency after repeated evaluation. This allows it to detect dynamically generated tasks.

However, because of this strategy it detects the dependency on _files[0] as non-blocking. After all, during dynamic evaluation the first value is present (namely the second syntactic entry), so the dependency can be satisfied.

So this is a case of working as intended, but not as desired.

I verified that, indeed, dependencies are tracked across ifs otherwise.

So for now this will require an explicit $after.

I'm very happy to hear about (well-worked-out) proposals on how dependency resolution could be improved. For instance, I could imagine using a conjunction of both dynamic and static dependencies. Note that the internal dep package implements three different dependency algorithms. The flow package only uses one of those.

verdverm commented 2 years ago

There are some other use cases that I'm hearing which overlap with dependency resolution

nested tasks:

print: & cli.Print {
     text: (file.Read & { filename: "foo.txt" }).contents
}

(2) fields generated in code Go or CUE(?), so referring to an unknown field.

I believe there was another issue which spoke about adding an Out field in their custom tasks.

Separately, there would be interest in exposing the internal dep package in the Go API. Thoughts on this @mpvl ?

mpvl commented 2 years ago

@verdverm: number 1 is interesting. Currently flow does not support tasks nested in other tasks. This was deliberate as I wasn't yet fully clear on what the semantics of that would be. I probably would not have considered to cover this case, for instance. But that is something that could definitely be supported.

Regarding 2, can you give an example where that doesn't work? Not sure how flow handles incomplete errors, but I guess that could be supported. Seems a bit brittle, though.

mpvl commented 2 years ago

@verdverm regarding making dep public. That is ultimately the plan. I'm mostly still figuring out what is a good API, though. For instance, there are currently three different ways of computing dependencies. Making an API public makes it hard to move and make changes and it currently isn't ready.

verdverm commented 2 years ago

Opened #1535 as a feature request for another dimension where task dependencies can exist through references and across package boundaries.

cue-lang / cue