Open cueckoo opened 3 years ago
Original reply by @myitcv in https://github.com/cuelang/cue/issues/803#issuecomment-790322233
Thanks for the repro, @verdverm.
Unfortunately like #808 the reproducer is incomplete (indeed the txtar archive is invalid). If I could gently reiterate the suggestions in that issue to test out the repros as you're creating the issue - it means we can jump straight to looking at the problem, and don't have to second guess what was intended.
(In this case it's probably not worth causing the program to fail in the error case - the fact that it takes a noticeable length of time to complete is sufficient).
Here is my version of what I believe your repro was trying to show, which I think demonstrates the slowdown you're experiencing:
exec go mod tidy
exec go run main.go
-- go.mod --
module blah.com
go 1.16
require cuelang.org/go v0.3.0-beta.5
-- main.go --
package main
import (
"fmt"
"cuelang.org/go/cue"
"cuelang.org/go/cue/load"
)
func main() {
// We need a Cue.Runtime, the zero value is ready to use
var RT cue.Runtime
// The entrypoints are the same as the files you'd specify at the command line
entrypoints := []string{"repro.cue"}
// Load Cue files into Cue build.Instances slice
// the second arg is a configuration object, we'll see this later
bis := load.Instances(entrypoints, nil)
var (
I *cue.Instance
V cue.Value
err error
)
// Loop over the instances, checking for errors and printing
for _, bi := range bis {
// check for errors on the instance
// these are typically parsing errors
if bi.Err != nil {
fmt.Println("Error during load:", bi.Err)
continue
}
// Use cue.Runtime to build.Instance to cue.INstance
I, err = RT.Build(bi)
if err != nil {
fmt.Println("Error during build:", bi.Err)
continue
}
// get the root value and print it
V = I.Value()
// fmt.Println("root value:", V)
// Validate the value
err = V.Validate()
if err != nil {
fmt.Println("Error during validate:", err)
continue
}
}
empty, err := RT.Compile("", "")
if err != nil {
panic(err)
}
// fmt.Println("empty:", empty.Value())
base := I.Lookup("base")
// fmt.Println("base:", base)
input := I.Lookup("input")
// fmt.Println("input:", input)
output := I.Lookup("output")
// fmt.Println("output:", output)
// fmt.Println("===============")
merged := empty
fmt.Println("merge.base")
merged, err = merged.Fill(base)
if err != nil {
panic(err)
}
fmt.Println("merge.input")
merged, err = merged.Fill(input)
if err != nil {
panic(err)
}
fmt.Println("merge.output")
merged, err = merged.Fill(output)
if err != nil {
panic(err)
}
fmt.Println("merge.value")
final := merged.Value()
fmt.Println("merge.final")
fmt.Println(final)
}
-- repro.cue --
package repro
base: {
repository: #Dir
build: #Build & {
source: repository
packages: "./cmd"
output: "/usr/local/bin/cmd"
}
help: {
steps: [#Load & {
from: build
}, #Exec & {
args: ["cmd", "-h"]
}]
}
}
input: {
repository: {
steps: [{
// do: "local"
dir: "."
include: []
}]
}
}
output: {
help: {
steps: [#Load & {
from: build
}, #Exec & {
args: ["cmd", "-h"]
}]
}
build: {
version: *"1.16" | string
source: {
steps: [{
// do: "local"
dir: "."
include: []
}]
}
// Packages to build
packages: "./cmd"
// Specify the targeted binary name
output: "/usr/local/bin/cmd"
env: [string]: string
steps: [#Copy & {
from: #Go & {
version: version
"source": source
"env": env
}
src: output
dest: output
}]
}
repository: {
steps: [{
// do: "local"
dir: "."
include: []
}]
}
}
#Dir: steps: [...#Op]
// One operation in a script
#Op: #Fetch | #Exec | #Local | #Copy | #Load
#Local: {
// do: "local"
dir: string
include: [...string] | *[]
}
#Load: {
// do: "load"
from: _
}
#Exec: {
// do: "exec"
args: [...string]
env?: [string]: string
always?: true | *false
dir: string | *"/"
mount: [string]: "tmp" | "cache" | {from: _, path: string | *"/"}
}
#Fetch: {
// do: "fetch-container"
ref: string
}
#Copy: {
// do: "copy"
from: _
src: string | *"/"
dest: string | *"/"
}
#Go: {
// Go version to use
version: *"1.16" | string
// Arguments to the Go binary
args: [...string]
// Source Directory to build
source: #Dir
// Environment variables
env: [string]: string
steps: [
#Fetch & {
ref: "docker.io/golang:\(version)-alpine"
},
#Exec & {
"args": ["go"] + args
"env": env
env: CGO_ENABLED: "0"
dir: "/src"
mount: "/src": from: source
mount: "/root/.cache": "cache"
},
]
}
#Build: {
// Go version to use
version: *#Go.version | string
// Source Directory to build
source: #Dir
// Packages to build
packages: *"." | string
// Specify the targeted binary name
output: string
env: [string]: string
steps: [
#Copy & {
from: #Go & {
"version": version
"source": source
"env": env
args: ["build"]
}
src: output
dest: output
},
]
}
#Test: {
// Go version to use
version: *#Go.version | string
// Source Directory to build
source: #Dir
// Packages to test
packages: *"." | string
#Go & {
"version": version
"source": source
args: ["test", "-v", packages]
}
}
Will work with @mpvl on investigating/reducing this today.
Original reply by @myitcv in https://github.com/cuelang/cue/issues/803#issuecomment-790453494
Analysis: this doesn't appear to be specific to the cue
API.
cue eval repro.cue has no issue finishing in 0.1s
With repro.cue
file stands you're not comparing apples with apples. In the Go code you're effectively unifying {} & base & input & output
. If you create a top-level field with that value then the issue can be reproduced with cue eval
:
exec cue eval
-- repro.cue --
package repro
x: {} & base & input & output
base: {
repository: #Dir
build: #Build & {
source: repository
packages: "./cmd"
output: "/usr/local/bin/cmd"
}
help: {
steps: [#Load & {
from: build
}, #Exec & {
args: ["cmd", "-h"]
}]
}
}
input: {
repository: {
steps: [{
// do: "local"
dir: "."
include: []
}]
}
}
output: {
help: {
steps: [#Load & {
from: build
}, #Exec & {
args: ["cmd", "-h"]
}]
}
build: {
version: *"1.16" | string
source: {
steps: [{
// do: "local"
dir: "."
include: []
}]
}
// Packages to build
packages: "./cmd"
// Specify the targeted binary name
output: "/usr/local/bin/cmd"
env: [string]: string
steps: [#Copy & {
from: #Go & {
version: version
"source": source
"env": env
}
src: output
dest: output
}]
}
repository: {
steps: [{
// do: "local"
dir: "."
include: []
}]
}
}
#Dir: steps: [...#Op]
// One operation in a script
#Op: #Fetch | #Exec | #Local | #Copy | #Load
#Local: {
do: "local"
dir: string
include: [...string] | *[]
}
#Load: {
do: "load"
from: _
}
#Exec: {
do: "exec"
args: [...string]
env?: [string]: string
always?: true | *false
dir: string | *"/"
mount: [string]: "tmp" | "cache" | {from: _, path: string | *"/"}
}
#Fetch: {
do: "fetch-container"
ref: string
}
#Copy: {
do: "copy"
from: _
src: string | *"/"
dest: string | *"/"
}
#Go: {
// Go version to use
version: *"1.16" | string
// Arguments to the Go binary
args: [...string]
// Source Directory to build
source: #Dir
// Environment variables
env: [string]: string
steps: [
#Fetch & {
ref: "docker.io/golang:\(version)-alpine"
},
#Exec & {
"args": ["go"] + args
"env": env
env: CGO_ENABLED: "0"
dir: "/src"
mount: "/src": from: source
mount: "/root/.cache": "cache"
},
]
}
#Build: {
// Go version to use
version: *#Go.version | string
// Source Directory to build
source: #Dir
// Packages to build
packages: *"." | string
// Specify the targeted binary name
output: string
env: [string]: string
steps: [
#Copy & {
from: #Go & {
"version": version
"source": source
"env": env
args: ["build"]
}
src: output
dest: output
},
]
}
#Test: {
// Go version to use
version: *#Go.version | string
// Source Directory to build
source: #Dir
// Packages to test
packages: *"." | string
#Go & {
"version": version
"source": source
args: ["test", "-v", packages]
}
}
Now working to reproduce this example.
Original reply by @myitcv in https://github.com/cuelang/cue/issues/803#issuecomment-790464007
Further analysis.
In the following repro, I have created a top-level x
field, that unifies base
and output
:
# Base version
exec cue eval
-- repro.cue --
package repro
x: base & output
base: {
repository: #Dir
build: #Build & {
source: repository
packages: "./cmd"
output: "/usr/local/bin/cmd"
}
help: {
steps: [#Load & {
from: build
}]
}
}
output: {
help: {
steps: [#Load & {
from: build
}]
}
build: {
version: *"1.16" | string
source: {
steps: [{
do: "local"
dir: "."
include: []
}]
}
// Packages to build
packages: "./cmd"
// Specify the targeted binary name
output: "/usr/local/bin/cmd"
env: [string]: string
steps: [#Copy & {
from: #Go & {
version: version
"source": source
"env": env
}
src: output
dest: output
}]
}
repository: {
steps: [{
do: "local"
dir: "."
include: []
}]
}
}
#Dir: steps: [...#Op]
// One operation in a script
#Op: #Fetch | #Exec | #Local | #Copy | #Load
#Local: {
do: "local"
dir: string
include: [...string] | *[]
}
#Load: {
do: "load"
from: _
}
#Exec: {
do: "exec"
args: [...string]
env?: [string]: string
always?: true | *false
dir: string | *"/"
mount: [string]: "tmp" | "cache" | {from: _, path: string | *"/"}
}
#Fetch: {
do: "fetch-container"
ref: string
}
#Copy: {
do: "copy"
from: _
src: string | *"/"
dest: string | *"/"
}
#Go: {
// Go version to use
version: *"1.16" | string
// Arguments to the Go binary
args: [...string]
// Source Directory to build
source: #Dir
// Environment variables
env: [string]: string
steps: [
#Fetch & {
ref: "docker.io/golang:\(version)-alpine"
},
#Exec & {
"args": ["go"] + args
"env": env
env: CGO_ENABLED: "0"
dir: "/src"
mount: "/src": from: source
mount: "/root/.cache": "cache"
},
]
}
#Build: {
// Go version to use
version: *#Go.version | string
// Source Directory to build
source: #Dir
// Packages to build
packages: *"." | string
// Specify the targeted binary name
output: string
env: [string]: string
steps: [
#Copy & {
from: #Go & {
"version": version
"source": source
"env": env
args: ["build"]
}
src: output
dest: output
},
]
}
#Test: {
// Go version to use
version: *#Go.version | string
// Source Directory to build
source: #Dir
// Packages to test
packages: *"." | string
#Go & {
"version": version
"source": source
args: ["test", "-v", packages]
}
}
This is sufficient to reproduce the slow evaluation. What's interesting, is that moving all of the fields from output
into base
causes the problem to go away, despite the fact that logically the resulting x
will be the same:
package repro
x: base & output
base: {
repository: #Dir
build: #Build & {
source: repository
packages: "./cmd"
output: "/usr/local/bin/cmd"
}
help: {
steps: [#Load & {
from: build
}]
}
help: {
steps: [#Load & {
from: build
}]
}
build: {
version: *"1.16" | string
source: {
steps: [{
do: "local"
dir: "."
include: []
}]
}
// Packages to build
packages: "./cmd"
// Specify the targeted binary name
output: "/usr/local/bin/cmd"
env: [string]: string
steps: [#Copy & {
from: #Go & {
version: version
"source": source
"env": env
}
src: output
dest: output
}]
}
repository: {
steps: [{
do: "local"
dir: "."
include: []
}]
}
}
output: {
}
#Dir: steps: [...#Op]
// One operation in a script
#Op: #Fetch | #Exec | #Local | #Copy | #Load
#Local: {
do: "local"
dir: string
include: [...string] | *[]
}
#Load: {
do: "load"
from: _
}
#Exec: {
do: "exec"
args: [...string]
env?: [string]: string
always?: true | *false
dir: string | *"/"
mount: [string]: "tmp" | "cache" | {from: _, path: string | *"/"}
}
#Fetch: {
do: "fetch-container"
ref: string
}
#Copy: {
do: "copy"
from: _
src: string | *"/"
dest: string | *"/"
}
#Go: {
// Go version to use
version: *"1.16" | string
// Arguments to the Go binary
args: [...string]
// Source Directory to build
source: #Dir
// Environment variables
env: [string]: string
steps: [
#Fetch & {
ref: "docker.io/golang:\(version)-alpine"
},
#Exec & {
"args": ["go"] + args
"env": env
env: CGO_ENABLED: "0"
dir: "/src"
mount: "/src": from: source
mount: "/root/.cache": "cache"
},
]
}
#Build: {
// Go version to use
version: *#Go.version | string
// Source Directory to build
source: #Dir
// Packages to build
packages: *"." | string
// Specify the targeted binary name
output: string
env: [string]: string
steps: [
#Copy & {
from: #Go & {
"version": version
"source": source
"env": env
args: ["build"]
}
src: output
dest: output
},
]
}
#Test: {
// Go version to use
version: *#Go.version | string
// Source Directory to build
source: #Dir
// Packages to test
packages: *"." | string
#Go & {
"version": version
"source": source
args: ["test", "-v", packages]
}
}
@mpvl perhaps this gives a sufficient steer as to where the problem might lie?
Original reply by @myitcv in https://github.com/cuelang/cue/issues/803#issuecomment-791884270
Further analysis.
TL;DR - there is a CUE problem here; we have a couple of options for fixing it.
main
branch. The following bullet points discuss those various experiments, with evaluation times, diffs of the x
field with respect to the reference value, and links to the relevant diff/code. do
. I don't think this is sensible comparison because nobody is advocating we go down this path; if anything CUE is going to be encouraging people more strongly to add discriminator fields to help with performance. Note the diff in this case is the "wrong way around" - the change to remove the do
fields is on the LHS, as are the timings and any diffs with respect to the reference. Evaluation time: 1m52secs. Link x
from being a single struct value to the unification of two other struct values, base
and output
, with base
containing all the fields/constraints that were previously in x
, and output
empty, the evaluation time jumps ~40%! (MISSING)This should not be happening, because logically the value of x
remains unchanged in this scenario. Evaluation time: 0.111s. Linkhelp
field within output
, then the evaluation time leaps to ~20s. This appears to be the main trigger of the performance blow up. Evaluation time: 20s. Linkhelp
declaration within the same struct then the performance is far better. Evaluation time: 0.1s. Linkrepository
where it is referenced by build
the performance problem also largely disappears. Evaluation time: 0.25s. Linksteps
in build
then the problem entirely disappears, regardless of the split unification. Evaluation time: 0.06s. LinkThe observation in point 5 is, I think, the main trigger in this instance. Despite the fact that we are adding an identical declaration of help
to the output
struct, the evaluation time jumps. There is some intuition as to why this might be an issue: the context of help
and build
are different within the two conjuncts, base
and output
. Marcel has a hunch as to what might be going wrong here too (will let him comment separately).
What's interesting to observe is that specific conditions are required to trigger the problem, as demonstrated by points 6, 7 and 8. i.e. a slight tweak and the problem can "disappear".
Therefore, we've established case 5 as the current best repro of the problem, i.e. the error case (diff with respect to the base case).
For completeness I did a quick analysis over time to see whether there were any commits that stood out:
v0.3.0-alpha6
took 0.2s, but the changes to handling of disjunction and closedness since then have been significantv0.3.0-beta.2
) introduced a severe memory problem, a problem that was fixed in 0e401d66c8e5a7e64c667dfa64004cadca7cb871 (v0.3.0-beta.3
), so during this time we can't infer anythingMarcel will be able to say more precisely, but I'm not convinced there is much benefit to trying to find a single commit as the root cause here. We have some pretty good ideas on the root cause.
To summarise:
I also looked to see whether there is a more immediate term fix we employ here whilst the proper fix (in whatever form that takes) is being worked on. That workaround involved optimising for the fact that the "collected" value is ultimately, for now, only needed in concrete form. However, it looks like that workaround will not, unfortunately, work in this case. The reason being is that the zero value of a list type is the empty list. Hence a round trip to/from a concrete representation is lossy results in @verdverm's original repro failing because we end up trying to unify two fields values, one with a zero-length list (which should in fact be the type) with a non-zero-length list.
Original reply by @mpvl in https://github.com/cuelang/cue/issues/803#issuecomment-792230273
v0.3.0-alpha6 took 0.2s, but the changes to handling of disjunction and closedness since then have been significant Also note that the handling of disjunctions was wrong in alpha6. :) So it is not a fair comparison.
Either way, it it does give an indication of what is possible. It tells me that reintroducing some of the techniques of v0.2.2 should be able to get it does to this 0.1s range
Basically, the current approach filters disjunctions looking at struct properties. The theory there was that there is a limited number of syntactic element so that the number of duplicates converges to a small number quickly, stoping exponential growth, and that the check itself is quick.
This turned out to be a flawed assumption: conjuncts need to be uniqued not only by syntactical element, but also by evaluator, of which there is an unbounded number. So nice try, but it doesn't work.
Rather than refining the duplicate elimination model, the plan is to just brute force the comparison. This adds a constant overhead, but that beats the hell out of exponential slowdown. Also, some of the tricks now can be used to short-circuit the comparison when possible, gaining back some of the speed.
Brute forcing the comparison is a bit involved: v0.3 does not compute arcs for optional fields. This would then be necessary. This chance will also help with the required field and several other features, though.
Original reply by @mpvl in https://github.com/cuelang/cue/issues/803#issuecomment-792234532
Possible workaround:
Note that
include: [...string] | *[]
is equivalent to
include: [...string]
as the default value of [...string]
is []
If I make that substitution, it runs well below a second.
Of course there is no excuse for CUE to become so slow if the more verbose form is used, but hopefully this will help for now.
Original reply by @mpvl in https://github.com/cuelang/cue/issues/803#issuecomment-792239502
To confirm the above hypothesis, this is a small reproducer:
x: #A & #B & #C
#A: #C // same as #C, the references forces a new closedInfo and Environment
#B: #C // same here, repeat pattern to make disjunction in `x` grow exponentially
#C: [...string] | *[]
Because each conjunct of #C
represents a new copy (because the Environment
is different), and because the ...string
in list counts as an optional field, the Equality check is skipped.
Including optional fields in the equality check (as was done in v0.2, well that used subsumption, but same difference), would address this issue. Note that equality cannot be implemented accurately for bulk optional fields. But it can at least be implemented if the bulk optional field originated from the same struct. So by doing so, we achieve the dedup goal of the original v0.3 implementation, because it effectively takes the Environment factor out of the equation.
Note that in this case it would be sufficient: if we rewrite x
as x: #C & #C & #C
, performance is great. This is effectively the performance one would get with the proposed technique (minus the constant overhead).
A completely different approach would be to detect equivalence of Environments. This is complicated though, and is a less general mechanism.
@myitcv do you have a profiling tool to debug this ? Atm, I'm hacking my configs to temporarily fix the issue (as it often happens with values with 2+ possible types): reducing the amount of possible value reduces linearly the eval time. but it requires to manually comment/uncomment each value until I find the node impacting the graph.
I'm not aware of a tool or API functions.
There is however a stats process https://github.com/cue-lang/cue/blob/master/internal/core/adt/context.go#L156
Not sure if it is always active or needs to be turned on. Either way, it most likely requires compiling a modified cue to print out the data
$ grep -Rn stats internal/
internal/core/adt/context.go:156: stats Stats
internal/core/adt/disjunct.go:127: n.ctx.stats.DisjunctCount++
internal/core/adt/eval.go:65:var stats = template.Must(template.New("stats").Parse(`{{"" -}}
internal/core/adt/eval.go:78: _ = stats.Execute(buf, s)
internal/core/adt/eval.go:83: return &c.stats
internal/core/adt/eval.go:207: c.stats.UnifyCount++
internal/core/adt/eval.go:875: c.stats.Reused++
internal/core/adt/eval.go:901: c.stats.Allocs++
internal/core/adt/eval.go:937: n.ctx.stats.Retained++
internal/core/adt/eval.go:959: c.stats.Freed++
internal/core/adt/eval_test.go:69: stats := ctx.Stats()
internal/core/adt/eval_test.go:70: t.Log(stats)
internal/core/adt/eval_test.go:71: // if n := stats.Leaks(); n > 0 {
@grouville improving support for debugging CUE configurations that perform badly is definitely on our radar.
For now I do something along the following lines:
cue eval
(or whatever command is relevant) cycle.Very often a simple cue eval
(or equivalent) is enough for me to find the trigger, e.g. a "recursive" definition. At this point I have a working reproduction. Might not be entirely minimal, but it will be good enough to open a discussion/issue. Worst case, the example I have found is a duplicate of an existing issue; best case, we have a new bug/issue for the evaluation engine!
In some instances I like to see some actual numbers behind why something is performing badly. To do that you can use TestX
in internal/core/adt
. Fill out the template for the in
variable (txtar
format) and then run. e.g.
// TestX is for debugging. Do not delete.
func TestX(t *testing.T) {
in := `
-- cue.mod/module.cue --
module: "example.com"
-- in.cue --
x: 5
`
if strings.HasSuffix(strings.TrimSpace(in), ".cue --") {
t.Skip()
}
....
then run with:
$ go test -run TestX ./internal/core/adt
--- FAIL: TestX (0.00s)
eval_test.go:137: (struct){
x: (int){ 5 }
}
eval_test.go:139: Leaks: 0
Freed: 2
Reused: 0
Allocs: 2
Retain: 0
Unifications: 2
Disjuncts: 2
FAIL
FAIL cuelang.org/go/internal/core/adt 0.010s
FAIL
Sometimes this process can take some time. In those cases, I generally create an issue/discussion in any case, with whatever reproduction I have as a starting point. Follow these tips on how to share multi-file reproducers in a format that others can easily work with. I (or indeed others) can then come back to that at a later date.
We also plan to do some live streaming/screen casts debugging a real-life problem. In the long run such techniques should not be necessary, but for now it seems sensible to optimise for sharing best practice.
Marking as v0.7.0 but also flagging for the CUE team to verify the current status.
Just adding a comment that between v0.5.0 and v0.6.0, I saw the load time for my CUE config increase from ~200ms to ~1100ms. That according to running time
with the vet
command in antler, to load the schema and unify it with a small config.
Since the schema has a number of recursive definitions, I thought I'd mention this here in case it's related. :)
Originally opened by @verdverm in https://github.com/cuelang/cue/issues/803
What version of CUE are you using (
cue version
)?beta.5
What did you do?
Trying to work with Cue from the Go API. The following is extracted from a much more complex application, but captures the essence of what is happening. Some cue is loaded, some values are extracted and merged. There are notes below the
txtar
on how to change just a few lines to vary the performance and time to run.notes:
cue eval repro.cue
has no issue finishing in 0.1s// do: <...>
reduces runtime to a few seconds, but still much slowersteps: []
(an empty list) in#Build
removes the runtime issue all together.would love some help reducing this repro further
What did you expect to see?
consistent performance regardless of the line changes
What did you see instead?
CPU pinned to 100%!a(MISSING)nd growing memory (via htop), program "hangs" at the
merged.Value()
line.delve has show some very strange slice sizes for internal slices (in the original), slices with lengths that would make their mem footprint in the Petabytes! Lots of time is seemingly spent in Go's garbage collector.