Closed ijc closed 5 years ago
Please sign your commits following these rules: https://github.com/moby/moby/blob/master/CONTRIBUTING.md#sign-your-work The easiest way to do this is to amend the last commit:
$ git clone -b "debug-ci-intermittent-fail" git@github.com:ijc/docker-app.git somewhere
$ cd somewhere
$ git rebase -i HEAD~842354610128
editor opens
change each 'pick' to 'edit'
save the file and quit
$ git commit --amend -s --no-edit
$ git rebase --continue # and repeat the amend for each commit
$ git push -f
Amending updates the existing PR. You DO NOT need to open a new one.
Merging #534 into master will increase coverage by
0.09%
. The diff coverage isn/a
.
@@ Coverage Diff @@
## master #534 +/- ##
==========================================
+ Coverage 69.69% 69.79% +0.09%
==========================================
Files 55 55
Lines 3013 3013
==========================================
+ Hits 2100 2103 +3
+ Misses 633 631 -2
+ Partials 280 279 -1
Impacted Files | Coverage Δ | |
---|---|---|
types/parameters/parameters.go | 96.82% <0%> (+4.76%) |
:arrow_up: |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact)
,ø = not affected
,? = missing data
Powered by Codecov. Last update 137c8cd...32bc9cb. Read the comment docs.
Thanks @chris-crone but I intended this one just for debugging not merging, I've tagged the title now.
ci.docker.com seems to not be exhibiting this issue any more.
But it looks like https://jenkins.dockerproject.org/job/docker/job/app/job/PR-534/8/consoleFull caught it in the act with a reasonably full set of debug:
--- FAIL: TestRender (2.43s)
--- FAIL: TestRender/envvariables (2.43s)
--- FAIL: TestRender/envvariables/stdout (1.27s)
commands_test.go:71: Output of [/home/ubuntu/workspace/docker_app_PR-534-IJC46AYATBYVRHSO4SXPX7AXYKSMGRTSXPB7VYAXB7INRILONG6A/src/github.com/docker/app/docker-linux app render testdata/render/envvariables/my.dockerapp --parameters-file testdata/render/envvariables/parameters-0.yml --set myapp.command3=bam]:
STDOUT:
version: "3.4"
services:
test:
command:
- bash
- -c
- cat bar bam
image: alpine:latest
version: "3.4"
services:
test:
command:
- bash
- -c
- cat bar bam
image: alpine:latest
STDERR:
Rendering "testdata/render/envvariables/my.dockerapp" using format "yaml"
Action: &{Driver:0xc0003055c0 Action:com.docker.app.render}
Installation: &{Claim:{Name:custom-action Revision:01DA9AQHCNAHZ31PN9RFWAVSR0 Created:2019-05-07 14:21:31.925651485 +0000 UTC m=+0.276332285 Modified:2019-05-07 14:21:31.925651485 +0000 UTC m=+0.276332285 Bundle:0xc0000af360 Result:{Message: Action:unknown Status:unknown} Parameters:map[com.docker.app.kubernetes-namespace: com.docker.app.orchestrator: com.docker.app.render-format:yaml com.docker.app.share-registry-creds:false myapp.command1:cat myapp.command2:bar myapp.command3:bam] Files:map[]} Reference:}
START RENDER STDERR:
cnab-run render on instance "custom-action" with formatter "yaml"
BEGIN FORMATTED OUTPUT
version: "3.4"
services:
test:
command:
- bash
- -c
- cat bar bam
image: alpine:latest
END FORMATTED OUTPUT
cnab-run render on instance "custom-action" with formatter "yaml"
BEGIN FORMATTED OUTPUT
version: "3.4"
services:
test:
command:
- bash
- -c
- cat bar bam
image: alpine:latest
END FORMATTED OUTPUT
END RENDER STDERR
END
Which shows that the stdout of docker app
had the content repeated and that the invocation image was either run twice or had its output duplicated, since:
cnab-run render on instance "custom-action" with formatter "yaml"
BEGIN FORMATTED OUTPUT
version: "3.4"
services:
test:
command:
- bash
- -c
- cat bar bam
image: alpine:latest
END FORMATTED OUTPUT
Is repeated twice within a single START RENDER STDERR:
/END RENDER STDERR
pair (so docker app
thinks it did not run it twice AFAICT)
:exclamation: No coverage uploaded for pull request base (
master@8aacca2
). Click here to learn what that means. The diff coverage isn/a
.
@@ Coverage Diff @@
## master #534 +/- ##
=========================================
Coverage ? 69.79%
=========================================
Files ? 55
Lines ? 3013
Branches ? 0
=========================================
Hits ? 2103
Misses ? 631
Partials ? 279
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact)
,ø = not affected
,? = missing data
Powered by Codecov. Last update 8aacca2...8ae76d8. Read the comment docs.
From the logging in https://ci.docker.com/teams-dsg/job/dsg/job/app/job/PR-534/85/consoleFull I am reasonably convinced that the invocation is not printing the rendered output twice, but rather it is being duplicated somewhere further up the stack.
This is because in the block:
2019-05-08 16:22:25.854527012 +0000 UTC m=+1.099201895: START RENDER STDERR:
...
2019-05-08 16:22:25.854573641 +0000 UTC m=+1.099248499: END RENDER STDERR
(where the ...
placeholder is for the output of the invocation images stderr as captured by docker app
) the content is repeated twice, but crucially the timestamp is the same on both copies. I believe that if we were somehow calling the invocation image twice, or it was somehow restarting etc that the timestamps would differ on the two copies.
The repeated output is:
2019-05-08 16:22:25.480445502 +0000 UTC m=+0.020324374: cnab-run render on instance "custom-action" with formatter "json"
2019-05-08 16:22:25.489309613 +0000 UTC m=+0.029188480: BEGIN FORMATTED OUTPUT
{
"networks": {
"back": {
"ipam": {},
"external": false
},
"front": {
"ipam": {},
"external": false
}
},
"services": {
"api": {
"build": {},
"credential_spec": {},
"deploy": {
"resources": {},
"placement": {}
},
"image": "python:3.6",
"networks": {
"back": null,
"front": {
"aliases": [
"api.example.com",
"example.com"
]
}
}
},
"db": {
"build": {},
"credential_spec": {},
"deploy": {
"resources": {},
"placement": {}
},
"image": "postgres:9.3",
"networks": {
"back": null
}
},
"web": {
"build": {},
"credential_spec": {},
"deploy": {
"resources": {},
"placement": {}
},
"image": "nginx:latest",
"networks": {
"front": null
},
"ports": [
{
"mode": "ingress",
"target": 80,
"published": 8082,
"protocol": "tcp"
}
],
"volumes": [
{
"type": "volume",
"source": "static",
"target": "/opt/data/static"
}
]
}
},
"version": "3.6",
"volumes": {
"static": {
"name": "corp/web-static-data",
"external": true
}
}
}
2019-05-08 16:22:25.489329497 +0000 UTC m=+0.029208366: END FORMATTED OUTPUT
Unfortunately while the tweaks I've made to make this happen more reliably in CI (but still not totally reliably) have helped there I still cannot reproduce this locally at all (otherwise strace
would be my next port of call, actually it would have been my port of call long before this...).
Some more results from https://ci.docker.com/teams-dsg/job/dsg/job/app/job/PR-534/91/ where the docker-app
(but not the invoc image) are run under strace. I there I see (reconstructing from the strace files manually following the goroutine around):
pid 10887:
write(5, "POST /v1.39/containers/218862fedaaf3eddc9b8a6897f002ef01cbc5b8c05ec3eab28de643c00c63fd9/attach?logs=1&stderr=1&stdout=1&stream=1 HTTP/1.1\r\nHost: \r\nUser-Agent: Docker-Client/unknown-version (linux)\r\nContent-Length: 0\r\nConnection: Upgrade\r\nContent-Type: text/plain\r\nUpgrade: tcp\r\n\r\n", 280) = 280
read(5, 0xc00020c000, 4096) = -1 EAGAIN (Resource temporarily unavailable)
pid 10882:
read(5, "HTTP/1.1 101 UPGRADED\r\nContent-Type: application/vnd.docker.raw-stream\r\nConnection: Upgrade\r\nUpgrade: tcp\r\n\r\n", 4096) = 109
pid 10887:
read(5, "\2\0\0\0\0\0\0z2019-05-09 12:51:35.241236856 +0000 UTC m=+0.021396529: cnab-run render on instance \"custom-action\" with formatter \"yaml\"\n\1\0\0\0\0\0\0\17version: \"3.4\"\n\1\0\0\0\0\0\0\nservices:\n\1\0\0\0\0\0\0\10 test:\n\1\0\0\0\0\0\0\r command:\n\1\0\0\0\0\0\0\v - bash\n\1\0\0\0\0\0\0\t - -c\n\1\0\0\0\0\0\0\22 - cat bar bam\n\1\0\0\0\0\0\0\31 image: alpine:latest\n\2\0\0\0\0\0\0O2019-05-09 12:51:35.244718411 +0000 UTC m=+0.024878076: BEGIN FORMATTED OUTPUT\n\2\0\0\0\0\0\0\17version: \"3.4\"\n\2\0\0\0\0\0\0\nservices:\n\2\0\0\0\0\0\0\10 test:\n\2\0\0\0\0\0\0\r command:\n\2\0\0\0\0\0\0\v - bash\n\2\0\0\0\0\0\0\t - -c\n\2\0\0\0\0\0\0\22 - cat bar bam\n", 32777) = 530
write(1, "version: \"3.4\"\n", 15) = 15
write(1, "services:\n", 10) = 10
write(1, " test:\n", 8) = 8
write(1, " command:\n", 13) = 13
write(1, " - bash\n", 11) = 11
write(1, " - -c\n", 9) = 9
write(1, " - cat bar bam\n", 18) = 18
write(1, " image: alpine:latest\n", 25) = 25
read(5, "\2\0\0\0\0\0\0\31 image: alpine:latest\n\2\0\0\0\0\0\0\1\n\2\0\0\0\0\0\0M2019-05-09 12:51:35.244738373 +0000 UTC m=+0.024898017: END FORMATTED OUTPUT\n", 32777) = 127
read(5, 0xc000210000, 32777) = -1 EAGAIN (Resource temporarily unavailable)
...
read(5, "\2\0\0\0\0\0\1\2042019-05-09 12:51:35.241236856 +0000 UTC m=+0.021396529: cnab-run render on instance \"custom-action\" with formatter \"yaml\"\n2019-05-09 12:51:35.244718411 +0000 UTC m=+0.024878076: BEGIN FORMATTED OUTPUT\nversion: \"3.4\"\nservices:\n test:\n command:\n - bash\n - -c\n - cat bar bam\n image: alpine:latest\n\n2019-05-09 12:51:35.244738373 +0000 UTC m=+0.024898017: END FORMATTED OUTPUT\n", 32777) = 396
read(5, "\1\0\0\0\0\0\0mversion: \"3.4\"\nservices:\n test:\n command:\n - bash\n - -c\n - cat bar bam\n image: alpine:latest\n", 32777) = 117
write(1, "version: \"3.4\"\nservices:\n test:\n command:\n - bash\n - -c\n - cat bar bam\n image: alpine:latest\n", 109) = 109
read(5, 0xc000210000, 32777) = -1 EAGAIN (Resource temporarily unavailable)
fd 5 is a socket to the daemon and the attach protocol is described at https://github.com/moby/moby/blob/master/client/container_attach.go#L10...L33, essentially \1
is stdout
and \2
is stderr.
So we can see that docker-app
is being given two copies of the output with \1
, first it comes line by line (and is repeated to stdout==fd1 that way) and then it comes in one block (and it is repeated that way).
For comparison a successful run (done locally) looks like:
pid 23173:
write(5, "POST /v1.39/containers/bb60c01f4cd6a9597310103caed18ea5830da2c8326728db8bdf9735178c4e53/attach?logs=1&stderr=1&stdout=1&stream=1 HTTP/1.1\r\nHost: \r\nUser-Agent: Docker-Client/unknown-version (linux)\r\nContent-Len
gth: 0\r\nConnection: Upgrade\r\nContent-Type: text/plain\r\nUpgrade: tcp\r\n\r\n", 280) = 280
read(5, 0xc0004d0000, 4096) = -1 EAGAIN (Resource temporarily unavailable)
pid 23170:
read(5, "HTTP/1.1 101 UPGRADED\r\nContent-Type: application/vnd.docker.raw-stream\r\nConnection: Upgrade\r\nUpgrade: tcp\r\n\r\n", 4096) = 109
pid 23175:
read(5, "\2\0\0\0\0\0\0z2019-05-09 13:20:48.071237516 +0000 UTC m=+0.024453547: cnab-run render on instance \"custom-action\" with formatter \"yaml\"\n", 32777) = 130
read(5, 0xc00054a000, 32777) = -1 EAGAIN (Resource temporarily unavailable)
pid 23173:
read(5, "\2\0\0\0\0\0\1\t2019-05-09 13:20:48.076275429 +0000 UTC m=+0.029491459: BEGIN FORMATTED OUTPUT\nversion: \"3.4\"\nservices:\n test:\n command:\n - bash\n - -c\n - cat bar bam\n image: alpine:latest\n\n2019-05-09 13:20:48.07629754 +0000 UTC m=+0.029513546: END FORMATTED OUTPUT\n\1\0\0\0\0\0\0mversion: \"3.4\"\nservices:\n test:\n command:\n - bash\n - -c\n - cat bar bam\n image: alpine:latest\n", 32777) = 390
write(1, "version: \"3.4\"\nservices:\n test:\n command:\n - bash\n - -c\n - cat bar bam\n image: alpine:latest\n", 109) = 109
read(5, 0xc00054a000, 32777) = -1 EAGAIN (Resource temporarily unavailable)
So it seems to get the "all one line" variant which the bad case saw second.
So the info is already duplicated before docker app
sees it from the daemon.
Managed to capture strace
of both docker-app
and the cnab-run
within the invocation image in https://jenkins.dockerproject.org/job/docker/job/app/job/PR-534/72/.
From the invocation image traces:
cnab-run.render.7:write(2, "2019-05-10 10:15:31.968134272 +0000 UTC m=+0.045578545: cnab-run render on instance \"custom-action\" with formatter \"yaml\"\n", 122) = 122
cnab-run.render.7:write(2, "2019-05-10 10:15:31.974418925 +0000 UTC m=+0.051863204: BEGIN FORMATTED OUTPUT\n", 79) = 79
cnab-run.render.13:write(2, "version: \"3.4\"\nservices:\n test:\n command:\n - bash\n - -c\n - cat bar bam\n image: alpine:latest\n\n", 110) = 110
cnab-run.render.13:write(2, "2019-05-10 10:15:31.975416038 +0000 UTC m=+0.052860297: END FORMATTED OUTPUT\n", 77) = 77
cnab-run.render.13:write(1, "version: \"3.4\"\nservices:\n test:\n command:\n - bash\n - -c\n - cat bar bam\n image: alpine:latest\n", 109) = 109
and:
$ grep write\( cnab-run.render.*
cnab-run.render.13:write(2, "version: \"3.4\"\nservices:\n test:\n command:\n - bash\n - -c\n - cat bar bam\n image: alpine:latest\n\n", 110) = 110
cnab-run.render.13:write(2, "2019-05-10 10:15:31.975416038 +0000 UTC m=+0.052860297: END FORMATTED OUTPUT\n", 77) = 77
cnab-run.render.13:write(1, "version: \"3.4\"\nservices:\n test:\n command:\n - bash\n - -c\n - cat bar bam\n image: alpine:latest\n", 109) = 109
cnab-run.render.7:write(2, "2019-05-10 10:15:31.968134272 +0000 UTC m=+0.045578545: cnab-run render on instance \"custom-action\" with formatter \"yaml\"\n", 122) = 122
cnab-run.render.7:write(2, "2019-05-10 10:15:31.974418925 +0000 UTC m=+0.051863204: BEGIN FORMATTED OUTPUT\n", 79) = 79
$ grep cat.bar.bam cnab-run.render.*
cnab-run.render.13:write(2, "version: \"3.4\"\nservices:\n test:\n command:\n - bash\n - -c\n - cat bar bam\n image: alpine:latest\n\n", 110) = 110
cnab-run.render.13:write(1, "version: \"3.4\"\nservices:\n test:\n command:\n - bash\n - -c\n - cat bar bam\n image: alpine:latest\n", 109) = 109
Which validates that it is only printing its output to stdout a single time (and that the stderr output is only produced once too).
However reconstructing the trace from docker-app
shows:
file.12568:write(6, "POST /v1.39/containers/0a705722ef45400bd50bc219b33e0e7a321959717a4e3fb11fdb380148f08be4/attach?logs=1&stderr=1&stdout=1&stream=1 HTTP/1.1\r\nHost: \r\nUser-Agent: Docker-Client/unknown-version (linux)\r\nContent-Length: 0\r\nConnection: Upgrade\r\nContent-Type: text/plain\r\nUpgrade: tcp\r\n\r\n", 280) = 280
file.12568:read(6, "HTTP/1.1 101 UPGRADED\r\nContent-Type: application/vnd.docker.raw-stream\r\nConnection: Upgrade\r\nUpgrade: tcp\r\n\r\n", 4096) = 109
file.12566:read(6, "\2\0\0\0\0\0\0z2019-05-10 10:15:31.968134272 +0000 UTC m=+0.045578545: cnab-run render on instance \"custom-action\" with formatter \"yaml\"\n\2\0\0\0\0\0\0O2019-05-10 10:15:31.974418925 +0000 UTC m=+0.051863204: BEGIN FORMATTED OUTPUT\n\2\0\0\0\0\0\0\17version: \"3.4\"\n\2\0\0\0\0\0\0\nservices:\n\2\0\0\0\0\0\0\10 test:\n\2\0\0\0\0\0\0\r command:\n\2\0\0\0\0\0\0\v - bash\n\2\0\0\0\0\0\0\t - -c\n\2\0\0\0\0\0\0\22 - cat bar bam\n\2\0\0\0\0\0\0\31 image: alpine:latest\n\2\0\0\0\0\0\0\1\n\2\0\0\0\0\0\0M2019-05-10 10:15:31.975416038 +0000 UTC m=+0.052860297: END FORMATTED OUTPUT\n\1\0\0\0\0\0\0\17version: \"3.4\"\n\1\0\0\0\0\0\0\nservices:\n\1\0\0\0\0\0\0\10 test:\n\1\0\0\0\0\0\0\r command:\n\1\0\0\0\0\0\0\v - bash\n\1\0\0\0\0\0\0\t - -c\n\1\0\0\0\0\0\0\22 - cat bar bam\n\1\0\0\0\0\0\0\31 image: alpine:latest\n", 32777) = 657
file.12566:write(5, "version: \"3.4\"\n", 15) = 15
file.12566:write(5, "services:\n", 10) = 10
file.12566:write(5, " test:\n", 8) = 8
file.12566:write(5, " command:\n", 13) = 13
file.12566:write(5, " - bash\n", 11) = 11
file.12566:write(5, " - -c\n", 9) = 9
file.12566:write(5, " - cat bar bam\n", 18) = 18
file.12566:write(5, " image: alpine:latest\n", 25) = 25
file.12566:read(6, "\1\0\0\0\0\0\0mversion: \"3.4\"\nservices:\n test:\n command:\n - bash\n - -c\n - cat bar bam\n image: alpine:latest\n\2\0\0\0\0\0\1\2042019-05-10 10:15:31.968134272 +0000 UTC m=+0.045578545: cnab-run render on instance \"custom-action\" with formatter \"yaml\"\n2019-05-10 10:15:31.974418925 +0000 UTC m=+0.051863204: BEGIN FORMATTED OUTPUT\nversion: \"3.4\"\nservices:\n test:\n command:\n - bash\n - -c\n - cat bar bam\n image: alpine:latest\n\n2019-05-10 10:15:31.975416038 +0000 UTC m=+0.052860297: END FORMATTED OUTPUT\n", 32777) = 513
file.12566:write(5, "version: \"3.4\"\nservices:\n test:\n command:\n - bash\n - -c\n - cat bar bam\n image: alpine:latest\n", 109) = 109
Which is clearly reading the stdout content from the connection to the daemon (which is on fd 6) twice (first time it is split on \n
then the second time it is all in one line, which is what the invocation image actually wrote).
BTW, fd 5 is the output file:
file.12566:openat(AT_FDCWD, "/tmp/-693957471/actual.yaml", O_RDWR|O_CREAT|O_TRUNC|O_CLOEXEC, 0666) = 5
So it seems that something between the container's stdout/stderr and the daemon rest endpoint has not only duplicated both of them but has also apparently for the first one split it on \n
somewhere along the way.
From earlier tracing of the happy case with no duplication we know that the expected one is the unsplit one, so it is the first of the two copies which is the "extra".
The actual test output for this case:
--- FAIL: TestRender/envvariables/file (3.54s)
commands_test.go:98: Output of [strace -s 4096 -fff -o /home/ubuntu/workspace/docker_app_PR-534-IJC46AYATBYVRHSO4SXPX7AXYKSMGRTSXPB7VYAXB7INRILONG6A/src/github.com/docker/app/_build/strace078002921/file /home/ubuntu/workspace/docker_app_PR-534-IJC46AYATBYVRHSO4SXPX7AXYKSMGRTSXPB7VYAXB7INRILONG6A/src/github.com/docker/app/docker-linux app render testdata/render/envvariables/my.dockerapp --parameters-file testdata/render/envvariables/parameters-0.yml --set myapp.command3=bam --output=/tmp/-693957471/actual.yaml]:
STDOUT:
STDERR:
2019-05-10 10:15:31.132875926 +0000 UTC m=+0.093232854
Rendering "testdata/render/envvariables/my.dockerapp" using format "yaml"
Action: &{Driver:0xc0000876e0 Action:com.docker.app.render}
Installation: &{Claim:{Name:custom-action Revision:01DAGKV7HW3MWF6H6CPM0BG7CM Created:2019-05-10 10:15:31.132543203 +0000 UTC m=+0.092900145 Modified:2019-05-10 10:15:31.132543203 +0000 UTC m=+0.092900145 Bundle:0xc0000cb7c0 Result:{Message: Action:unknown Status:unknown} Parameters:map[com.docker.app.kubernetes-namespace: com.docker.app.orchestrator: com.docker.app.render-format:yaml com.docker.app.share-registry-creds:false myapp.command1:cat myapp.command2:bar myapp.command3:bam] Files:map[]} Reference:}
Stracing invoc image to "/home/ubuntu/workspace/docker_app_PR-534-IJC46AYATBYVRHSO4SXPX7AXYKSMGRTSXPB7VYAXB7INRILONG6A/src/github.com/docker/app/_build/strace078002921"
2019-05-10 10:15:34.505983334 +0000 UTC m=+3.466340242: START RENDER STDERR:
2019-05-10 10:15:31.968134272 +0000 UTC m=+0.045578545: cnab-run render on instance "custom-action" with formatter "yaml"
2019-05-10 10:15:31.974418925 +0000 UTC m=+0.051863204: BEGIN FORMATTED OUTPUT
version: "3.4"
services:
test:
command:
- bash
- -c
- cat bar bam
image: alpine:latest
2019-05-10 10:15:31.975416038 +0000 UTC m=+0.052860297: END FORMATTED OUTPUT
2019-05-10 10:15:31.968134272 +0000 UTC m=+0.045578545: cnab-run render on instance "custom-action" with formatter "yaml"
2019-05-10 10:15:31.974418925 +0000 UTC m=+0.051863204: BEGIN FORMATTED OUTPUT
version: "3.4"
services:
test:
command:
- bash
- -c
- cat bar bam
image: alpine:latest
2019-05-10 10:15:31.975416038 +0000 UTC m=+0.052860297: END FORMATTED OUTPUT
2019-05-10 10:15:34.50629019 +0000 UTC m=+3.466647135: END RENDER STDERR
END
commands_test.go:100: Content of /tmp/-693957471/actual.yaml:
version: "3.4"
services:
test:
command:
- bash
- -c
- cat bar bam
image: alpine:latest
version: "3.4"
services:
test:
command:
- bash
- -c
- cat bar bam
image: alpine:latest
END
commands_test.go:101: failed to parse source file: /go/src/github.com/docker/app/e2e/commands_test.go: open /go/src/github.com/docker/app/e2e/commands_test.go: no such file or directory
commands_test.go:101: assertion failed:
--- ←
+++ →
@@ -1,2 +1,10 @@
+version: "3.4"
+services:
+ test:
+ command:
+ - bash
+ - -c
+ - cat bar bam
+ image: alpine:latest
version: "3.4"
services:
: rendering mismatch
The issue seems to be fixed on the CI. We don't have flaky builds anymore.
This is investigating the intermittent CI failure originally spotted in #529.