concourse / concourse

Concourse is a container-based continuous thing-doer written in Go.
https://concourse-ci.org
Apache License 2.0

Random "unexpected end of JSON input" on build tasks #3791

Closed M0nsieurChat closed 4 years ago

M0nsieurChat commented 5 years ago

Hi, first of all, thank you for providing Concourse; it's a great tool!

Bug Report

Occasionally (roughly 2 out of every 30 builds), a task is interrupted with the following error message:

unexpected end of JSON input

[Screenshot from 2019-04-25 11-29-25]

I suspect a network problem between the workers and Concourse, but maybe the experts here can offer some insight.

Steps to Reproduce

It happens on multiple pipelines, so I doubt the bug is tied to a specific pipeline YAML definition.

Expected Results

The build should succeed, like the other builds of the same pipeline that ran without problems.

Actual Results

Rarely, the build fails with "unexpected end of JSON input"

Additional Context

We deployed Concourse on Kubernetes with the following chart: https://github.com/helm/charts/tree/master/stable/concourse

Here are the logs of the Concourse container, filtered for this particular pipeline:

$ kubectl logs -f -n concourse concourse-web-688b9cd69c-ljqcg  | grep unexpected | grep build-erl-module
{"timestamp":"2019-04-25T07:07:33.507408898Z","level":"info","source":"atc","message":"atc.pipelines.radar.scan-resource.interval-runner.tick.running-worker.garden-connection.retry-hijackable-client.retrying","data":{"error":"unexpected EOF","failed-attempts":1,"pipeline":"build-erl-modules","ran-for":"44.746689ms","resource":"xxx_web-git-develop","session":"17.13.26.1.632.4.1.2","team":"main"}}
{"timestamp":"2019-04-25T07:07:40.718165521Z","level":"error","source":"atc","message":"atc.pipelines.radar.scan-resource.interval-runner.tick.failed-to-check","data":{"error":"unexpected end of JSON input","pipeline":"build-erl-modules","resource":"xxx_util-git-develop","session":"17.13.23.1.637","team":"main"}}
{"timestamp":"2019-04-25T07:07:40.719556022Z","level":"error","source":"atc","message":"atc.pipelines.radar.failed-to-run-scan-resource","data":{"error":"unexpected end of JSON input","pipeline":"build-erl-modules","session":"17.13","team":"main"}}
{"timestamp":"2019-04-25T07:07:40.767516731Z","level":"error","source":"atc","message":"atc.pipelines.radar.scan-resource.interval-runner.tick.failed-to-check","data":{"error":"unexpected end of JSON input","pipeline":"build-erl-modules","resource":"xxx_web-git-develop","session":"17.13.26.1.632","team":"main"}}
{"timestamp":"2019-04-25T07:07:40.768400499Z","level":"error","source":"atc","message":"atc.pipelines.radar.failed-to-run-scan-resource","data":{"error":"unexpected end of JSON input","pipeline":"build-erl-modules","session":"17.13","team":"main"}}

Version Info

ddadlani commented 5 years ago

Hi @M0nsieurChat, this issue occurs if the ATC reattaches during a build. https://github.com/concourse/concourse/issues/534#issue-165446772

It should get fixed by https://github.com/concourse/rfcs/pull/1

M0nsieurChat commented 5 years ago

Hi, thanks for your quick response. I like it because I won't have to dive deep into the arcana of networking to solve this one, as I previously thought :)

Cheers,

M0nsieurChat commented 5 years ago

Hello @ddadlani, we've upgraded to 5.4.1 and still encounter this issue. Do you have any idea why, given that the v2 resources RFC has been merged?

Thanks a lot,

ddadlani commented 5 years ago

Hi @M0nsieurChat,

That RFC was only merged because another one was opened in its stead: https://github.com/vito/rfcs/blob/generalized-resources/024-generalized-resources/proposal.md

The RFCs are also only meant to outline proposed changes and allow community discussion on them. They are not tied to the code base. Once discussion has completed on this RFC, dev work will begin on resources v2. As such, v5.4.1 is not supposed to fix this issue. Hope this helps clear it up :)

stale[bot] commented 4 years ago

Beep boop! This issue has been idle for long enough that it's time to check in and see if it's still important.

If it is, what is blocking it? Would anyone be interested in submitting a PR or continuing the discussion to help move things forward?

If no activity is observed within the next week, this issue will be closed, in accordance with our stale issue process.