concourse / concourse

Concourse is a container-based continuous thing-doer written in Go.
https://concourse-ci.org
Apache License 2.0

Seemingly random crash in findOrCreateResourceConfigScope #4546

Closed Nashenas88 closed 4 years ago

Nashenas88 commented 4 years ago

Bug Report

Periodically, concourse is crashing with the following panic (it takes down the entire server):

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x9751bc]

goroutine 19524 [running]:
github.com/concourse/concourse/atc/db.findOrCreateResourceConfigScope(0x275d280, 0xc00164a8d0, 0x2777fc0, 0xc000514a50, 0x26f9c40, 0xc00047b800, 0x274d260, 0xc000f3fc00, 0x2791ac0, 0xc0019bcb00, ...)
    /tmp/build/1c3187db/concourse/atc/db/resource_config.go:296 +0x1c1c
github.com/concourse/concourse/atc/db.(*build).SaveOutput(0xc000c0c1e0, 0xc00129ab10, 0xc, 0xc000ad7b60, 0xc001ba3680, 0x6, 0x8, 0xc001c652f0, 0xc00372a880, 0x2, ...)
    /tmp/build/1c3187db/concourse/atc/db/build.go:884 +0x32d
github.com/concourse/concourse/atc/engine/builder.(*putDelegate).SaveOutput(0xc000d4d310, 0x2763160, 0xc0019f0720, 0xc00129ab10, 0xc, 0xc00129ab08, 0x4, 0xc00129ab0c, 0x4, 0xc000fd6450, ...)
    /tmp/build/1c3187db/concourse/atc/engine/builder/delegate_factory.go:220 +0x35f
github.com/concourse/concourse/atc/exec.(*PutStep).Run(0xc0006d4600, 0x2746be0, 0xc001cc3c00, 0x272ff20, 0xc0012c2ef0, 0x1, 0xc0012c4520)
    /tmp/build/1c3187db/concourse/atc/exec/put_step.go:199 +0xeb6
github.com/concourse/concourse/atc/exec.LogErrorStep.Run(0x270f6a0, 0xc0006d4600, 0x7f6e1975cdc8, 0xc000d4d310, 0x2746be0, 0xc001cc3c00, 0x272ff20, 0xc0012c2ef0, 0xc0012c2ef0, 0x0)
    /tmp/build/1c3187db/concourse/atc/exec/log_error_step.go:30 +0xe4
github.com/concourse/concourse/atc/exec.OnSuccessStep.Run(0x2718020, 0xc0012befe0, 0x2718020, 0xc0012bf000, 0x2746be0, 0xc001cc3c00, 0x272ff20, 0xc0012c2ef0, 0x0, 0x0)
    /tmp/build/1c3187db/concourse/atc/exec/on_success.go:29 +0x60
github.com/concourse/concourse/atc/exec.EnsureStep.Run(0x2718020, 0xc0012befc0, 0x2718120, 0xc0012bf020, 0x2746be0, 0xc001cc3c00, 0x272ff20, 0xc0012c2ef0, 0xc000048028, 0x0)
    /tmp/build/1c3187db/concourse/atc/exec/ensure_step.go:44 +0xfa
github.com/concourse/concourse/atc/exec.InParallelStep.Run.func1(0xc0016e8c40, 0xc0030a9f80, 0x2717ee0, 0xc0012bf040, 0x2746be0, 0xc001cc3c00, 0x272ff20, 0xc0012c2ef0, 0xc0012bf060, 0x2, ...)
    /tmp/build/1c3187db/concourse/atc/exec/in_parallel.go:61 +0xb4
created by github.com/concourse/concourse/atc/exec.InParallelStep.Run
    /tmp/build/1c3187db/concourse/atc/exec/in_parallel.go:56 +0x26e

Steps to Reproduce

It's not yet clear what is causing this to occur

Expected Results

Ideally, concourse wouldn't completely come down

Actual Results

The server comes down completely
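
For context on why one failing step stops the whole server: in Go, a panic that is not recovered inside the goroutine where it occurs terminates the entire process, and the trace above shows the panic inside a goroutine created by InParallelStep.Run. The sketch below illustrates that general mechanism and one common mitigation (a deferred recover in the goroutine). It is not Concourse's code; `scope`, `saveOutput`, and `runStep` are hypothetical names used only for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// scope is a hypothetical stand-in for a value that may be nil at runtime.
type scope struct {
	id int
}

// saveOutput dereferences its argument without a nil check, so a nil
// pointer triggers the same class of runtime panic seen in the trace.
func saveOutput(s *scope) int {
	return s.id
}

// runStep runs fn in its own goroutine and converts a panic into an error
// instead of letting it terminate the whole process.
func runStep(fn func()) (err error) {
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		defer func() {
			// recover only works in the goroutine that panicked.
			if r := recover(); r != nil {
				err = fmt.Errorf("step panicked: %v", r)
			}
		}()
		fn()
	}()
	wg.Wait()
	return err
}

func main() {
	err := runStep(func() { saveOutput(nil) })
	fmt.Println("recovered:", err != nil) // prints "recovered: true"
}
```

Without the deferred recover, the same nil dereference would end the program with a stack trace like the one reported here.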

Version Info

xtremerui commented 4 years ago

@cirocosta while core is looking into this, is there any similar case happening on Wings or Hush House at the moment?

xtremerui commented 4 years ago

@Nashenas88 could you provide some details of your task and your config from when you see this failure? Thx!

cirocosta commented 4 years ago

@xtremerui no restarts so far 🤔

fftorres commented 4 years ago

We are facing the same issue after upgrading to 5.6.0.

concourse-prod-web-8f656f4c-kn6km concourse-prod-web panic: runtime error: invalid memory address or nil pointer dereference
concourse-prod-web-8f656f4c-kn6km concourse-prod-web [signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x9751bc]
concourse-prod-web-8f656f4c-kn6km concourse-prod-web
concourse-prod-web-8f656f4c-kn6km concourse-prod-web goroutine 190416 [running]:
concourse-prod-web-8f656f4c-kn6km concourse-prod-web github.com/concourse/concourse/atc/db.findOrCreateResourceConfigScope(0x275d280, 0xc009e6a850, 0x2777fc0, 0xc000592880, 0x26f9c40, 0xc000221c00, 0x274d260, 0xc00c596c00, 0x2791ac0, 0xc008b04840, ...)
concourse-prod-web-8f656f4c-kn6km concourse-prod-web    /tmp/build/1c3187db/concourse/atc/db/resource_config.go:296 +0x1c1c
concourse-prod-web-8f656f4c-kn6km concourse-prod-web github.com/concourse/concourse/atc/db.(*build).SaveOutput(0xc007fd7680, 0xc00d6f4c70, 0xc, 0xc01f1f2ff0, 0xc01d701680, 0x6, 0x8, 0xc01831c540, 0xc01a9b4a00, 0x8, ...)
concourse-prod-web-8f656f4c-kn6km concourse-prod-web    /tmp/build/1c3187db/concourse/atc/db/build.go:884 +0x32d
concourse-prod-web-8f656f4c-kn6km concourse-prod-web github.com/concourse/concourse/atc/engine/builder.(*putDelegate).SaveOutput(0xc0073926e0, 0x2763160, 0xc009b100c0, 0xc00d6f4c70, 0xc, 0xc00d6f4c80, 0xd, 0xc00d6f4c90, 0xc, 0xc00a386270, ...)
concourse-prod-web-8f656f4c-kn6km concourse-prod-web    /tmp/build/1c3187db/concourse/atc/engine/builder/delegate_factory.go:220 +0x35f
concourse-prod-web-8f656f4c-kn6km concourse-prod-web github.com/concourse/concourse/atc/exec.(*PutStep).Run(0xc0083dca00, 0x2746be0, 0xc0108e2ac0, 0x272ff20, 0xc00351cb80, 0x1, 0xc006912000)
concourse-prod-web-8f656f4c-kn6km concourse-prod-web    /tmp/build/1c3187db/concourse/atc/exec/put_step.go:199 +0xeb6
concourse-prod-web-8f656f4c-kn6km concourse-prod-web github.com/concourse/concourse/atc/exec.LogErrorStep.Run(0x270f6a0, 0xc0083dca00, 0x7f23a4e41aa8, 0xc0073926e0, 0x2746be0, 0xc0108e2ac0, 0x272ff20, 0xc00351cb80, 0x2, 0xc000550a80)
concourse-prod-web-8f656f4c-kn6km concourse-prod-web    /tmp/build/1c3187db/concourse/atc/exec/log_error_step.go:30 +0xe4
concourse-prod-web-8f656f4c-kn6km concourse-prod-web github.com/concourse/concourse/atc/exec.OnSuccessStep.Run(0x2718020, 0xc00a781940, 0x2718020, 0xc00a781960, 0x2746be0, 0xc0108e2ac0, 0x272ff20, 0xc00351cb80, 0x522d62, 0x0)
concourse-prod-web-8f656f4c-kn6km concourse-prod-web    /tmp/build/1c3187db/concourse/atc/exec/on_success.go:29 +0x60
concourse-prod-web-8f656f4c-kn6km concourse-prod-web github.com/concourse/concourse/atc/exec.InParallelStep.Run.func1(0xc0094e7500, 0xc01f2a1f80, 0x2718120, 0xc00a781980, 0x2746be0, 0xc0108e2ac0, 0x272ff20, 0xc00351cb80, 0xc00a781a00, 0x2, ...)
concourse-prod-web-8f656f4c-kn6km concourse-prod-web    /tmp/build/1c3187db/concourse/atc/exec/in_parallel.go:61 +0xb4
concourse-prod-web-8f656f4c-kn6km concourse-prod-web created by github.com/concourse/concourse/atc/exec.InParallelStep.Run
concourse-prod-web-8f656f4c-kn6km concourse-prod-web    /tmp/build/1c3187db/concourse/atc/exec/in_parallel.go:56 +0x26e
antonu17 commented 4 years ago

We rolled back from 5.6.0 to 5.4.1 (the version we were running before), but 5.4.1 didn't roll back the database properly, and we started seeing this everywhere: sql: Scan error on column index 16, name "start_time": converting driver.Value type time.Time ("2019-10-04 14:45:50 +0000 UTC") to a int64: invalid syntax

cirocosta commented 4 years ago

Hi @antonu17 ,

As there were changes in the database schema between those two versions, you'll have to manually perform a downgrade - see https://concourse-ci.org/concourse-web.html#downgrading

ps.: you can check that there were changes by looking at atc/db/migrations:

$ git diff v5.4.0..v5.6.0 --stat | grep migrations
 .../migrations/1522178770_add_job_tags.up.go       |    4 +-
 .../migrations/1563997651_users_table.down.sql     |    3 +
 .../migrations/1563997651_users_table.up.sql       |   10 +
 .../migrations/1565800062_create_checks.down.sql   |    3 +
 .../migrations/1565800062_create_checks.up.sql     |   20 +
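
The file names above start with a numeric version, so the migrations that a downgrade has to undo can be listed by comparing those prefixes. A minimal sketch, assuming the prefix is an ordered version number; `migrationVersion`, `newerThan`, and the "current" version used in `main` are hypothetical and are not part of Concourse:

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// migrationVersion extracts the leading numeric version from a migration
// file name such as "1565800062_create_checks.up.sql".
func migrationVersion(file string) (int64, bool) {
	i := strings.IndexByte(file, '_')
	if i <= 0 {
		return 0, false
	}
	v, err := strconv.ParseInt(file[:i], 10, 64)
	if err != nil {
		return 0, false
	}
	return v, true
}

// newerThan returns the distinct versions in files that are greater than
// target, sorted newest first (the order in which they would be undone).
func newerThan(files []string, target int64) []int64 {
	seen := map[int64]bool{}
	var out []int64
	for _, f := range files {
		v, ok := migrationVersion(f)
		if !ok || v <= target || seen[v] {
			continue
		}
		seen[v] = true
		out = append(out, v)
	}
	sort.Slice(out, func(i, j int) bool { return out[i] > out[j] })
	return out
}

func main() {
	files := []string{
		"1522178770_add_job_tags.up.go",
		"1563997651_users_table.down.sql",
		"1563997651_users_table.up.sql",
		"1565800062_create_checks.down.sql",
		"1565800062_create_checks.up.sql",
	}
	// Assumption for illustration only: the older release stops at 1522178770.
	fmt.Println(newerThan(files, 1522178770)) // prints "[1565800062 1563997651]"
}
```

The actual target version for a given release should be taken from the downgrading documentation linked above, not from this sketch.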
antonu17 commented 4 years ago

I see. Thank you! For some reason I was sure that concourse web runs the necessary migrations on launch for both upgrades and downgrades. 🤷‍♂

cirocosta commented 4 years ago

@xtremerui , there's been this change recently: https://github.com/concourse/concourse/pull/4442

Do you think it could be related? 🤔

thx!

marco-m commented 4 years ago

We are seeing the same with 5.6.0. More than 100 restarts per hour. We are investigating right now.

kmdouglass commented 4 years ago

We found a way to reproduce the bug described by this ticket in Concourse 5.6.0. Briefly:

@cirocosta @xtremerui would a test case that reproduces the problem help with your investigations?

xtremerui commented 4 years ago

@kmdouglass thank you for the detailed steps! This definitely helps us confirm some thoughts we had about it. I believe the issue is being taken care of now.

fftorres commented 4 years ago

As an update, we faced this problem when we migrated from 5.4.1 to 5.6.0. We tried to revert to 5.4.1 but ran into database schema issues. We then decided to go back to 5.6.0 and do the rollback properly (also rolling back the database migrations). After re-applying 5.6.0, it's running without any restarts caused by the issue reported here.

I remember that after applying 5.6.0 we faced some resource check issues due to resources declared with the same name as the resource type, so we needed to re-apply some pipelines with the proper change.

kmdouglass commented 4 years ago

You're welcome @xtremerui . We encountered this problem in production again this morning and made the following additional observations:

So it looks like the issue is very much related to whether the ATC has enabled global resources.

xtremerui commented 4 years ago

@fftorres yes, I believe the root cause of this issue has something to do with misconfigured resources and resource types. Thx!

jwntrs commented 4 years ago

I remember that after applying 5.6.0 we faced some resource check issues due resources declared with the same name as the resource type, so we needed to re-apply some pipelines with the proper change.

seems related to https://github.com/concourse/concourse/issues/4599