Closed Nashenas88 closed 4 years ago
@cirocosta while core is looking into this, is there any similar case happening on Wings or Hush House atm?
@Nashenas88 could you share some details of your task and your config from when you see this failure? Thx!
@xtremerui no restarts so far 🤔
We are facing the same issue after upgrading to 5.6.0.
concourse-prod-web-8f656f4c-kn6km concourse-prod-web panic: runtime error: invalid memory address or nil pointer dereference
concourse-prod-web-8f656f4c-kn6km concourse-prod-web [signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x9751bc]
concourse-prod-web-8f656f4c-kn6km concourse-prod-web
concourse-prod-web-8f656f4c-kn6km concourse-prod-web goroutine 190416 [running]:
concourse-prod-web-8f656f4c-kn6km concourse-prod-web github.com/concourse/concourse/atc/db.findOrCreateResourceConfigScope(0x275d280, 0xc009e6a850, 0x2777fc0, 0xc000592880, 0x26f9c40, 0xc000221c00, 0x274d260, 0xc00c596c00, 0x2791ac0, 0xc008b04840, ...)
concourse-prod-web-8f656f4c-kn6km concourse-prod-web /tmp/build/1c3187db/concourse/atc/db/resource_config.go:296 +0x1c1c
concourse-prod-web-8f656f4c-kn6km concourse-prod-web github.com/concourse/concourse/atc/db.(*build).SaveOutput(0xc007fd7680, 0xc00d6f4c70, 0xc, 0xc01f1f2ff0, 0xc01d701680, 0x6, 0x8, 0xc01831c540, 0xc01a9b4a00, 0x8, ...)
concourse-prod-web-8f656f4c-kn6km concourse-prod-web /tmp/build/1c3187db/concourse/atc/db/build.go:884 +0x32d
concourse-prod-web-8f656f4c-kn6km concourse-prod-web github.com/concourse/concourse/atc/engine/builder.(*putDelegate).SaveOutput(0xc0073926e0, 0x2763160, 0xc009b100c0, 0xc00d6f4c70, 0xc, 0xc00d6f4c80, 0xd, 0xc00d6f4c90, 0xc, 0xc00a386270, ...)
concourse-prod-web-8f656f4c-kn6km concourse-prod-web /tmp/build/1c3187db/concourse/atc/engine/builder/delegate_factory.go:220 +0x35f
concourse-prod-web-8f656f4c-kn6km concourse-prod-web github.com/concourse/concourse/atc/exec.(*PutStep).Run(0xc0083dca00, 0x2746be0, 0xc0108e2ac0, 0x272ff20, 0xc00351cb80, 0x1, 0xc006912000)
concourse-prod-web-8f656f4c-kn6km concourse-prod-web /tmp/build/1c3187db/concourse/atc/exec/put_step.go:199 +0xeb6
concourse-prod-web-8f656f4c-kn6km concourse-prod-web github.com/concourse/concourse/atc/exec.LogErrorStep.Run(0x270f6a0, 0xc0083dca00, 0x7f23a4e41aa8, 0xc0073926e0, 0x2746be0, 0xc0108e2ac0, 0x272ff20, 0xc00351cb80, 0x2, 0xc000550a80)
concourse-prod-web-8f656f4c-kn6km concourse-prod-web /tmp/build/1c3187db/concourse/atc/exec/log_error_step.go:30 +0xe4
concourse-prod-web-8f656f4c-kn6km concourse-prod-web github.com/concourse/concourse/atc/exec.OnSuccessStep.Run(0x2718020, 0xc00a781940, 0x2718020, 0xc00a781960, 0x2746be0, 0xc0108e2ac0, 0x272ff20, 0xc00351cb80, 0x522d62, 0x0)
concourse-prod-web-8f656f4c-kn6km concourse-prod-web /tmp/build/1c3187db/concourse/atc/exec/on_success.go:29 +0x60
concourse-prod-web-8f656f4c-kn6km concourse-prod-web github.com/concourse/concourse/atc/exec.InParallelStep.Run.func1(0xc0094e7500, 0xc01f2a1f80, 0x2718120, 0xc00a781980, 0x2746be0, 0xc0108e2ac0, 0x272ff20, 0xc00351cb80, 0xc00a781a00, 0x2, ...)
concourse-prod-web-8f656f4c-kn6km concourse-prod-web /tmp/build/1c3187db/concourse/atc/exec/in_parallel.go:61 +0xb4
concourse-prod-web-8f656f4c-kn6km concourse-prod-web created by github.com/concourse/concourse/atc/exec.InParallelStep.Run
concourse-prod-web-8f656f4c-kn6km concourse-prod-web /tmp/build/1c3187db/concourse/atc/exec/in_parallel.go:56 +0x26e
We have rolled back from 5.6.0 to 5.4.1 (the version we were running before), but 5.4.1 didn't roll back the database properly, and we started seeing this everywhere: sql: Scan error on column index 16, name "start_time": converting driver.Value type time.Time ("2019-10-04 14:45:50 +0000 UTC") to a int64: invalid syntax
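The Scan error above is what Go's database/sql produces when a driver hands back a time.Time for a destination that expects an integer. The sketch below reproduces only the conversion failure, without a database, by calling sql.NullInt64.Scan directly. The helper names (scanStartTime, legacyValue, migratedValue) are hypothetical, and the assumption that the older version read start_time as an integer is inferred from the error message, not confirmed in this thread.

```go
package main

import (
	"database/sql"
	"fmt"
	"time"
)

// scanStartTime scans value into an int64 the way database/sql would for an
// integer destination. Hypothetical helper; only sql.NullInt64 is real API.
func scanStartTime(value interface{}) (int64, error) {
	var n sql.NullInt64
	if err := n.Scan(value); err != nil {
		return 0, err
	}
	return n.Int64, nil
}

// legacyValue is what an integer start_time column would yield (assumption).
func legacyValue() interface{} { return int64(1570200350) }

// migratedValue is what a timestamp start_time column yields: a time.Time.
func migratedValue() interface{} {
	return time.Date(2019, 10, 4, 14, 45, 50, 0, time.UTC)
}

func main() {
	// Integer column: the scan succeeds.
	secs, err := scanStartTime(legacyValue())
	fmt.Println(secs, err)

	// Timestamp column read by code expecting an integer: the scan fails
	// with a "converting driver.Value type time.Time ..." error.
	_, err = scanStartTime(migratedValue())
	fmt.Println(err)
}
```

This is why running an older binary against a newer schema fails at query time: the older code's scan destinations no longer match the column types.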
Hi @antonu17 ,
As there were changes in the database schema between those two versions, you'll have to manually perform a downgrade - see https://concourse-ci.org/concourse-web.html#downgrading
ps.: you can check that there were changes by looking at atc/db/migrations:
$ git diff v5.4.0..v5.6.0 --stat | grep migrations
.../migrations/1522178770_add_job_tags.up.go | 4 +-
.../migrations/1563997651_users_table.down.sql | 3 +
.../migrations/1563997651_users_table.up.sql | 10 +
.../migrations/1565800062_create_checks.down.sql | 3 +
.../migrations/1565800062_create_checks.up.sql | 20 +
I see. Thank you! For some reason I was sure concourse web runs the necessary migrations upon launch for both upgrade and downgrade. 🤷
@xtremerui , there's been this change recently: https://github.com/concourse/concourse/pull/4442
Do you think it could be related? 🤔
thx!
We are seeing the same with 5.6.0. More than 100 restarts per hour. We are investigating right now.
We found a way to reproduce the bug described by this ticket in Concourse 5.6.0. Briefly:
1. [...] put step in the job
2. [...] put step is run, change the type field of the resource_type in the pipeline configuration for the resource and set the new pipeline configuration with fly set-pipeline
3. [...] findOrCreateResourceConfigScope
@cirocosta @xtremerui would a test case that reproduces the problem help with your investigations?
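The reproduction above suggests a lookup that comes back empty once the resource type changes under a running put step. The following is a minimal sketch of that general failure mode and of a guard against it. It is not Concourse's code, the thread does not confirm this as the root cause, and every name in it (resourceType, registry, unsafeSource, safeSource) is hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// resourceType and registry are hypothetical stand-ins, not Concourse types.
type resourceType struct{ Source string }

type registry map[string]*resourceType

var errUnknownType = errors.New("resource type not found")

// unsafeSource dereferences the lookup result directly. If the entry is
// missing, the map yields a nil pointer and this panics.
func unsafeSource(r registry, name string) string {
	return r[name].Source
}

// safeSource checks for a missing entry and returns an error instead.
func safeSource(r registry, name string) (string, error) {
	rt, ok := r[name]
	if !ok || rt == nil {
		return "", fmt.Errorf("%w: %q", errUnknownType, name)
	}
	return rt.Source, nil
}

func main() {
	types := registry{"git": &resourceType{Source: "example"}}

	// Simulate the type being changed while a step still refers to "git".
	delete(types, "git")
	types["custom-git"] = &resourceType{Source: "example"}

	if _, err := safeSource(types, "git"); err != nil {
		fmt.Println("step fails cleanly:", err)
	}
	// Calling unsafeSource(types, "git") here would panic instead.
}
```

With the guard, a stale reference fails only the step that holds it and leaves the process running.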
@kmdouglass thank you for the detailed steps! This definitely helps us confirm some thoughts we had about it. I believe the issue is being taken care of now.
As an update, we faced this problem when we migrated from 5.4.1 to 5.6.0. We tried to revert to 5.4.1 but ran into database schema issues. We then decided to go back to 5.6.0 and do the rollback properly (also rolling back the database migrations). After re-applying 5.6.0, it has been running without any restarts caused by the issue reported here.
I remember that after applying 5.6.0 we faced some resource check issues due to resources declared with the same name as the resource type, so we needed to re-apply some pipelines with the proper change.
You're welcome @xtremerui . We encountered this problem in production again this morning and made the following additional observations:
1. [...] global resources enabled
2. [...] global resources actually fixed the problem
So it looks like the issue is very much related to whether the ATC has enabled global resources.
@fftorres yes, I believe the root cause of this issue has something to do with misconfigured resources and resource types. Thx!
> I remember that after applying 5.6.0 we faced some resource check issues due to resources declared with the same name as the resource type, so we needed to re-apply some pipelines with the proper change.
seems related to https://github.com/concourse/concourse/issues/4599
Bug Report
Periodically, concourse is crashing with the following panic (it takes down the entire server):
Steps to Reproduce
It's not yet clear what is causing this to occur
Expected Results
Ideally, concourse wouldn't completely come down
Actual Results
The server comes down completely
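In Go, an unrecovered panic in any goroutine terminates the whole process, which is consistent with one failing step bringing the server down (the trace shows the panicking goroutine was created by InParallelStep.Run). The sketch below shows one common way to contain this: recover inside each goroutine and turn the panic into an error. It is an illustration under stated assumptions, not a description of how Concourse handles or fixed this. The names step, record, idOf and runIsolated are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// step is a hypothetical unit of work standing in for a build step.
type step func() error

// record and idOf exist only to trigger a nil pointer dereference.
type record struct{ ID int }

func idOf(r *record) int { return r.ID }

// runIsolated runs each step in its own goroutine and converts a panic in
// any of them into an ordinary error, so one bad step cannot end the process.
func runIsolated(steps []step) []error {
	errs := make([]error, len(steps))
	var wg sync.WaitGroup
	for i, s := range steps {
		wg.Add(1)
		go func(i int, s step) {
			defer wg.Done()
			defer func() {
				// recover only works in the goroutine that panicked.
				if r := recover(); r != nil {
					errs[i] = fmt.Errorf("step %d panicked: %v", i, r)
				}
			}()
			errs[i] = s()
		}(i, s)
	}
	wg.Wait()
	return errs
}

func main() {
	errs := runIsolated([]step{
		func() error { return nil },
		func() error { idOf(nil); return nil }, // nil pointer dereference
	})
	for i, err := range errs {
		fmt.Printf("step %d: %v\n", i, err)
	}
}
```

Recovering this way keeps the process alive but can hide real bugs, so the recovered value should at least be logged and surfaced as a step failure.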
Version Info