ddiss / icyci

Safe and scalable continuous testing, without the bloat
GNU Affero General Public License v3.0

Stale lock after transition from State 5 #3

Closed Werkov closed 1 year ago

Werkov commented 1 year ago

Steps to reproduce

Additional info

Log from the service:

Jun 27 18:37:20 host icyci[80930]: Skipped shutdown
Jun 27 20:36:58 host icyci[80930]: rapido2:/root/build/kselftest/kselftest_install# 2023/06/27 20:36:58 State 5 transition timeout!
- Main PID: 80930 (code=exited, status=1/FAILURE)
- manual restart
Jun 28 15:19:52 host icyci[151323]: 2023/06/28 15:19:52 transitioning from state 0: uninitialized -> 1: clone source repository
Jun 28 15:19:52 host icyci[151331]: Cloning into '/icyci-tmp/icyci-workspace792369250/source'...
Jun 28 15:25:00 host icyci[151323]: 2023/06/28 15:25:00 clone source repository completed successfully
Jun 28 15:25:00 host icyci[151323]: 2023/06/28 15:25:00 transitioning from state 1: clone source repository -> 2: verify branch HEAD
Jun 28 15:25:00 host icyci[151323]: 2023/06/28 15:25:00 GPG verifying commit at origin/icyci-demo tip
Jun 28 15:25:01 host icyci[151323]: 2023/06/28 15:25:00 verify completed successfully
Jun 28 15:25:01 host icyci[151323]: 2023/06/28 15:25:00 transitioning from state 2: verify branch HEAD -> 3: lock commit for testing
Jun 28 15:25:01 host icyci[151585]: From /icyci-linux-results
Jun 28 15:25:01 host icyci[151585]:  * [new ref]                   refs/notes/icyci.locked -> refs/notes/icyci.locked
Jun 28 15:25:01 host icyci[151323]: 2023/06/28 15:25:01 couldn't add git notes lock: exit status 1
Jun 28 15:25:01 host icyci[151323]: 2023/06/28 15:25:01 lock commit for testing failed with exit status 1
Jun 28 15:25:01 host icyci[151323]: 2023/06/28 15:25:01 transitioning from state 3: lock commit for testing -> 8: poll source for new commits
Jun 28 15:25:01 host icyci[151323]: 2023/06/28 15:25:01 Entering poll loop awaiting new icyci-demo commits at 24a3d9bc1b819c695c7c88e1e6e20b4971f010d9

ddiss commented 1 year ago

Thanks for the report. With -disable-timeouts=false (the default), the test script completion timeout currently triggers after 2 hours. As a workaround, we could allow for finer-grained tuning of state transition timeouts. Regarding the actual failure on restart, I'll add this scenario to icyci_test.go and evaluate options for a fix.

Werkov commented 1 year ago

Clarification -- the (fixed) timeout is fine IMO. I wanted to provide logs for the issue when the stale lock remains after the termination. (On a related note, timeout could be handled without terminating the main service but cleanup is the crucial step.)

ddiss commented 1 year ago

> Clarification -- the (fixed) timeout is fine IMO.

Understood. However, given that the awaitCmd state timeout depends heavily on the contents of the test script, it probably makes sense to at least offer a parameter for configuring that specific timeout. The git fetch / push timeouts could remain as-is for now.

> I wanted to provide logs for the issue when the stale lock remains after the termination. (On a related note, timeout could be handled without terminating the main service but cleanup is the crucial step.)

One simple option would be to just handle any awaitCmd state transition timeouts as regular cmd failures, i.e.

Do you agree that this is a reasonable option? Other state timeouts would continue to be fatal for now.

Werkov commented 1 year ago

Test timeout as a test result (negative, with the report and regular cleanup) is more reasonable than failing the daemon ;-)

ddiss commented 1 year ago

@Werkov I've added some functionality to hopefully improve the situation here:

Do you think this new functionality allows this issue to be closed, or would you like something else implemented here?

Werkov commented 1 year ago

I've tried with b4ab50bbbae189602e85885ead0da7b05cd4ead6 and icyci daemon handled a timeout gracefully and correctly caught up after a later push. Let me close this. Thanks!

ddiss commented 1 year ago

> I've tried with b4ab50b and icyci daemon handled a timeout gracefully and correctly caught up after a later push. Let me close this. Thanks!

great, thanks for testing!