canonical / data-platform-workflows

Reusable GitHub Actions workflows used by the Data Platform team
Apache License 2.0

patch(integration_test_charm.yaml): Add step timeouts to upload Allure results on timeout #179

Closed: carlcsaposs-canonical closed this 1 month ago

carlcsaposs-canonical commented 1 month ago

Currently, if the workflow times out, Allure results are not uploaded. Add step-level timeouts so that the integration test step times out before the workflow does.

Tests that completed before the timeout will now show up in the Allure Report

Tests that did not complete before the timeout will still be omitted from the Allure Report

Bonus: we'll now also get juju debug-log output on integration test timeouts
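The fix described above, sketched as a workflow fragment (job names, commands, and timeout values here are illustrative assumptions, not the actual workflow):

```yaml
jobs:
  integration-test:
    runs-on: ubuntu-latest
    timeout-minutes: 120          # job-level limit
    steps:
      - name: Run integration tests
        timeout-minutes: 110      # step times out first, so later steps still run
        run: tox run -e integration
      - name: Upload Allure results
        if: always()              # run even when the test step timed out
        run: echo "upload step placeholder"
```

Because the step's `timeout-minutes` is smaller than the job's, a timed-out test step fails the step rather than killing the whole job, and `if: always()` lets the upload step run afterward.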

carlcsaposs-canonical commented 1 month ago

testing on https://github.com/carlcsaposs-canonical/postgresql-operator/actions/runs/9347452133

carlcsaposs-canonical commented 1 month ago

this helps with tests that finished before the timeout, but doesn't show tests that didn't finish before the timeout

potential alternative: generate test plan? https://allurereport.org/docs/pytest/#select-tests-via-a-test-plan-file
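For reference, an allure test plan is a JSON file pointed to by the ALLURE_TESTPLAN_PATH environment variable; a minimal sketch (shape per the allure docs, selector value illustrative):

```json
{
  "version": "1.0",
  "tests": [
    {
      "selector": "tests.integration.high_availability.test_replication#test_cluster_isolation"
    }
  ]
}
```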

carlcsaposs-canonical commented 1 month ago

another potential alternative: add timeouts to each test so that each module < 120 min. for example, with https://github.com/pytest-dev/pytest-timeout
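A sketch of the pytest-timeout approach (values are assumptions; the per-test budgets would need tuning so that each module stays under 120 min):

```ini
; pytest.ini (illustrative values, not the repo's config)
[pytest]
; default per-test timeout, in seconds
timeout = 3000
; "signal" raises in-process via SIGALRM, so fixture teardown can still run (POSIX only)
timeout_method = signal
```

Individual slow tests could then override the default with `@pytest.mark.timeout(...)`.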

carlcsaposs-canonical commented 1 month ago

from trying test plan, it appears that doesn't help with allure-pytest if pytest is killed

(report only shows completed tests)

carlcsaposs-canonical commented 1 month ago

tried the alternative with the timeout command: it doesn't work with SIGTERM (https://docs.pytest.org/en/7.1.x/explanation/fixtures.html#a-note-about-fixture-cleanup), and kinda works with SIGINT: we get allure results for the interrupted test, but it says passed. however, later tests are not included
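The timeout-command attempt above, roughly (durations illustrative; the real invocation would wrap the pytest step):

```shell
# Real use would be something like:
#   timeout --signal=INT --kill-after=5m 110m pytest tests/integration ...
# Stand-in demonstration: bound a 3-second sleep to a 1-second limit
timeout --signal=INT --kill-after=10 1 sleep 3 || status=$?
echo "exit status: ${status}"  # GNU timeout exits 124 when the limit is hit
```

`--signal=INT` sends SIGINT first (so pytest can attempt teardown), and `--kill-after` escalates to SIGKILL if the process ignores it.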

conclusion: the timeout command is worse than a plain GitHub step timeout, since it shows the last test as passed instead of timed out

however, killing pytest can leave behind a corrupted allure file (a partially written JSON file that was still open and is not valid JSON), which might cause problems
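One way to guard against that, as a sketch (hypothetical helper name, stdlib only): scan the results directory and drop any file that is not valid JSON before uploading.

```python
import json
from pathlib import Path


def prune_corrupt_results(results_dir):
    """Delete allure result files that are not valid JSON.

    If pytest is killed mid-write, a *.json result file can be left
    truncated; dropping such files avoids breaking report generation.
    (Hypothetical helper; sketch only.)
    """
    pruned = []
    for path in Path(results_dir).glob("*.json"):
        try:
            json.loads(path.read_text())
        except json.JSONDecodeError:
            path.unlink()  # truncated or otherwise invalid: discard
            pruned.append(path.name)
    return pruned
```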

carlcsaposs-canonical commented 1 month ago

I think we might be able to generate allure results during the "collect integration tests" job with --collect-only (need to disable this or use --setup-only and prevent fixture setup, or something), replace each status with "unknown", and then, after the tests run, delete any unknown results that we have real results for
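The delete-superseded-placeholders step might look like this sketch (hypothetical helper name; assumes collect-time placeholders carry status "unknown" and real results for the same test share its historyId):

```python
import json
from pathlib import Path


def merge_placeholder_results(placeholder_dir, results_dir):
    """Merge collect-time placeholder results into real allure results.

    A placeholder is dropped when a real result with the same historyId
    exists; otherwise it is kept, so tests that never ran still show up
    as "unknown" in the report. (Hypothetical helper; sketch only.)
    """
    real_ids = set()
    for path in Path(results_dir).glob("*-result.json"):
        real_ids.add(json.loads(path.read_text())["historyId"])
    for path in Path(placeholder_dir).glob("*-result.json"):
        placeholder = json.loads(path.read_text())
        if placeholder["historyId"] in real_ids:
            path.unlink()  # superseded by a real result
        else:
            # test never ran: keep the "unknown" placeholder in the report
            path.rename(Path(results_dir) / path.name)
```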

for reference, an allure result for this test (historyId is constant across runs): (postgres vm replication group 3)

{
  "name": "test_cluster_isolation",
  "status": "passed",
  "description": "Test for cluster data isolation.\n\n    This test creates a new cluster, create a new table on both cluster, write a single record with\n    the application name for each cluster, retrieve and compare these records, asserting they are\n    not the same.\n    ",
  "attachments": [
    {
      "name": "log",
      "source": "38199748-30fa-4c19-b57b-979b732f7e60-attachment.txt",
      "type": "text/plain"
    }
  ],
  "start": 1717291745285,
  "stop": 1717292379636,
  "uuid": "1e782671-a73d-4bfb-af2d-ac1c288d79fd",
  "historyId": "135d9251eb347937e465a64ddaf59612",
  "testCaseId": "135d9251eb347937e465a64ddaf59612",
  "fullName": "tests.integration.high_availability.test_replication#test_cluster_isolation",
  "labels": [
    {
      "name": "tag",
      "value": "group(3)"
    },
    {
      "name": "tag",
      "value": "asyncio"
    },
    {
      "name": "parentSuite",
      "value": "tests.integration.high_availability"
    },
    {
      "name": "suite",
      "value": "test_replication"
    },
    {
      "name": "host",
      "value": "fv-az842-876"
    },
    {
      "name": "thread",
      "value": "8734-MainThread"
    },
    {
      "name": "framework",
      "value": "pytest"
    },
    {
      "name": "language",
      "value": "cpython3"
    },
    {
      "name": "package",
      "value": "tests.integration.high_availability.test_replication"
    }
  ]
}