bluesky / ophyd-async

Hardware abstraction for bluesky written using asyncio
https://blueskyproject.io/ophyd-async
BSD 3-Clause "New" or "Revised" License

Provide historical summary of tests and PR comment on failed tests #567

Open DiamondJoseph opened 2 months ago

DiamondJoseph commented 2 months ago

Makes tracking e.g. #23 easier, and may enable catching other issues?

See the summary page of this PR's Checks section.

Provides a table of test behaviour over the previous N (default 10) runs. Untested, but it should also comment on any PRs with failed tests: example here from the action itself.

I'd prefer it to be a configured GitHub Action rather than running `npx <some command>`...

I'd like to try this out here and on some other repos for a Diamond Copier issue
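For illustration, a minimal sketch of what a configured-action step could look like. The action reference and input names below are assumptions for illustration only, not taken from the reporter's documentation:

```yaml
# Hypothetical workflow step: publish a historical test summary from a CTRF report.
# The action name/version and the input names are illustrative assumptions.
- name: Publish CTRF test summary
  if: always()  # run whether or not the test step failed
  uses: ctrf-io/github-test-reporter@v1  # assumed action reference; verify before use
  with:
    report-path: ctrf-report.json  # assumed input: path to the CTRF JSON report
```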

callumforrester commented 2 months ago

Should the bot be commenting here?

DiamondJoseph commented 2 months ago

> Should the bot be commenting here?

It can be configured to. It's currently set to comment only if there are test failures, just to keep things slightly more focused: I wouldn't personally want a bot comment for every pushed commit.
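As a sketch of what that configuration might look like (the input names here are hypothetical; the real reporter may spell them differently):

```yaml
# Hypothetical configuration: comment on the PR only when there are failures,
# rather than on every pushed commit. Both "with" input names are assumptions.
- uses: ctrf-io/github-test-reporter@v1  # assumed action reference, as above
  with:
    report-path: ctrf-report.json
    pull-request: true          # hypothetical input: enable PR comments
    comment-on-fail-only: true  # hypothetical input: stay quiet when all tests pass
```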

coretl commented 2 months ago

duration <1s?

DiamondJoseph commented 2 months ago

> duration <1s?

GitHub reckons 36s, so that part of the table definitely doesn't look right.

And 440 tests, 1 skipped vs. 309/1 (do we have 131 parametrised test variants, and are they being counted separately?).

I'll look again at the CTRF export.
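For context, the CTRF report itself would typically be produced by the test step, e.g. with the pytest-ctrf plugin (treat the flag name below as an assumption to verify against that plugin's docs). Worth noting that pytest collects each parametrised case as a separate test item, which is one plausible source of the 440 vs. 309 discrepancy if the two numbers come from tools that aggregate differently:

```yaml
# Hypothetical CI step: run the suite and export results as a CTRF JSON report.
# Assumes the pytest-ctrf plugin is installed; --ctrf is its flag as far as I know.
- name: Run tests with CTRF export
  run: python -m pytest --ctrf ctrf-report.json
```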

jwlodek commented 2 months ago

Pretty cool! Are there outlined criteria for describing a test as flaky? E.g. 2 failures out of 5 runs, or something similar? Likewise, can it distinguish between temporarily broken tests on a dev PR and actually flaky tests?

Also, not sure if it's possible, but it would be cool if tests we know are flaky could have the CI either ignore their failures or retry them, so we don't need to merge PRs with failing CI because of a flaky test.
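(For reference, reruns along those lines are commonly done with the pytest-rerunfailures plugin; a sketch below, though how we'd wire it into this CI is my assumption, and see the pushback in the next comment.)

```yaml
# Sketch: retry failing tests automatically via pytest-rerunfailures.
# --reruns and --reruns-delay are that plugin's documented flags.
- name: Run tests, retrying failures up to twice
  run: python -m pytest --reruns 2 --reruns-delay 1
```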

DiamondJoseph commented 2 months ago

I'd actively resist making the CI more resistant to flaky tests: if we're not being annoyed, we won't prioritise fixing them. This is intended (to my mind) more as something for maintainers to monitor, so that if we see something become flaky we make time to fix it.

DiamondJoseph commented 2 months ago

No idea how it decides what is what, hopefully we can find out :D

coretl commented 1 month ago

@DiamondJoseph where did we get to on this?

DiamondJoseph commented 1 month ago

I've just switched it to always report on test status (there may be insight in the historical test data that would be missed if we only reported on failures?) and rebased it. I think we have to decide whether or not to enable it, and then potentially revisit after it's been enabled for a while.

github-actions[bot] commented 1 month ago

Test Summary - Run #2273

| Tests 📝 | Passed ✅ | Failed ❌ | Skipped ⏭️ | Pending ⏳ | Other ❓ | Flaky 🍂 | Duration ⏱️ |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 368 | 367 | 0 | 1 | 0 | 0 | 0 | <1s |

🎉 All tests passed!

Github Actions Test Reporter CTRF

github-actions[bot] commented 1 month ago

Test Summary - Run #2273

| Tests 📝 | Passed ✅ | Failed ❌ | Skipped ⏭️ | Pending ⏳ | Other ❓ | Flaky 🍂 | Duration ⏱️ |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 368 | 367 | 0 | 1 | 0 | 0 | 0 | <1s |

🎉 All tests passed!

Github Actions Test Reporter CTRF

github-actions[bot] commented 1 month ago

Test Summary - Run #2273

| Tests 📝 | Passed ✅ | Failed ❌ | Skipped ⏭️ | Pending ⏳ | Other ❓ | Flaky 🍂 | Duration ⏱️ |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 368 | 367 | 0 | 1 | 0 | 0 | 0 | <1s |

🎉 All tests passed!

Github Actions Test Reporter CTRF

DiamondJoseph commented 1 month ago

Hmm, definitely not a fan of putting it in at the moment with 3 reports like that.