ankidroid / Anki-Android

AnkiDroid: Anki flashcards on Android. Your secret trick to achieve superhuman information retention.
GNU General Public License v3.0
7.94k stars 2.16k forks source link

fix: handle backend-col already open to fix CI failure #16264

Closed criticalAY closed 2 weeks ago

criticalAY commented 2 weeks ago

Purpose / Description

CI fails giving col already open

Many solutions were tried the only thing that worked for the moment was a revert of the commit that started it

A real solution will involve:

Fixes

How Has This Been Tested?

Needs testing on CI

Checklist

Please, go through these checks before submitting the PR.

criticalAY commented 2 weeks ago

All the force push are test that passed everytime, so I think it fixes the issue /hopefully/

david-allison commented 2 weeks ago

I'll rerun a few times, then let's get this in tonight

lukstbit commented 2 weeks ago

And it failed...

david-allison commented 2 weeks ago

My suspicion: see if you can make https://github.com/ankidroid/Anki-Android/blob/main/AnkiDroid%2Fsrc%2Ftest%2Fjava%2Fcom%2Fichi2%2Fanki%2FWhiteboardDefaultForegroundColorTest.kt

Line 37 into a "runTest"

criticalAY commented 2 weeks ago

We'll still need a few re-runs to check if it's fixed

david-allison commented 2 weeks ago

I'd rather not delete the test and use a runTest instead (or an ignore at worst)

mikehardy commented 2 weeks ago

Not a fan of "hey a test isn't working let's just delete the test" style

mikehardy commented 2 weeks ago

Hmm - still failed on windows after marking that one flaky, but we don't have the build logs from the failure yet (#16255 is in queue not merged yet...) so I want to:

(possibly, as David mentions, making sure that Reviewer uses runTest so it has the correct runner / dispatcher config ?)

mikehardy commented 2 weeks ago

I have committed a non-merge-able change to the unit test YAML that runs it on windows only, 10 iterations

Since the required ubuntu and macos checks won't run, this will never meet merge status checks, but we should see windows unit test failures (and associated logs to use to fix this problem...) every single time we try.

mikehardy commented 2 weeks ago

Only hit 1 out of 10, but... it was enough.

I'm able to see the problem in the logs after unzipping, with command strings output.bin > output.log && cat -n output.log |grep BackendCollectionAlreadyOpenException - that gets me the line number of the exception that is currently causing problems

Then you can cat -n output.log and use / plus the line number to seek into the log for the trace.

I see this:


264629  at kotlinx.coroutines.BuildersKt.launch$default(Unknown Source)
264630  at com.ichi2.anki.Reviewer.addFlags(Reviewer.kt:660)
264631  at com.ichi2.anki.Reviewer.onCreateOptionsMenu(Reviewer.kt:677)
264632  at android.app.Activity.$$robo$$android_app_Activity$onCreatePanelMenu(Activity.java:4343)
264633  at android.app.Activity.onCreatePanelMenu(Activity.java)
264634  at androidx.activity.ComponentActivity.onCreatePanelMenu(ComponentActivity.java:520)
264635  at androidx.appcompat.view.WindowCallbackWrapper.onCreatePanelMenu(WindowCallbackWrapper.java:94)
264636  at androidx.appcompat.app.AppCompatDelegateImpl$AppCompatWindowCallback.onCreatePanelMenu(AppCompatDelegateImpl.java:3442)
264637  at androidx.appcompat.app.ToolbarActionBar.populateOptionsMenu(ToolbarActionBar.java:458)
264638  at androidx.appcompat.app.ToolbarActionBar$1.run(ToolbarActionBar.java:58)
264639  at android.os.Handler.$$robo$$android_os_Handler$handleCallback(Handler.java:942)
264640  at android.os.Handler.handleCallback(Handler.java)
264641  at android.os.Handler.$$robo$$android_os_Handler$dispatchMessage(Handler.java:99)
264642  at android.os.Handler.dispatchMessage(Handler.java)
264643  at org.robolectric.shadows.ShadowPausedLooper$IdlingRunnable.doRun(ShadowPausedLooper.java:573)
264644  at org.robolectric.shadows.ShadowPausedLooper$ControlRunnable.run(ShadowPausedLooper.java:536)
264645  at org.robolectric.shadows.ShadowPausedLooper.executeOnLooper(ShadowPausedLooper.java:629)
264646  at org.robolectric.shadows.ShadowPausedLooper.idle(ShadowPausedLooper.java:104)
264647  at org.robolectric.shadows.ShadowPausedLooper.idleIfPaused(ShadowPausedLooper.java:177)
264648  at org.robolectric.android.controller.ActivityController.visible(ActivityController.java:232)
264649  at com.ichi2.anki.RobolectricTest$Companion.startActivityNormallyOpenCollectionWithIntent(RobolectricTest.kt:281)
264650  at com.ichi2.anki.RobolectricTest.startActivityNormallyOpenCollectionWithIntent(RobolectricTest.kt)
264651  at com.ichi2.anki.ReviewerTest$Companion.startReviewer(ReviewerTest.kt:442)
264652  at com.ichi2.anki.ReviewerTest$Companion.startReviewer(ReviewerTest.kt:438)
264653  at com.ichi2.anki.ReviewerNoParamTest.startReviewer(ReviewerNoParamTest.kt:367)
264654  at com.ichi2.anki.ReviewerNoParamTest.normalReviewerFitsSystemWindows(ReviewerNoParamTest.kt:259)
criticalAY commented 2 weeks ago

I tried placing it in launchCatching{} block but it failed after few hits

david-allison commented 2 weeks ago

Blocked by DO NOT MERGE commit

mikehardy commented 2 weeks ago

Current state of this one is that I've reverted all the minor changes attempting to work around this:

All of these things, including getting the "backend already open" exception to throw as an Error if executing in a Robolectric context, and perhaps the actual root problem of non-serialized backend-open behavior on Windows can be solved, but not quickly

So this is in prep to revert.

Expectations:

If those expectations are met then I will make a clean revert PR

mikehardy commented 2 weeks ago

I just pushed the revert commit here before bed - prior to the revert we had 1-out-of-10 failures, then a 7-out-of-20 failure rate. So if it is 100% passing on windows unit tests I'd say that's pretty good. I would still prefer to see a couple batches-of-20 go through green if possible, by re-running if possible

But in general if anyone sees this in the next 12 hours or so and I'm not obviously active in here I won't mind a single bit if you just take charge and start noting results + re-running it and then ideally merging. Cheers

criticalAY commented 2 weeks ago

The revert commit was just the log one

lukstbit commented 2 weeks ago

With the full revert there were no workflow failure. I've started another run.

criticalAY commented 2 weeks ago

From discord: Revert fixed the issue but would like to know if we can pass col as param to getName? This would solve the issue of multiple backend opening attempts but in this the question would be how to get col in the activity or the place from where we are going to call it

mikehardy commented 2 weeks ago

Okay, 2nd run no failures. Thanks for the run while it was overnight here @lukstbit - the extra test results make this pretty statistically significant even for a possibly flaky test

Going to re-push with just the revert so this is merge-able