CatimaLoyalty / Android

Catima, a Loyalty Card & Ticket Manager for Android
https://catima.app
GNU General Public License v3.0
Debug stuck unit test #1350

Open TheLastProject opened 1 year ago

TheLastProject commented 1 year ago

There's a unit test that keeps randomly getting stuck, causing a timeout after 6 hours. It's annoying. I can't reproduce it locally, but it happens constantly in GitHub Actions.

obfusk commented 1 year ago

Is there any output generated that you can see?

obfusk commented 1 year ago

Anything that might give some kind of clue to what might be causing it?

obfusk commented 1 year ago

Or which specific test gets stuck?

TheLastProject commented 1 year ago

protect.card_locker.LoyaltyCardViewActivityTest > startWithoutParametersCaptureBarcodeCreateLoyaltyCard STARTED
##[debug]Re-evaluate condition on job cancellation for step: 'Run unit tests'.
Error: The operation was canceled.

https://github.com/CatimaLoyalty/Android/actions/runs/5169203344/jobs/9311309410

obfusk commented 1 year ago

I've been able to confirm it gets stuck in activityController.visible() in startWithoutParametersCaptureBarcodeCreateLoyaltyCard(). But I have no idea why. Or why only in that specific test.

obfusk commented 1 year ago

So... funny story. After replacing actions/setup-java with apt-get install openjdk-17-jdk-headless, I have not yet been able to reproduce this bug on GitHub Actions again. Which would explain being unable to reproduce it locally.

diff --git a/.github/workflows/android.yml b/.github/workflows/android.yml
index 4a0acb02..1ed3caae 100644
--- a/.github/workflows/android.yml
+++ b/.github/workflows/android.yml
@@ -10,6 +10,9 @@ on:
     branches:
       - main

+env:
+  JAVA_HOME: /usr/lib/jvm/java-17-openjdk-amd64
+
 jobs:
   build:

@@ -20,11 +23,11 @@ jobs:
     - name: Fail on bad translations
       run: if grep -ri "<xliff" app/src/main/res/values*/strings.xml; then echo "Invalidly escaped translations found"; exit 1; fi
     - uses: gradle/wrapper-validation-action@v1
-    - name: set up JDK 17
-      uses: actions/setup-java@v2
-      with:
-        distribution: 'temurin'
-        java-version: '17'
+    - name: set up OpenJDK 17
+      run: |
+        sudo apt-get update
+        sudo apt-get install -y openjdk-17-jdk-headless
+        sudo update-alternatives --auto java
     - name: Build
       run: ./gradlew assembleRelease
     - name: Check lint

obfusk commented 1 year ago

> I have not yet been able to reproduce this bug on GitHub actions again.

Sadly, it got stuck again, but only after more than 20 runs, whereas before it happened in about 1 out of every 7 runs.

obfusk commented 1 year ago

Looks like https://github.com/CatimaLoyalty/Android/actions/runs/5229783997/jobs/9442968697 got stuck twice and the retry after the timeout failed with:

2023-06-10T12:06:55.6116209Z protect.card_locker.LoyaltyCardViewActivityTest > startWithoutParametersCaptureBarcodeCreateLoyaltyCard STARTED
2023-06-10T14:35:17.0133272Z 
2023-06-10T14:35:17.0134017Z protect.card_locker.LoyaltyCardViewActivityTest > startWithoutParametersCaptureBarcodeCreateLoyaltyCard FAILED
2023-06-10T14:35:17.0134687Z     java.lang.OutOfMemoryError at Arrays.java:3657
2023-06-10T14:35:17.0134963Z 
2023-06-10T14:35:17.0135292Z protect.card_locker.LoyaltyCardViewActivityTest > startWithMissingLoyaltyCard STARTED
2023-06-10T14:35:17.0135659Z 
2023-06-10T14:35:17.0136019Z protect.card_locker.LoyaltyCardViewActivityTest > startWithMissingLoyaltyCard FAILED
2023-06-10T14:35:17.0136868Z     java.lang.OutOfMemoryError at Provider.java:470

obfusk commented 1 year ago

I guess maybe we should time out the second time too? And/or kill Gradle before retrying?
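
A rough sketch of what that could look like in the workflow, not a tested change. It assumes `./gradlew test` stands in for whatever command the 'Run unit tests' step really runs, and the 30-minute limit and the step `id` are made-up values. `timeout-minutes` and `continue-on-error` are standard step keys in GitHub Actions, and `./gradlew --stop` asks running Gradle daemons to shut down so the retry does not inherit a daemon that is already short on memory.

```yaml
    - name: Run unit tests (first attempt)
      id: tests                  # hypothetical id, used by the conditions below
      timeout-minutes: 30        # hypothetical limit, far below the 6-hour job timeout
      continue-on-error: true    # let the job continue so the retry can run
      run: ./gradlew test        # assumption: stand-in for the real test command
    - name: Stop Gradle daemons before retrying
      if: steps.tests.outcome == 'failure'
      run: ./gradlew --stop
    - name: Run unit tests (retry)
      if: steps.tests.outcome == 'failure'
      timeout-minutes: 30        # the retry gets its own timeout too
      run: ./gradlew test
```

Whether `--stop` is enough to get rid of a test worker stuck in a busy loop is not something this thread confirms, so it may need a harder kill in practice.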

obfusk commented 1 year ago

I finally had it happen locally, once. The CPU fan was blowing hard, so it's probably some kind of busy loop. Sadly there's no way to debug it when it's this infrequent.
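
One possible way to get something to debug from the rare runs where it does hang is to have CI print a thread dump before giving up. This is only a sketch under assumptions: `./gradlew test` stands in for the real test command, the 30-minute limit is arbitrary, and it relies on `jps` and `jstack`, which ship with the JDK, being on the PATH. It dumps every local JVM rather than trying to pick out the test worker.

```yaml
    - name: Run unit tests, dumping threads on a hang
      run: |
        # Assumption: stand-in for the real test command, run in the background
        # so the shell can keep an eye on it.
        ./gradlew test &
        gradle_pid=$!
        # Hypothetical limit: up to 30 minutes (180 checks, 10 seconds apart).
        for i in $(seq 1 180); do
          kill -0 "$gradle_pid" 2>/dev/null || break
          sleep 10
        done
        if kill -0 "$gradle_pid" 2>/dev/null; then
          # Still running after the limit: print a thread dump for every local JVM.
          for pid in $(jps -q); do
            jstack "$pid" || true
          done
          ./gradlew --stop || true
          kill "$gradle_pid" 2>/dev/null || true
          exit 1
        fi
        # Finished in time: pass the Gradle exit status through to the step.
        wait "$gradle_pid"
```

If the busy-loop guess is right, the stuck thread should show up as RUNNABLE in the dump, with a stack that hopefully leads somewhere below activityController.visible().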