Expensify / App

Welcome to New Expensify: a complete re-imagination of financial collaboration, centered around chat. Help us build the next generation of Expensify by sharing feedback and contributing to the code.
https://new.expensify.com
MIT License

Investigate workflow job failing on main: e2ePerformanceTests / Run E2E tests in AWS device farm #48824

Open github-actions[bot] opened 1 month ago

github-actions[bot] commented 1 month ago

🚨 Failure Summary 🚨:

⚠️ Action Required ⚠️:

🛠️ A recent merge appears to have caused a failure in the job named e2ePerformanceTests / Run E2E tests in AWS device farm. This issue has been automatically created and labeled with Workflow Failure for investigation.

👀 Please look into the following:

  1. Why did the PR cause the job to fail?
  2. Address any underlying issues.

🐛 We appreciate your help in squashing this bug!

Current Issue Owner: @kirillzyusko

dangrous commented 1 month ago

sending to the experts! https://expensify.slack.com/archives/C035J5C9FAP/p1726004032791809

dangrous commented 1 month ago

Investigation in progress!

dangrous commented 1 month ago

Working on getting the logs. It's not related to the linked PR, but keeping this open as a daily for that investigation

melvin-bot[bot] commented 3 weeks ago

@dangrous Whoops! This issue is 2 days overdue. Let's get this updated quick!

dangrous commented 3 weeks ago

margelo team is on it I believe, in that same slack thread. @kirillzyusko let me know if you want me to assign you here!

kirillzyusko commented 3 weeks ago

@dangrous yeah, feel free to assign me on this!

melvin-bot[bot] commented 2 weeks ago

@dangrous, @kirillzyusko Eep! 4 days overdue now. Issues have feelings too...

kirillzyusko commented 2 weeks ago

It failed because of a timeout: we hit the limit of 5400s (1.5h).

I think we merged a PR, https://github.com/Expensify/App/pull/47777, which increases it to 7200s (2h). Do you think we can close the issue?

dangrous commented 2 weeks ago

It looks from the screengrab that it crashed though, right? And that's what caused the timeout since the app never reopened? We should see if we can figure out what that crash was....

kirillzyusko commented 2 weeks ago

@dangrous yeah, you are right, but from my observation:

In fact, in our e2e tests we allow a test to crash 3 times during its 60 runs, and we rely on this. The problem is that when a test crashes, we wait 5 minutes before force-quitting it (we have a 5-minute timeout per test). If we get 2 random failures in any test, that adds 10 minutes of overhead for 1 test suite. We have 5 test suites, so the retry mechanism can potentially add ~50 minutes to our test run 🤷‍♂️ I think that is why we hit the limit in this particular test.
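
The estimate in this comment can be reproduced with a few lines of arithmetic. This is only a sketch: `retryOverheadMinutes` and its parameter names are hypothetical and do not come from the Expensify test runner; the numbers (2 crashes, 5-minute timeout, 5 suites, 5400s job limit) are the ones quoted in the thread.

```typescript
// Hypothetical helper, not part of the Expensify codebase.
// Each crash is assumed to cost one full per-test timeout before the force quit.
function retryOverheadMinutes(crashesPerSuite: number, timeoutMinutes: number, suiteCount: number): number {
    return crashesPerSuite * timeoutMinutes * suiteCount;
}

// Figures quoted in the thread: 2 random failures, 5-minute timeout, 5 test suites.
const overheadMinutes = retryOverheadMinutes(2, 5, 5);

// The job limit that was hit, converted from seconds to minutes.
const jobLimitMinutes = 5400 / 60;

console.log(`Retry overhead: ${overheadMinutes} of ${jobLimitMinutes} minutes`);
```

Under these assumptions, the waiting time alone uses 50 of the 90 available minutes, which is consistent with the job running into its limit.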

One optimization I've been thinking of is reducing the timeout interval (from 5 minutes to 2.5 minutes). But I think we need to ask @hannojg why such a relatively big timeout was chosen for e2e tests.

dangrous commented 1 week ago

oh okay that makes sense - yeah I feel like we could even go shorter than 2.5 mins - I feel like if something is hanging for more than, say, 1 minute, then something is wrong enough that we should look at it. But curious what @hannojg thinks. Or if he's still OOO I think we can close this in the meantime
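
To compare the timeouts discussed so far (5 minutes today, 2.5 minutes proposed, 1 minute floated here), the same worst-case arithmetic can be applied to each candidate. This is a rough sketch: the constant and function names are assumptions for illustration, and the figure of 2 crashes per suite across 5 suites is taken from the earlier estimate in this thread, not from measured data.

```typescript
// Illustrative only: these names are assumptions, not the real test runner config.
const CRASHES_PER_SUITE = 2; // random failures assumed per suite, as in the earlier estimate
const SUITE_COUNT = 5; // number of test suites mentioned in the thread

// Worst-case waiting time added to the whole run for a given per-test timeout.
function worstCaseOverhead(timeoutMinutes: number): number {
    return CRASHES_PER_SUITE * timeoutMinutes * SUITE_COUNT;
}

// Candidate timeouts from the discussion, in minutes.
const candidates: number[] = [5, 2.5, 1];

const overheads = candidates.map((timeoutMinutes) => ({
    timeoutMinutes,
    overheadMinutes: worstCaseOverhead(timeoutMinutes),
}));

for (const row of overheads) {
    console.log(`timeout ${row.timeoutMinutes} min -> up to ${row.overheadMinutes} min of overhead`);
}
```

With these inputs, the overhead drops from 50 minutes to 25 minutes at a 2.5-minute timeout and to 10 minutes at a 1-minute timeout.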

hannojg commented 1 week ago

Agree, we can definitely make this timeout interval shorter!

dangrous commented 1 week ago

Great! @kirillzyusko do you want to put up a PR to drop that timeout, maybe start with 2.5 mins and we see how that one goes? Probably could go even shorter but maybe that's a good starting point

hannojg commented 1 week ago

Kiryl is OOO, and will be back next week to pick this one up!

melvin-bot[bot] commented 1 week ago

@dangrous, @kirillzyusko Uh oh! This issue is overdue by 2 days. Don't forget to update your issues!

melvin-bot[bot] commented 6 days ago

@dangrous, @kirillzyusko 6 days overdue. This is scarier than being forced to listen to Vogon poetry!

dangrous commented 4 days ago

@kirillzyusko let us know when you're back and can knock out the timeout adjustment!

kirillzyusko commented 4 days ago

@dangrous here is a PR: https://github.com/Expensify/App/pull/50512 👀