flutter / flutter

Flutter makes it easy and fast to build beautiful apps for mobile and beyond
https://flutter.dev
BSD 3-Clause "New" or "Revised" License

[infra] Many builds are stuck in queue despite having idle machines in the pool #147387

Open harryterkelsen opened 2 weeks ago

harryterkelsen commented 2 weeks ago

Type of Request

bug

Infrastructure Environment

Not completely sure, but this seems like an issue with the Cocoon scheduler.

What is happening?

Many builds are stuck in queue on commit 96d9cd1195917f66e7341f34eeb146d1a93a3767

[Image: Screenshot 2024-04-25 at 2.45.43 PM]

[Image: Screenshot 2024-04-25 at 2.48.38 PM]

This happens even though there appear to be many idle machines in the pool:

[Image: Screenshot 2024-04-25 at 2.49.37 PM]

Steps to reproduce

Open the build dashboard and see that builds are stuck: https://flutter-dashboard.appspot.com/#/build

Expected results

Builds should be scheduled and tests should run when there are idle machines in the pool.

zanderso commented 2 weeks ago

Previous similar issue: https://github.com/flutter/flutter/issues/145939

zanderso commented 2 weeks ago

Over email @godofredoc has suggested rolling back today's dashboard deployment.

christopherfujino commented 2 weeks ago

> Over email @godofredoc has suggested rolling back today's dashboard deployment.

I rolled back Cocoon to a version from April 23. I'm still waiting to verify whether new builds succeed.

zanderso commented 2 weeks ago

The tree is open, but due to some inconsistency in the task results database(s), the Cocoon scheduler is batching jobs without backfilling for some tests. Since the tree is open, I am dropping this down to P1.

christopherfujino commented 2 weeks ago

Related to https://github.com/flutter/flutter/issues/142951

stuartmorgan commented 2 weeks ago

It looks like several builds that were started yesterday afternoon got into the inconsistent DB state again, so this isn't fully resolved. I'm not sure yet if there were two causes and the rollback only fixed one, or if the PRs that were rolled back were a red herring.