DataBiosphere / toil

A scalable, efficient, cross-platform (Linux/macOS) and easy-to-use workflow engine in pure Python.
http://toil.ucsc-cgl.org/.
Apache License 2.0

The CWL integration tests fail when they land on the Toil GitLab runners and pass on the shared ones #4963

Closed adamnovak closed 3 months ago

adamnovak commented 3 months ago

It looks like the CWL integration tests time out when they run on the Toil GitLab runners: individual tests regularly log that they are finishing throughout the 2.7 hours allocated, but the run never gets through all of them. When they run on the shared runners, which I think have more cores, they succeed.

We should not have Toil runners with different specs than the shared runners.

I should also check #4888 to see if it actually makes the CWL tests run slower somehow and introduces this problem.

┆Issue is synchronized with this Jira Story ┆Issue Number: TOIL-1586

adamnovak commented 3 months ago

I've paused the Toil runners for now as a workaround. But I need to either fix or delete them, or we'll be wasting OpenStack resources by keeping them around.

adamnovak commented 3 months ago

I've now destroyed the dedicated Toil runners, so we are running with just the shared runners. If CI seems too slow, we can make more shared runners.

I also rebooted one of the shared runners (number 1) that wasn't reporting in.