david-thrower / cerebros-core-algorithm-alpha

The Cerebros package is an ultra-precise Neural Architecture Search (NAS) / AutoML that is intended to much more closely mimic biological neurons than conventional neural network architecture strategies.

bug workflows appear to time out or memory leak #82

Closed david-thrower closed 1 year ago

david-thrower commented 1 year ago

Kind of issue: bug

Describe the issue: Test workflows are failing on the task that tests the CIFAR10 NAS example, which had been tested successfully numerous times on the same runner. When they fail, the only error appearing in the logs is "##[error]The operation was canceled.". This is usually caused by either a memory leak or a timeout condition. Because the failing sub-task runs after several successful tasks in the same workflow, many of which involve much higher memory pressure (one involving a BERT text embedding), memory exhaustion is unlikely to be the cause. By elimination, this is probably a timeout condition.

Steps to reproduce the behavior:

Run the automerge.yaml workflow (merge a branch into main).

If not on Kubeflow, where were you running this? Ubuntu 22.04 LTS GitHub Actions runner

Expected behavior

Workflow should have either completed or returned an exception explaining why it did not complete.
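
One way to get an explanatory failure instead of a bare "The operation was canceled." is to wrap the long-running test command in a small script that enforces its own time limit and reports how the child process ended. The sketch below is only an illustration and not part of the Cerebros code base: `run_step` and `StepFailed` are hypothetical names, the time limit is whatever the caller chooses, and the `resource` module it relies on is only available on Unix-like systems such as the Ubuntu runner mentioned above.

```python
import resource
import subprocess
import time


class StepFailed(RuntimeError):
    """Raised when a wrapped step times out or exits with an error."""


def run_step(cmd, timeout_s):
    """Run cmd (a list of strings) with a time limit.

    Returns (elapsed_seconds, peak_child_rss) on success and raises
    StepFailed with a human-readable reason otherwise.
    Hypothetical helper for illustration only.
    """
    start = time.monotonic()
    try:
        # subprocess.run kills the child and raises TimeoutExpired
        # once timeout_s seconds have passed.
        proc = subprocess.run(cmd, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        elapsed = time.monotonic() - start
        raise StepFailed(
            f"timed out after {elapsed:.1f}s (limit {timeout_s}s): {cmd}"
        ) from None

    elapsed = time.monotonic() - start
    # Peak resident set size over all waited-for child processes.
    # Units are platform dependent (KiB on Linux, bytes on macOS).
    peak_rss = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss

    if proc.returncode < 0:
        # On POSIX a negative return code means the child was killed by a
        # signal; SIGKILL (-9) is what the kernel OOM killer sends.
        raise StepFailed(
            f"killed by signal {-proc.returncode} after {elapsed:.1f}s, "
            f"peak child RSS {peak_rss}: {cmd}"
        )
    if proc.returncode != 0:
        raise StepFailed(
            f"exited with code {proc.returncode} after {elapsed:.1f}s, "
            f"peak child RSS {peak_rss}: {cmd}"
        )
    return elapsed, peak_rss
```

Called with the test command and a limit shorter than the runner's own timeout, a hang would surface as a `StepFailed` message naming the elapsed time, while a kill by signal would point toward memory exhaustion. This only helps if the runner process itself survives long enough to print the message.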

Additional context: https://github.com/actions/runner/issues/2323