LLNL / merlin

Machine Learning for HPC Workflows
MIT License
118 stars 26 forks source link

bugfix/skewed-sample-hierarchy #450

Closed bgunnar5 closed 11 months ago

bgunnar5 commented 11 months ago

This branch has the patch for the skewed sample hierarchy bug. This bug would happen when a restart was needed in the add_merlin_expanded_chain_to_chord in the recursive section. If an exception was raised that triggered a restart here, then the path where recursion is started would not be reset correctly. This leads to skewed sample hierarchy paths. For example, 06/00/01 could become 06/06/00/01.

This bug also lead to extra samples being ran since these skewed hierarchies could create many new paths.

bgunnar5 commented 11 months ago

@doutriaux1 This one is tricky to add a test for as it's in the middle of a celery task. I could maybe add a kwarg specifically for testing to this task (e.g. test_mode=True) that will automatically trigger an exception and therefore a restart.

I think this will get easier to test when I update the test suite to use pytest/fixtures.

doutriaux1 commented 11 months ago

@doutriaux1 This one is tricky to add a test for as it's in the middle of a celery task. I could maybe add a kwarg specifically for testing to this task (e.g. test_mode=True) that will automatically trigger an exception and therefore a restart.

I think this will get easier to test when I update the test suite to use pytest/fixtures.

@bgunnar5 if you don't add a test now please make a note to add one to the PR where you will switch to pytest.