brainglobe / brainglobe-workflows

Workflows that utilise BrainGlobe tools to perform data analysis and visualisation.
BSD 3-Clause "New" or "Revised" License

Refactor tests #27

Closed · sfmig closed this 11 months ago

sfmig commented 1 year ago

A PR for refactoring the cellfinder workflow tests, following the feedback collected on issue #26.

Main features

They address the points from issue #26:

Other additions

Comments / questions

codecov[bot] commented 11 months ago

Codecov Report

Attention: 21 lines in your changes are missing coverage. Please review.

Comparison is base (afba965) 80.85% compared to head (466667c) 81.31%. Report is 5 commits behind head on main.

| Files | Patch % | Lines |
| --- | --- | --- |
| ...tegration/brainglobe_benchmarks/test_cellfinder.py | 39.13% | 14 Missing :warning: |
| brainglobe_benchmarks/cellfinder.py | 0.00% | 4 Missing :warning: |
| brainglobe_workflows/cellfinder.py | 97.34% | 3 Missing :warning: |
Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main      #27      +/-   ##
==========================================
+ Coverage   80.85%   81.31%   +0.45%
==========================================
  Files          28       32       +4
  Lines        1494     1584      +90
==========================================
+ Hits         1208     1288      +80
- Misses        286      296      +10
```

:umbrella: View full report in Codecov by Sentry.

alessandrofelder commented 11 months ago

These two questions relate to our live discussion of where the workflow data should live, and to my further impression that a lot of the test code is still a bit complicated.

  • I think I went down a parametrisation 🐰 🕳️; maybe some of the parametrisations are redundant? Any feedback on this is welcome!

I think the parametrisations you've done are fine, but I'd say the `if` statement distinguishing the two cases within one test is a minor code smell. Maybe split the default case off into a separate test that doesn't require parametrisation?

  • The tests are a bit slow to run, mainly because I often download the data from GIN. Should I look into caching this? Right now I don't, because I download the data to a pytest temp dir, which is different for each test run.

Yes, I think caching is worth it long-term to improve the development experience. If we move to using paths relative to $HOME as discussed, I would mock Path.home() (as done in brainrender-napari) instead of using tmp_path, which would simplify both the test code and the caching.
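Roughly, the conftest.py fixture could look like the sketch below (assuming pytest's built-in monkeypatch and tmp_path fixtures; the fixture name and directory layout are illustrative, not the exact brainrender-napari code):

```python
# conftest.py -- a sketch, not the actual brainrender-napari implementation
from pathlib import Path

import pytest


@pytest.fixture(autouse=True)
def mock_home(tmp_path, monkeypatch):
    """Make Path.home() point at a temporary directory.

    This keeps test data separate from the user's real ~/.brainglobe
    without passing tmp_path around in every test. For caching across
    runs, the same idea works with a persistent directory instead of
    tmp_path.
    """
    fake_home = tmp_path / "home"
    fake_home.mkdir()
    monkeypatch.setattr(Path, "home", lambda: fake_home)
    return fake_home
```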

The test code would then be something like:

```python
from pathlib import Path

import pytest

# assuming `main` lives in the current workflow module
from brainglobe_workflows.cellfinder import main


def test_main():
    """Test main function for setting up and running the cellfinder
    workflow (default case).
    """
    # mocking of $HOME in conftest.py takes care that
    # test data and user data are separate
    cfg = main()

    # check output files exist
    assert cfg.detected_cells_path.is_file()


@pytest.mark.parametrize(
    "input_config",
    [
        "input_config_fetch_GIN",
        "input_config_fetch_local",
    ],
)
def test_main_non_default(input_config):
    """Test main function for setting up and running the cellfinder
    workflow (non-default cases).

    Parameters
    ----------
    input_config : str
        Name of the input config JSON file
    """
    # mocking of $HOME in conftest.py takes care that
    # test data and user data are separate
    cfg = main(
        str(Path.home() / ".brainglobe/cellfinder/workflows" / input_config)
    )

    # check output files exist
    assert cfg.detected_cells_path.is_file()
```

This can also be part of a separate PR.
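For the caching of the GIN download itself, here is a rough sketch of one way it could work, assuming pooch is used for fetching (the URL, hash, and cache location are placeholders, not the actual dataset details):

```python
# A sketch only: the URL, hash, and cache location are placeholders,
# and pooch is assumed here as the download tool.
from pathlib import Path

import pooch

GIN_URL = "https://gin.g-node.org/<org>/<repo>/raw/master/<archive>.zip"
GIN_HASH = None  # replace with the archive's sha256 so pooch can verify it


def fetch_test_data() -> Path:
    """Download the test data once and reuse the cached copy afterwards."""
    local_path = pooch.retrieve(
        url=GIN_URL,
        known_hash=GIN_HASH,
        path=Path.home() / ".brainglobe" / "test_data",
    )
    return Path(local_path)
```

With a real hash, pooch verifies the cached file and skips the download on subsequent runs, so only the first test run pays the download cost.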