LUMC / pytest-workflow

Configure workflow/pipeline tests using yaml files.
https://pytest-workflow.readthedocs.io/en/stable/
GNU Affero General Public License v3.0

Feature request: Add something like `must_match_file` in addition to `md5sum` #179

Open · stianlagstad opened this issue 1 year ago

stianlagstad commented 1 year ago

Thank you very much for this valuable tool.

I'd like to propose adding a new assertion that compares a result file against a stored, known-to-be-correct file; it could be named something like `must_match_file`. In many cases it's enough for me to use `md5sum` to verify whether a result file has changed at all. However, when something has changed, it would be helpful if the test failure output showed exactly what changed. With an assertion like `must_match_file`, the failure could show how the new result differs from the stored one. This is similar in spirit to "golden testing"; see https://ro-che.info/articles/2017-12-04-golden-tests.
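For context, the `md5sum` check boils down to hashing the whole file and comparing digests, so a failure can only report that the file changed, not what changed. A minimal illustration of that limitation in plain Python (a paraphrase of the idea, not pytest-workflow's actual implementation; the file name and digest below are placeholders):

import hashlib
import pathlib


def file_md5(path: pathlib.Path) -> str:
    # Hex MD5 digest of a file, read in chunks to bound memory use.
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()


# A digest mismatch proves *that* the file changed, but gives no hint about
# *what* changed -- which is the gap must_match_file would fill:
assert file_md5(pathlib.Path("results.vcf")) == "0123456789abcdef0123456789abcdef"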

rhpvorderman commented 1 year ago

There was a PR for a diff feature recently: https://github.com/LUMC/pytest-workflow/pull/175

We had a whole discussion about it. I am curious what your thoughts are.

stianlagstad commented 1 year ago

Thanks for responding! I just read the discussion in https://github.com/LUMC/pytest-workflow/pull/175, and now I understand a bit more about the pros and cons. What I'm doing locally right now is using this helper function:

import pathlib
import subprocess
from typing import Optional


def diff_files(
    file1: pathlib.Path,
    file2: pathlib.Path,
    ignore_lines_matching: Optional[str] = None,
) -> None:
    # Shell out to diff; -I tells diff to ignore changes in lines matching
    # the given regex (e.g. a timestamp header that differs on every run).
    if ignore_lines_matching is not None:
        res = subprocess.Popen(
            ["diff", "-I", ignore_lines_matching, file1, file2],
            stdout=subprocess.PIPE,
        )
    else:
        res = subprocess.Popen(["diff", file1, file2], stdout=subprocess.PIPE)
    # communicate() waits for the process to exit and sets returncode;
    # diff exits non-zero when the files differ.
    output = res.communicate()[0]
    if res.returncode != 0:
        print(str(output, "UTF-8"))
        raise Exception(
            f"Found a difference between {file1=} and {file2=}, where none was"
            " expected."
        )

and using that in tests like this:

import pathlib

import pytest


@pytest.mark.workflow("Test mymodule")
def test_mymodule_results_file_diff(workflow_dir: str) -> None:
    # The file produced by this test run:
    result_file = pathlib.Path(workflow_dir, "results.vcf")
    # The file stored as the golden truth:
    stored_file = pathlib.Path("some_path/results.vcf")

    # Diff the two files, ignoring the fileDate header line,
    # which changes on every run:
    diff_files(result_file, stored_file, ignore_lines_matching="fileDate")

Doing this is good enough for my purposes right now: it shows the diff output and fails the test if there's an unexpected difference.
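If shelling out to the `diff` binary is undesirable (for example on systems where it isn't installed), the same idea can be sketched with `difflib` from the standard library. One behavioural difference from `diff -I`: this sketch drops matching lines entirely before comparing, whereas `diff -I` only suppresses hunks made up of such lines. The function name `diff_files_stdlib` is hypothetical:

import difflib
import pathlib
import re
from typing import Optional


def diff_files_stdlib(
    file1: pathlib.Path,
    file2: pathlib.Path,
    ignore_lines_matching: Optional[str] = None,
) -> None:
    def keep(line: str) -> bool:
        # Filter out lines matching the ignore pattern, if one was given.
        return ignore_lines_matching is None or not re.search(
            ignore_lines_matching, line
        )

    lines1 = [l for l in file1.read_text().splitlines(keepends=True) if keep(l)]
    lines2 = [l for l in file2.read_text().splitlines(keepends=True) if keep(l)]
    # unified_diff yields nothing when the (filtered) inputs are identical.
    diff = list(
        difflib.unified_diff(lines1, lines2, fromfile=str(file1), tofile=str(file2))
    )
    if diff:
        print("".join(diff))
        raise AssertionError(
            f"Found a difference between {file1=} and {file2=}, where none was"
            " expected."
        )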

I'll keep this open if that's OK with you, and I'll keep pondering the idea (as well as the discussion in #175).

rhpvorderman commented 11 months ago

Sorry for my late reply.

I'll keep this open if that's OK with you, and I'll keep pondering the idea (as well as the discussion in https://github.com/LUMC/pytest-workflow/pull/175).

Yes, and thanks for thinking about it. Elegantly displaying non-matching files in a useful way would indeed be a good addition, but it requires a careful weighing of the added complexity against the number of use cases.