stianlagstad opened this issue 1 year ago
There was a PR recently about adding a diff feature: https://github.com/LUMC/pytest-workflow/pull/175
We had a whole discussion about it there. I am curious what your thoughts are.
Thanks for responding! I just read the discussion in https://github.com/LUMC/pytest-workflow/pull/175, and now I understand a bit more about the pros and cons. What I'm doing locally right now is using this helper function:
import pathlib
import subprocess
from typing import Optional


def diff_files(
    file1: pathlib.Path,
    file2: pathlib.Path,
    ignore_lines_matching: Optional[str] = None,
) -> None:
    # Run GNU diff, optionally ignoring changes on lines matching a regex (-I).
    if ignore_lines_matching is not None:
        res = subprocess.Popen(
            ["diff", "-I", ignore_lines_matching, file1, file2],
            stdout=subprocess.PIPE,
        )
    else:
        res = subprocess.Popen(["diff", file1, file2], stdout=subprocess.PIPE)
    output = res.communicate()[0]
    if res.returncode != 0:
        # Print the diff so it shows up in the pytest failure output.
        output_str = str(output, "UTF-8")
        print(output_str)
        raise Exception(
            f"Found a difference between {file1=} and {file2=}, where none were"
            " expected."
        )
and using that in tests like this:
import pathlib

import pytest


@pytest.mark.workflow("Test mymodule")
def test_mymodule_results_file_diff(workflow_dir: str) -> None:
    # The file that we've produced in this test run:
    result_file: pathlib.Path = pathlib.Path(
        workflow_dir,
        "results.vcf",
    )
    # The file stored as the golden truth:
    stored_file: pathlib.Path = pathlib.Path(
        "some_path/results.vcf",
    )
    # Do a diff between the files
    diff_files(result_file, stored_file, ignore_lines_matching="fileDate")
Doing this is good enough for my purposes right now. It'll show the diff
output and fail the test if there's an unexpected difference there.
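For completeness, here is a minimal sketch of the same idea in pure Python, in case the external diff binary isn't available. The helper name diff_files_py is my own, and note that dropping lines matching the regex before comparing is not exactly the same as GNU diff's -I behavior:

import pathlib
import re
from difflib import unified_diff
from typing import Optional


def diff_files_py(
    file1: pathlib.Path,
    file2: pathlib.Path,
    ignore_lines_matching: Optional[str] = None,
) -> None:
    """Compare two text files and fail loudly if they differ."""
    pattern = re.compile(ignore_lines_matching) if ignore_lines_matching else None

    def keep(lines):
        # Drop lines matching the ignore pattern before comparing.
        return [ln for ln in lines if pattern is None or not pattern.search(ln)]

    lines1 = keep(file1.read_text().splitlines(keepends=True))
    lines2 = keep(file2.read_text().splitlines(keepends=True))
    diff = list(unified_diff(lines1, lines2, fromfile=str(file1), tofile=str(file2)))
    if diff:
        # The unified diff ends up in the pytest failure output.
        print("".join(diff))
        raise AssertionError(
            f"Found a difference between {file1} and {file2}, where none was expected."
        )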
I'll keep this open if that's OK with you, and I'll keep pondering on the idea (as well as the discussion in #175).
Sorry for my late reply.
> I'll keep this open if OK for you, and I'll keep pondering on the idea (as well as the discussion in https://github.com/LUMC/pytest-workflow/pull/175).
Yes, and thanks for thinking about that. Displaying non-matching files in a genuinely useful way would indeed be a good addition, but a careful weighing of complexity versus the number of use cases is also necessary.
Thank you very much for this valuable tool.
I'd like to propose adding new functionality to compare a file against a stored, known-to-be-correct file. The assertion could be named something like must_match_file. In many cases it's enough for me to use md5sum to verify whether or not a result file has changed at all. However, when I do see that something has changed, it would be helpful to see what exactly changed in the output of the test failure. If an assertion like must_match_file existed, then I could be shown how the new result differs from the stored result. This is similar in spirit to "golden testing", ref https://ro-che.info/articles/2017-12-04-golden-tests.
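To illustrate the limitation mentioned above, a checksum-based check only reports that a file changed, not what changed. A minimal sketch in plain Python (the file name and expected hash are placeholders, not part of pytest-workflow):

import hashlib
import pathlib


def md5sum(path: pathlib.Path) -> str:
    # Hash the file contents; this is the same value the md5sum command prints.
    return hashlib.md5(path.read_bytes()).hexdigest()


# A pure checksum comparison fails when anything at all differs,
# but gives no hint about which lines changed.
EXPECTED_MD5 = "d41d8cd98f00b204e9800998ecf8427e"  # placeholder hash
assert md5sum(pathlib.Path("results.vcf")) == EXPECTED_MD5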