chapmanjacobd closed 2 months ago
Hi @chapmanjacobd
pytest-regressions uses pytest-datadir behind the scenes, and indeed it does create a separate temporary directory for each test run, so it is working as intended.
I'm closing this for now but feel free to follow up with more questions.
I'm fine with it creating one temp dir per test run (or even per test) but this is a bug specific to pytest.mark.parametrize
What do you mean? It should create a new temporary directory per test run (not per test mind you).
If you think this is not what is happening, perhaps it is worth posting an issue in pytest-datadir (but with a MWE using pytest-datadir only).
edit: I've verified that this only happens when using pytest.mark.parametrize and in this case pytest-regressions creates 11x more files than it does with normal tests. This matches the number of parameters:
To be clear, this is expected and working as intended: each parametrize parameter will create a separate "test run", and pytest-datadir will create a separate directory for each.
okay, you are right it is the same
import pytest
import os


def test_without_parameters1(data_regression):
    data = {"key": "value_1"}
    data_regression.check(data)


def test_without_parameters2(data_regression):
    data = {"key": "value_2"}
    data_regression.check(data)


def test_without_parameters3(data_regression):
    data = {"key": "value_3"}
    data_regression.check(data)


def test_without_parameters4(data_regression):
    data = {"key": "value_4"}
    data_regression.check(data)


def test_without_parameters5(data_regression):
    data = {"key": "value_5"}
    data_regression.check(data)
'''
tree /tmp/pytest-of-xk/
/tmp/pytest-of-xk/
├── pytest-0
│   ├── test_without_parameters10
│   │   └── params
│   │       └── test_without_parameters3.yml
│   ├── test_without_parameters1current -> /tmp/pytest-of-xk/pytest-0/test_without_parameters10
│   ├── test_without_parameters20
│   │   └── params
│   │       ├── test_without_parameters1.yml
│   │       ├── test_without_parameters3.yml
│   │       └── test_without_parameters4.yml
│   ├── test_without_parameters2current -> /tmp/pytest-of-xk/pytest-0/test_without_parameters20
│   ├── test_without_parameters30
│   │   └── params
│   ├── test_without_parameters3current -> /tmp/pytest-of-xk/pytest-0/test_without_parameters30
│   ├── test_without_parameters40
│   │   └── params
│   │       ├── test_without_parameters1.yml
│   │       └── test_without_parameters3.yml
│   ├── test_without_parameters4current -> /tmp/pytest-of-xk/pytest-0/test_without_parameters40
│   ├── test_without_parameters50
│   │   └── params
│   │       ├── test_without_parameters1.yml
│   │       ├── test_without_parameters2.yml
│   │       ├── test_without_parameters3.yml
│   │       └── test_without_parameters4.yml
│   └── test_without_parameters5current -> /tmp/pytest-of-xk/pytest-0/test_without_parameters50
├── pytest-1
│   ├── test_without_parameters10
│   │   └── params
│   │       ├── test_without_parameters1.obtained.yml
│   │       ├── test_without_parameters1.yml
│   │       ├── test_without_parameters2.yml
│   │       ├── test_without_parameters3.yml
│   │       ├── test_without_parameters4.yml
│   │       └── test_without_parameters5.yml
│   ├── test_without_parameters1current -> /tmp/pytest-of-xk/pytest-1/test_without_parameters10
│   ├── test_without_parameters20
│   │   └── params
│   │       ├── test_without_parameters1.yml
│   │       ├── test_without_parameters2.obtained.yml
│   │       ├── test_without_parameters2.yml
│   │       ├── test_without_parameters3.yml
│   │       ├── test_without_parameters4.yml
│   │       └── test_without_parameters5.yml
│   ├── test_without_parameters2current -> /tmp/pytest-of-xk/pytest-1/test_without_parameters20
│   ├── test_without_parameters30
│   │   └── params
│   │       ├── test_without_parameters1.yml
│   │       ├── test_without_parameters2.yml
│   │       ├── test_without_parameters3.obtained.yml
│   │       ├── test_without_parameters3.yml
│   │       ├── test_without_parameters4.yml
│   │       └── test_without_parameters5.yml
│   ├── test_without_parameters3current -> /tmp/pytest-of-xk/pytest-1/test_without_parameters30
│   ├── test_without_parameters40
│   │   └── params
│   │       ├── test_without_parameters1.yml
│   │       ├── test_without_parameters2.yml
│   │       ├── test_without_parameters3.yml
│   │       ├── test_without_parameters4.obtained.yml
│   │       ├── test_without_parameters4.yml
│   │       └── test_without_parameters5.yml
│   ├── test_without_parameters4current -> /tmp/pytest-of-xk/pytest-1/test_without_parameters40
│   ├── test_without_parameters50
│   │   └── params
│   │       ├── test_without_parameters1.yml
│   │       ├── test_without_parameters2.yml
│   │       ├── test_without_parameters3.yml
│   │       ├── test_without_parameters4.yml
│   │       ├── test_without_parameters5.obtained.yml
│   │       └── test_without_parameters5.yml
│   └── test_without_parameters5current -> /tmp/pytest-of-xk/pytest-1/test_without_parameters50
└── pytest-current -> /tmp/pytest-of-xk/pytest-1
34 directories, 40 files
'''
vs
import pytest
import os
@pytest.mark.parametrize("f", range(0, 5))
def test_with_parameters(data_regression, f):
    data = {"key": f"value_{f}"}
    data_regression.check(data)
'''
tree /tmp/pytest-of-xk/
/tmp/pytest-of-xk/
├── pytest-0
│   ├── test_with_parameters_0_0
│   │   └── params
│   ├── test_with_parameters_0_current -> /tmp/pytest-of-xk/pytest-0/test_with_parameters_0_0
│   ├── test_with_parameters_1_0
│   │   └── params
│   │       ├── test_with_parameters_0_.yml
│   │       ├── test_with_parameters_2_.yml
│   │       └── test_with_parameters_3_.yml
│   ├── test_with_parameters_1_current -> /tmp/pytest-of-xk/pytest-0/test_with_parameters_1_0
│   ├── test_with_parameters_2_0
│   │   └── params
│   │       ├── test_with_parameters_0_.yml
│   │       └── test_with_parameters_3_.yml
│   ├── test_with_parameters_2_current -> /tmp/pytest-of-xk/pytest-0/test_with_parameters_2_0
│   ├── test_with_parameters_3_0
│   │   └── params
│   │       └── test_with_parameters_0_.yml
│   ├── test_with_parameters_3_current -> /tmp/pytest-of-xk/pytest-0/test_with_parameters_3_0
│   ├── test_with_parameters_4_0
│   │   └── params
│   │       ├── test_with_parameters_0_.yml
│   │       ├── test_with_parameters_1_.yml
│   │       ├── test_with_parameters_2_.yml
│   │       └── test_with_parameters_3_.yml
│   └── test_with_parameters_4_current -> /tmp/pytest-of-xk/pytest-0/test_with_parameters_4_0
├── pytest-1
│   ├── test_with_parameters_0_0
│   │   └── params
│   │       ├── test_with_parameters_0_.obtained.yml
│   │       ├── test_with_parameters_0_.yml
│   │       ├── test_with_parameters_1_.yml
│   │       ├── test_with_parameters_2_.yml
│   │       ├── test_with_parameters_3_.yml
│   │       └── test_with_parameters_4_.yml
│   ├── test_with_parameters_0_current -> /tmp/pytest-of-xk/pytest-1/test_with_parameters_0_0
│   ├── test_with_parameters_1_0
│   │   └── params
│   │       ├── test_with_parameters_0_.yml
│   │       ├── test_with_parameters_1_.obtained.yml
│   │       ├── test_with_parameters_1_.yml
│   │       ├── test_with_parameters_2_.yml
│   │       ├── test_with_parameters_3_.yml
│   │       └── test_with_parameters_4_.yml
│   ├── test_with_parameters_1_current -> /tmp/pytest-of-xk/pytest-1/test_with_parameters_1_0
│   ├── test_with_parameters_2_0
│   │   └── params
│   │       ├── test_with_parameters_0_.yml
│   │       ├── test_with_parameters_1_.yml
│   │       ├── test_with_parameters_2_.obtained.yml
│   │       ├── test_with_parameters_2_.yml
│   │       ├── test_with_parameters_3_.yml
│   │       └── test_with_parameters_4_.yml
│   ├── test_with_parameters_2_current -> /tmp/pytest-of-xk/pytest-1/test_with_parameters_2_0
│   ├── test_with_parameters_3_0
│   │   └── params
│   │       ├── test_with_parameters_0_.yml
│   │       ├── test_with_parameters_1_.yml
│   │       ├── test_with_parameters_2_.yml
│   │       ├── test_with_parameters_3_.obtained.yml
│   │       ├── test_with_parameters_3_.yml
│   │       └── test_with_parameters_4_.yml
│   ├── test_with_parameters_3_current -> /tmp/pytest-of-xk/pytest-1/test_with_parameters_3_0
│   ├── test_with_parameters_4_0
│   │   └── params
│   │       ├── test_with_parameters_0_.yml
│   │       ├── test_with_parameters_1_.yml
│   │       ├── test_with_parameters_2_.yml
│   │       ├── test_with_parameters_3_.yml
│   │       ├── test_with_parameters_4_.obtained.yml
│   │       └── test_with_parameters_4_.yml
│   └── test_with_parameters_4_current -> /tmp/pytest-of-xk/pytest-1/test_with_parameters_4_0
└── pytest-current -> /tmp/pytest-of-xk/pytest-1
34 directories, 40 files
'''
although it seems like there is a quadratic bug somewhere I will accept it as-is
thx
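The growth pattern in the trees above can be reproduced without pytest. What follows is only a minimal sketch, assuming that each test receives a full copy of one shared data directory and then writes a single `.obtained.yml` file into its own copy. `simulate_run` and the file names are hypothetical stand-ins, not part of pytest-datadir or pytest-regressions.

```python
import shutil
import tempfile
from pathlib import Path


def simulate_run(n_tests: int) -> int:
    """Return the number of files left in the temp area after one simulated run.

    Assumption: every test gets a full copy of the shared data directory and
    then writes one extra `.obtained.yml` file into its own copy.
    """
    with tempfile.TemporaryDirectory() as root:
        shared = Path(root) / "shared_datadir"
        shared.mkdir()
        # One expected-output file per test, all stored in the same directory.
        for i in range(n_tests):
            (shared / f"test_{i}.yml").write_text(f"key: value_{i}\n")

        tmp_area = Path(root) / "tmp"
        tmp_area.mkdir()
        for i in range(n_tests):
            copy = tmp_area / f"test_{i}0"
            shutil.copytree(shared, copy)  # full copy, once per test
            (copy / f"test_{i}.obtained.yml").write_text(f"key: value_{i}\n")

        return sum(1 for p in tmp_area.rglob("*") if p.is_file())


for n in (5, 10, 20):
    print(n, simulate_run(n))
```

Under that assumption a run with n tests leaves n × (n + 1) files: 30 for 5 tests (the same count as under pytest-1 in both trees), 110 for 10 tests and 420 for 20, which is the quadratic growth mentioned above.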
I dug a bit into it and it seems like saving everything at the function level reduces the number of duplicate files:
diff --git a/tests/conftest.py b/tests/conftest.py
index 694d7d5..81a844d 100644
--- a/tests/conftest.py
+++ b/tests/conftest.py
@@ -1 +1,11 @@
 pytest_plugins = "pytester"
+
+import os
+from pathlib import Path
+
+import pytest
+
+
+@pytest.fixture(scope="function")
+def original_datadir(request) -> Path:
+    return Path(os.path.splitext(request.module.__file__)[0]) / request.function.__name__
After running tox for pytest-regressions it creates the same number of folders (or maybe tree isn't counting additional nested folders idk) but the number of files in the temp folder goes from 1712 files to 287 files.
It might be good to also add request.node.callspec.params in there somewhere:
import os
from pathlib import Path

import pytest

@pytest.fixture
def original_datadir(request) -> Path:
    data_dir = Path(os.path.splitext(request.module.__file__)[0])
    data_dir /= request.function.__name__
    # Only parametrized tests have a callspec; give each parameter set its own subdirectory.
    if hasattr(request.node, 'callspec'):
        data_dir /= ' '.join([f"{k}={v}" for k, v in request.node.callspec.params.items()])
    return data_dir
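One caveat with building a directory name straight from parameter values: a value containing a path separator (for example `a/b`) would create extra nesting. A possible way around that, sketched here with a hypothetical helper `params_dirname` that is not part of any plugin, is to replace unexpected characters before joining the result onto the path:

```python
import re


def params_dirname(params: dict) -> str:
    """Build a directory name such as 'f=0' from a parameter mapping."""
    raw = " ".join(f"{k}={v}" for k, v in params.items())
    # Replace anything outside a conservative whitelist with an underscore,
    # so values like "a/b" cannot introduce additional path components.
    return re.sub(r"[^\w=. -]", "_", raw)


print(params_dirname({"f": 0}))
print(params_dirname({"path": "a/b", "mode": "r+"}))
```

In the fixture above, the joined string could then be swapped for `params_dirname(request.node.callspec.params)`. Note that two different values can map to the same sanitized name, so this is only suitable when parameter values stay distinct after sanitizing.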
For my own repo this cut down tempdir usage from 400MiB to 2MiB :o
Before: psub.txt
After: psub.txt
I'm not sure if the duplication would ever be useful, so it might make sense to incorporate this somehow. I appreciate the existing documentation (https://pytest-regressions.readthedocs.io/en/latest/overview.html#data-directory-path) and maybe that suffices, but the default seems a bit wasteful. The overhead of the existing behavior is pretty significant for modules with many tests, though for smaller modules it doesn't make much difference. For me, I'm okay now that I know how to use pytest-datadir efficiently for my own usage.
I really like this pytest plugin but after one run 55 tests create 3,027 folders/files (11MiB!)
rmlint says most of these are duplicates
Also, it is a bit weird that apparent size and actual disk usage differ so much. I guess that is because 4 KiB is the minimum allocation size on my filesystem (3000 × 4 KiB ≈ 12 MiB, so that part makes sense... actually)
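The block-size arithmetic can be checked with a few lines of Python. This is a rough sketch that assumes a 4 KiB block size and that every file occupies at least one whole block; the 20-byte file size is a made-up figure for illustration, and real filesystems may store tiny files differently.

```python
import math

BLOCK_SIZE = 4096  # bytes; assumed filesystem block size (4 KiB)


def allocated_bytes(file_sizes, block_size=BLOCK_SIZE):
    """Sum the on-disk allocation when each file occupies whole blocks."""
    return sum(
        max(1, math.ceil(size / block_size)) * block_size for size in file_sizes
    )


# Hypothetical workload: 3000 small YAML files of about 20 bytes each.
sizes = [20] * 3000
apparent = sum(sizes)
on_disk = allocated_bytes(sizes)
print(f"apparent: {apparent / 1024:.1f} KiB, on disk: {on_disk / 2**20:.1f} MiB")
```

With these assumed inputs the apparent size is about 58.6 KiB while the allocation is about 11.7 MiB, which is consistent with the gap between apparent size and disk usage described here.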
12 MiB seems fine, but there seems to be an exponential or quadratic bug, because it quickly grew to 3 GiB even though I had run pytest only a dozen or so times
I guess this is a bug/interaction with pytest.mark.parametrize?
edit: I've verified that this only happens when using pytest.mark.parametrize and in this case pytest-regressions creates 11x more files than it does with normal tests. This matches the number of parameters: