CleanCut / green

Green is a clean, colorful, fast python test runner.

Coverage error with green's DjangoRunner #227

Open MxFlix opened 4 years ago

MxFlix commented 4 years ago

When using coverage with green in Django, all import statements, class heads, and function heads (and presumably anything else at module level) are marked as not covered. I have no idea about the inner workings of the TestRunner, Django, Green, and Coverage, but could it be that coverage is "started" too late and doesn't catch the initial import of these files? (All of those are lines that are evaluated when the file is first read by Python.)
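
To illustrate the suspicion, here is a minimal sketch using the coverage API directly (this is just the general principle, not what green does internally; main.models is the example module below):

import coverage

cov = coverage.Coverage()
cov.start()          # tracing is active from here on
import main.models   # module-level lines (imports, class/def heads) execute now and are recorded
cov.stop()
cov.save()

# If main.models were already imported before cov.start() -- e.g. during test
# discovery or Django setup -- those module-level lines would never be recorded.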

It does work perfectly fine when using Django's standard test runner.

For example, here is the test.py:

from django.test import TestCase

import main.models

class MainTests(TestCase):
    def test_setting_and_retrieving_setting_works(self):
        """Setting and retrieving a setting work as expected.
        """
        setting_name = "some setting"
        setting_value = "a value"
        main.models.Settings.set_setting(setting_name, setting_value)
        self.assertEqual(
            setting_value, main.models.Settings.get_setting(setting_name)
        )

    def test_trying_to_get_an_unset_setting_returns_none(self):
        """When trying to get an unknown setting, None is returned.
        """
        self.assertIsNone(main.models.Settings.get_setting("Something"))

    def test_trying_to_get_an_unset_setting_returns_default(self):
        """When trying to get an unknown setting, None is returned.
        """
        self.assertEqual(
            "Default", main.models.Settings.get_setting("Something", "Default")
        )

Here is the main/models.py:

from django.db import models
from django.db.utils import OperationalError

class Settings(models.Model):
    """
    Stores settings as simple key/value pairs. These can then be used elsewhere in the app.
    """

    name = models.CharField(max_length=200, primary_key=True)
    value = models.CharField(max_length=200)

    @classmethod
    def get_setting(cls, name: str, default: str = None) -> str:
        """Retrieves a setting's value, or None if it doesn't exist."""
        try:
            return cls.objects.get(pk=name).value
        except (cls.DoesNotExist, OperationalError):
            return default

    @classmethod
    def set_setting(cls, name: str, value: str) -> None:
        """Sets the specified setting to the specified value, creates it if it doesn't exist."""
        cls.objects.update_or_create(name=name, defaults={"value": value})

For completeness' sake, here is the .green file:

verbose         = 3
run-coverage    = True
cov-config-file = .coveragerc
processes       = 1

And .coveragerc:

[run]
branch = True

[xml]
output = coverage.xml

And coverage's XML output; note that the only lines with hits are the method bodies that actually ran during the tests, while all module-level lines show zero:

<?xml version="1.0" ?>
<coverage version="5.2" timestamp="1594479177454" lines-valid="14" lines-covered="5" line-rate="0.3571" branches-valid="0" branches-covered="0" branch-rate="1" complexity="0">
    <!-- Generated by coverage.py: https://coverage.readthedocs.io -->
    <!-- Based on https://raw.githubusercontent.com/cobertura/web/master/htdocs/xml/coverage-04.dtd -->
    <sources>
        <source>/home/flix/PythonProjects/imagetagger</source>
    </sources>
    <packages>
        <package name="main" line-rate="0.3571" branch-rate="1" complexity="0">
            <classes>
                <class name="models.py" filename="main/models.py" complexity="0" line-rate="0.3571" branch-rate="1">
                    <methods/>
                    <lines>
                        <line number="1" hits="0"/>
                        <line number="2" hits="0"/>
                        <line number="5" hits="0"/>
                        <line number="10" hits="0"/>
                        <line number="11" hits="0"/>
                        <line number="13" hits="0"/>
                        <line number="14" hits="0"/>
                        <line number="16" hits="1"/>
                        <line number="17" hits="1"/>
                        <line number="18" hits="1"/>
                        <line number="19" hits="1"/>
                        <line number="21" hits="0"/>
                        <line number="22" hits="0"/>
                        <line number="24" hits="1"/>
                    </lines>
                </class>
            </classes>
        </package>
    </packages>
</coverage>
CleanCut commented 4 years ago

It does work perfectly fine when using Django's standard test runner.

Please try disabling green's coverage in your .green file and instead use the approach Django suggests for coverage (i.e. coverage run --source='.' manage.py test, then coverage report). Let me know if that works!

I'm going to close this in anticipation of it working, but if it doesn't, please feel free to reopen this issue and let me know what actually happened.

macgeneral commented 4 years ago

Coverage doesn't seem to work reliably with Green and Django at all right now.

If I use the DjangoRunner with coverage enabled in the .green configuration and run my tests with python3 manage.py test myapp, the scores don't add up. Green creates multiple .coverage.x_abcd files even though data_file = .coverage is configured in .coveragerc.

(I use coverage combine before creating the report, but it doesn't change the score.)

In the end my score is 31 points lower (60% vs. 91%) than when using the normal DiscoverRunner, showing 0% coverage for some code paths I definitely know are covered, and < 50% coverage for some files I know are > 98% covered (without omitting any lines in them). Setting clear-omit = True in the .green configuration doesn't help here either.

If I run it the Django way (coverage run --source='.' manage.py test myapp) and disable the coverage parts of the .green configuration, it respects my .coveragerc settings, but the final score is still 31, and in some cases up to 57, points lower than when using the DiscoverRunner. It seems like the DjangoRunner breaks coverage's ability to track the executed code paths.

This looks like a great project and I would be really happy to use it, but unfortunately in its current state it's unusable for me. Especially the minimum-coverage option (and of course the great, clear output) looked promising.

Django: 2.2.17 LTS
Coverage: 5.3
Green: 3.2.4
Python: 3.8.6

CleanCut commented 4 years ago

@macgeneral Unless coverage is enabled, green doesn't touch it: https://github.com/CleanCut/green/blob/639a0f41f1e46b4378a4f6e53caa47af20add694/green/process.py#L292-L311
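
Roughly, the shape of those lines is this (a paraphrased sketch with simplified, hypothetical names, not the exact process.py code): a worker only starts a per-process coverage session when run-coverage is set, and saves it when it's done.

import coverage

def worker(run_coverage=False, cov_config_file=True):
    """Sketch of one green worker process (names are illustrative only)."""
    cov = None
    if run_coverage:
        # data_suffix=True makes coverage write per-process data files like
        # .coverage.<host>.<pid>.<random>, which `coverage combine` merges;
        # that would also explain the multiple .coverage.* files seen above.
        cov = coverage.Coverage(data_suffix=True, config_file=cov_config_file)
        cov.start()
    try:
        pass  # ... load and run this worker's share of the tests ...
    finally:
        if cov:
            cov.stop()
            cov.save()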

Here are the possibilities I can see:

macgeneral commented 4 years ago

@CleanCut: Thank you for your reply. I'm sorry, I should have described my problem in a more verbose, bug-report-style manner.

I did disable coverage in the green configuration, and the result is still as bad as when it's enabled:

dev@vm:~$ cat .green 
processes = 1
run-coverage = 0
verbose = 2
failfast = 1

The result I get is this:

dev@vm:~$ coverage run --source="./" manage.py test -v 2 --noinput --failfast
[..]
Ran 48 tests in 184.380s using 1 process

OK (passes=48)
Destroying test database for alias 'default' ('unittests')...
dev@vm:~$ coverage report -m --skip-covered | tail -n 3
TOTAL                                                    3788   2596    31%

42 files skipped due to complete coverage.

Compare it to a run using the RedGreenDiscoverRunner (or the regular DiscoverRunner):

dev@vm:~$ coverage run --source="./" manage.py test -v 2 --noinput --failfast
[..]
Ran 43 tests in 182.427s

OK
Destroying test database for alias 'default' ('unittests')...
dev@vm:~$ coverage report -m --skip-covered | tail -n 3
TOTAL                                                3788    388    90%

82 files skipped due to complete coverage.

Both tests were run after deleting the coverage results from the previous run, but it doesn't change anything:

dev@vm:~$ coverage erase

And green doesn't create any .coverage_abcd files during the test run; just one .coverage file gets created by coverage itself.

The only difference I noticed is the 48 vs. 43 tests, but that doesn't explain why many of my source files show close to zero coverage when I know those code paths were executed.

macgeneral commented 4 years ago

Just to double-check that it isn't just me, here are three runs of wagtail's test suite.

Regular test run

dev@vm:~$ python runtests.py wagtail.core
Creating test database for alias 'default'...
System check identified no issues (0 silenced).
........................................................................................................................................................................................................................xx....................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
----------------------------------------------------------------------
Ran 862 tests in 26.691s

OK (expected failures=2)
Destroying test database for alias 'default'...

Coverage run

dev@vm:~$ coverage run --source="./" runtests.py wagtail.core                         
Creating test database for alias 'default'...
System check identified no issues (0 silenced).
........................................................................................................................................................................................................................xx....................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
----------------------------------------------------------------------
Ran 862 tests in 38.739s

OK (expected failures=2)
Destroying test database for alias 'default'...

dev@vm:~$ coverage report -m --skip-covered | { head -n 2; tail -n 3; }
Name                                                                             Stmts   Miss Branch BrPart  Cover   Missing
----------------------------------------------------------------------------------------------------------------------------
TOTAL                                                                            25418  12622   6753    521    46%

148 files skipped due to complete coverage.

Coverage run with Green

dev@vm:~$ coverage run --source="./" runtests.py wagtail.core --testrunner=green.djangorunner.DjangoRunner
Creating test database for alias 'default'...
........................................................................................................................................................................................................................................................................................................................................................................................................................................xx....................................................................................................................................................................................................................................................................................................................................................................................................................................................

Ran 862 tests in 40.814s using 1 process

OK (expected_failures=2, passes=860)
Destroying test database for alias 'default'...

dev@vm:~$ coverage report -m --skip-covered | { head -n 2; tail -n 3; }
Name                                                                             Stmts   Miss Branch BrPart  Cover   Missing
----------------------------------------------------------------------------------------------------------------------------
TOTAL                                                                            25418  21547   6753     52    12%

92 files skipped due to complete coverage.

As you can see, for otherwise identical test runs, coverage reports only a fraction of the green-less run's score when green is used as the test runner.

Also, regarding my previous edit: the numbers between green and the regular coverage run without green were even further apart there, as the number of tests run and errors differed by a large amount. In my smaller project I also experienced test runner crashes with green where neither the DiscoverRunner nor the RedGreenDiscoverRunner would fail on the same code base. I know this doesn't help much, but I assume it has something to do with your green.runner.run implementation.

I could also nearly double my coverage score by creating a child class of your DjangoRunner, copying and pasting the def run_tests(self, test_labels, extra_tests=None, **kwargs) function, adding some missing parts, and adjusting the structure to match the Django DiscoverRunner more closely.

I assume it's not possible for your implementation to override run_suite instead (which would be less error-prone)? Sorry I can't track it down any further, but I'm not deep enough into test runners to know what to look for (beyond writing some smaller child classes that override a few functions).

EDIT: switched test runs to only test wagtail.core for faster and more reliable comparisons

macgeneral commented 4 years ago

Here's my modified run_tests:

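# Assumes the same imports as green's djangorunner.py, e.g.:
#   import sys
#   from argparse import Namespace
#   from green.config import mergeConfig
#   from green.output import GreenStream
#   from green.runner import run
#   from green.suite import GreenTestSuite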
def run_tests(self, test_labels, extra_tests=None, **kwargs):
    """
    Run the unit tests for all the test labels in the provided list.

    Test labels should be dotted Python paths to test modules, test
    classes, or test methods.

    A list of 'extra' tests may also be provided; these tests
    will be added to the test suite.

    Returns the number of tests that failed.
    """
    # Django setup / DiscoverRunner
    self.setup_test_environment()
    suite = self.build_suite(test_labels, extra_tests)
    databases = self.get_databases(suite)
    old_config = self.setup_databases(aliases=databases)
    run_failed = False

    # Green
    if type(test_labels) == tuple:
        test_labels = list(test_labels)
    else:
        raise ValueError("test_labels should be a tuple of strings")
    if not test_labels:
        test_labels = ["."]

    args = mergeConfig(Namespace())
    if self.verbose != -1:
        args.verbose = self.verbose
    args.targets = test_labels
    stream = GreenStream(sys.stdout)
    suite = self.loader.loadTargets(args.targets)
    if not suite:
        suite = GreenTestSuite()

    # DiscoverRunner with Green mixins
    try:
        # self.run_checks()  # fails with wagtail "TypeError: run_checks() missing 1 required positional argument: 'databases'"
        result = run(suite, stream, args)  # self.run_suite(suite)
    except Exception:
        run_failed = True
        raise
    finally:
        try:
            self.teardown_databases(old_config)
            self.teardown_test_environment()
        except Exception:
            # Silence teardown exceptions if an exception was raised during
            # runs to avoid shadowing it.
            if not run_failed:
                raise
    return self.suite_result(suite, result)

Wagtail Test

dev@vm:~$ coverage run --source="./" runtests.py wagtail.core --testrunner=.djangorunner.DjangoRunner
Creating test database for alias 'default'...
........................................................................................................................................................................................................................................................................................................................................................................................................................................xx....................................................................................................................................................................................................................................................................................................................................................................................................................................................

Ran 862 tests in 39.402s using 1 process

OK (expected_failures=2, passes=860)
Destroying test database for alias 'default'...

dev@vm:~$ coverage report -m --skip-covered | { head -n 2; tail -n 3; }
Name                                                                             Stmts   Miss Branch BrPart  Cover   Missing
----------------------------------------------------------------------------------------------------------------------------
TOTAL                                                                            25489  21558   6766     57    13%

92 files skipped due to complete coverage.

My Project

dev@vm:~$ coverage run --source="./" manage.py test -v 2 --noinput --failfast
[..]
Ran 48 tests in 201.419s using 1 process

OK (passes=48)
Destroying test database for alias 'default' ('unittests')...

dev@vm:~$ coverage report -m --skip-covered | { head -n 2; tail -n 3; }
Name                                                    Stmts   Miss  Cover   Missing
-------------------------------------------------------------------------------------
TOTAL                                                    3788   2175    43%

50 files skipped due to complete coverage.

If I don't comment out self.run_checks() in my project, the coverage score goes up to 48%, and with some tinkering with the run_suite method I once got up to ~60%, but unfortunately I stashed those changes.

EDIT: with the following modification I got wagtail.core's score up to 29%:

[..]
        self.run_checks(databases)  # does not match the Django DiscoverRunner in 2.2, but it matches the latest dev version, and wagtail uses Django 3.1
[..]
dev@vm:~$ coverage report -m --skip-covered | { head -n 2; tail -n 3; }
Name                                                                             Stmts   Miss Branch BrPart  Cover   Missing
----------------------------------------------------------------------------------------------------------------------------
TOTAL                                                                            25490  16668   6766    150    29%

132 files skipped due to complete coverage.

It still seems totally random to me why this happens.

macgeneral commented 4 years ago

Sorry for spamming so much, but just in case it helps you track down any errors:

class DjangoRunner(DiscoverRunner):
[..]
    def run_suite(self, suite, **kwargs):
        # Green
        args = mergeConfig(Namespace())
        if self.verbose != -1:
            args.verbose = self.verbose
        stream = GreenStream(sys.stdout)
        if not suite:
            suite = GreenTestSuite()

        return run(suite, stream, args)

    def run_tests(self, test_labels, extra_tests=None, **kwargs):
        """Try run_suite instead."""
        return super(DjangoRunner, self).run_tests(test_labels, extra_tests=extra_tests, **kwargs)

This doesn't seem to break anything obvious, apart from missing some parts (args.targets = test_labels and everything related to it):

dev@vm:~$ coverage report -m --skip-covered | { head -n 2; tail -n 3; }
Name                                                                             Stmts   Miss Branch BrPart  Cover   Missing
----------------------------------------------------------------------------------------------------------------------------
TOTAL                                                                            25468  16660   6757    147    29%

132 files skipped due to complete coverage.

I don't know how green handles multiprocessing inside of run. If I recall correctly, coverage and Django's manage.py test --parallel option don't blend well together either. And I'm not sure whether it's possible for you to use run_suite instead of run_tests: an overridden run_suite barely differs from the original implementation, while the run_tests function seems to change quite often.
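
As a self-contained illustration of that suspicion (hypothetical example code, nothing taken from green):

import multiprocessing

def work():
    # With coverage's default concurrency (thread), no tracer is installed
    # in this child process, so these lines run but are reported as uncovered.
    return sum(range(10))

if __name__ == "__main__":
    # Runs work() in a separate process, the same way a process-based
    # test runner executes test code outside the main process.
    with multiprocessing.Pool(processes=1) as pool:
        print(pool.apply(work))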

macgeneral commented 4 years ago

One last spam message:

class DjangoRunner(DiscoverRunner):

    def __init__(self, verbose=-1, **kwargs):
        [..]
        self.test_labels = None

    [..]

    def build_suite(self, test_labels=None, extra_tests=None, **kwargs):
        self.test_labels = test_labels
        return super(DjangoRunner, self).build_suite(test_labels=test_labels, extra_tests=extra_tests, **kwargs)

    def run_suite(self, suite, **kwargs):
        """Use run_suite instead of run_tests."""
        test_labels = self.test_labels
        # Green DjangoRunner L.114-129
        if type(test_labels) == tuple:
            test_labels = list(test_labels)
        else:
            raise ValueError("test_labels should be a tuple of strings")
        if not test_labels:
            test_labels = ["."]

        args = mergeConfig(Namespace())
        if self.verbose != -1:
            args.verbose = self.verbose
        args.targets = test_labels
        stream = GreenStream(sys.stdout)
        suite = self.loader.loadTargets(args.targets)
        if not suite:
            suite = GreenTestSuite()
        return run(suite, stream, args)  # return result directly

This intercepts the test_labels without overriding too much of the Django source. (You already set test_labels to ["."], just like Django does in runner.py.)

It doesn't directly solve the low coverage scores (it just increases them to 48% for my project, compared to the current implementation's 31%), but it might save you some headaches, because you don't have to keep up with changes to the run_tests function.

macgeneral commented 3 years ago

Looking at your code and the coverage documentation, I found the solution:

dev@vm:~$ cat .coveragerc 
[run]
concurrency = multiprocessing
[..]

--or--

dev@vm:~$ cat pyproject.toml
[tool.coverage.run]
concurrency = ["multiprocessing"]
[..]

Because your runner.run function always uses a multiprocessing.Manager(), coverage can't keep up even when it's using just one process, because its concurrency setting defaults to thread.
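
In API terms, the fix corresponds to something like this (a sketch of the coverage API, not green's internals):

import coverage

# Default: the tracer only covers the current process (concurrency="thread"),
# so code inside multiprocessing workers is invisible to it.
cov_default = coverage.Coverage()

# What the .coveragerc / pyproject.toml above enables: coverage patches
# multiprocessing so each worker starts its own tracer and saves its own
# data file, which `coverage combine` then merges.
cov_fixed = coverage.Coverage(concurrency="multiprocessing")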

As far as I can tell (for my project), my implementation above, the DiscoverRunner, and your current DjangoRunner implementation all produce exactly the same coverage scores with this setting (there's a small difference for the wagtail project). I would still advise against overriding the run_tests function, but you probably have your reasons.

CleanCut commented 3 years ago

Reopening until we investigate and see if there's a way we can integrate the solution into green's DjangoRunner.