getappmap / navie-benchmark

Navie benchmarks
MIT License
0 stars 0 forks source link

Accept TestStatus.ERROR for inverted test case if the marker error message is observed #57

Open kgilpin opened 1 month ago

kgilpin commented 1 month ago

In the following scenario, the test case is resolved as ERROR by the test harness.

As a result, the inverted patch is discarded, because the current logic only accepts FAILED test status.

Investigate why this test is determined to be an ERROR by the test harness. If ERROR is the correct status, consider accepting inverted patches that have status ERROR and have the marker error __BUG__HERE__.

Testing against Django installed in '/testbed/django'
Importing application auth_tests
Skipping setup of unused database(s): default, other.
System check identified no issues (0 silenced).
test_username_unique_constraint (auth_tests.test_auth_e003_inverted.UserWithUniqueConstraintTests) ... ERROR

======================================================================
ERROR: test_username_unique_constraint (auth_tests.test_auth_e003_inverted.UserWithUniqueConstraintTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/testbed/django/test/utils.py", line 382, in inner
    return func(*args, **kwargs)
  File "/testbed/tests/auth_tests/test_auth_e003_inverted.py", line 26, in test_username_unique_constraint
    raise Exception("__BUG__HERE__")
Exception: __BUG__HERE__

----------------------------------------------------------------------
Ran 1 test in 0.061s

FAILED (errors=1)

Test patch

diff --git a/tests/auth_tests/test_auth_e003_inverted.py b/tests/auth_tests/test_auth_e003_inverted.py
new file mode 100644
index 0000000..ffc985e
--- /dev/null
+++ b/tests/auth_tests/test_auth_e003_inverted.py
@@ -0,0 +1,26 @@
+from django.contrib.auth.checks import check_user_model
+from django.contrib.auth.models import AbstractBaseUser
+from django.core import checks
+from django.db import models
+from django.test import SimpleTestCase, override_settings, override_system_checks
+from django.test.utils import isolate_apps
+
+@isolate_apps('auth_tests', attr_name='apps')
+@override_system_checks([check_user_model])
+class UserWithUniqueConstraintTests(SimpleTestCase):
+    @override_settings(AUTH_USER_MODEL='auth_tests.UserWithUniqueConstraint')
+    def test_username_unique_constraint(self):
+        class UserWithUniqueConstraint(AbstractBaseUser):
+            username = models.CharField(max_length=30)
+            USERNAME_FIELD = 'username'
+
+            class Meta:
+                constraints = [
+                    models.UniqueConstraint(
+                        fields=['username'], name='user_username_unq'
+                    )
+                ]
+
+        errors = checks.run_checks(app_configs=self.apps.get_app_configs())
+        if errors:
+            raise Exception("__BUG__HERE__")
\ No newline at end of file
github-actions[bot] commented 1 month ago

Title

Accept TestStatus.ERROR for Inverted Test Case if the Marker Error Message __BUG__HERE__ is Observed

Problem

The test harness currently discards inverted patches that result in a TestStatus.ERROR even when the specific marker error message __BUG__HERE__ is observed. This is leading to potentially valid inverted patches being overlooked.

Analysis

The test in question deliberately raises an Exception with a marker error message __BUG__HERE__ to indicate the presence of a bug. During execution, the test harness identifies this as an ERROR and consequently, the inverted patch is discarded since the harness only accepts patches that result in FAILED status. To accept such patches, the system needs to identify and process the ERROR status containing the specific marker error message.

The distinction between FAILED and ERROR:

In this scenario, an ERROR due to the marker __BUG__HERE__ is intentional and should be considered a valid outcome for accepting the inverted patch.

Proposed Changes

  1. File: solver/workflow/execute_container.py

    • Update the logic in parse_test_status to check for the presence of the marker error message __BUG__HERE__ when the status is ERROR.
  2. File: solver/workflow/execute_container.py

    • Modify the logic that determines the acceptance of the test status to allow TestStatus.ERROR when the marker error message __BUG__HERE__ is observed.

Detailed Changes

  1. solver/workflow/execute_container.py: Modify parse_test_status function:

    • Check if the test output contains the marker error message __BUG__HERE__ when parsing the test status.
  2. solver/workflow/execute_container.py: Update status evaluation logic

    • Ensure that TestStatus.ERROR is accepted if the marker error message is found in the test output.

Example Updates:

  1. solver/workflow/execute_container.py: Modify parse_test_status function

    def parse_test_status(log, repo: str, test_output: Optional[str]) -> TestStatus:
       log_parser = MAP_REPO_TO_PARSER[repo]
       if not log_parser:
           raise ValueError(f"No log parser found for repo {repo}")
    
       test_status_dict: dict[str, str] = {}
       marker_error_present = False
       if test_output:
           try:
               parsed_status = log_parser(test_output)
               if parsed_status:
                   test_status_dict.update(parsed_status)
                   if "__BUG__HERE__" in test_output:
                       marker_error_present = True
           except Exception as e:
               log("parse-test-status", f"Failed to parse test status: {e}")
               log("parse-test-status", f"Test output: {test_output}")
    
       if test_status_dict:
           test_status_str = ", ".join(
               f"{test_name}: {status}" for test_name, status in test_status_dict.items()
           )
           log(
               "parse-test-status",
               f"Test status: {test_status_str}",
           )
    
       def any_status(status_name: str) -> bool:
           return any(
               status for status in test_status_dict.values() if status == status_name
           )
    
       if any_status(TestStatus.ERROR.value) and marker_error_present:
           test_status = TestStatus.ERROR
       elif any_status(TestStatus.ERROR.value):
           test_status = TestStatus.ERROR
       elif any_status(TestStatus.FAILED.value):
           test_status = TestStatus.FAILED
       else:
           test_status = TestStatus.PASSED
    
       log("parse-test-status", f"Overall test status: {test_status}")
       return test_status

Modify existing logic to account for the acceptance of TestStatus.ERROR when __BUG__HERE__ is detected.

By incorporating these changes, the test harness will be able to accept inverted patches that throw TestStatus.ERROR as long as the marker error message __BUG__HERE__ is observed. This will prevent valid patches from being discarded erroneously.

kgilpin commented 1 month ago

solve_logs.zip