getappmap / navie-benchmark

Navie benchmarks
MIT License
0 stars 0 forks source link

Claude produces mangled change output #94

Open kgilpin opened 2 weeks ago

kgilpin commented 2 weeks ago

https://github.com/getappmap/navie-benchmark/actions/runs/11614957222/job/32344372220

This is a surprisingly common error, in which the closing </modified> tag is mis-emitted as an </original> tag.

We should fix these up automatically when this condition is detected.

Failed to parse change: <change>
<file change-number-for-this-file="4">/home/runner/work/navie-benchmark/navie-benchmark/solve/django__django-15503/source/django/db/models/fields/json.py</file>
<original line-count="14" no-ellipsis="true"><![CDATA[    def as_sql(self, compiler, connection, template=None):
        # Process JSON path from the left-hand side.
        if isinstance(self.lhs, KeyTransform):
            lhs, lhs_params, lhs_key_transforms = self.lhs.preprocess_lhs(
                compiler, connection
            )
            lhs_json_path = compile_json_path(lhs_key_transforms)
        else:
            lhs, lhs_params = self.process_lhs(compiler, connection)
            lhs_json_path = "$"
        sql = template % lhs
        # Process JSON path from the right-hand side.
        rhs = self.rhs
        rhs_params = []
        if not isinstance(rhs, (list, tuple)):]]></original>
<modified line-count="14" no-ellipsis="true"><![CDATA[    def as_sql(self, compiler, connection, template=None):
        if connection.vendor == 'postgresql':
            return super().as_sql(compiler, connection)
        if template is None:
            if connection.vendor == 'mysql':
                template = "JSON_CONTAINS_PATH(%s, 'one', %%s)"
            elif connection.vendor == 'oracle':
                template = "JSON_EXISTS(%s, '%%s')"
            else:  # sqlite
                template = "JSON_TYPE(%s, %%s) IS NOT NULL"
        lhs, rhs_params = self.process_json_path_lookup(compiler, connection)
        sql = template % lhs
        if self.logical_operator:
            sql = "(%s)" % self.logical_operator.join([sql] * len(rhs_params))]]></original>
</change>
github-actions[bot] commented 2 weeks ago

Title: Automatically Correct Mis-Emitted XML Tags in Claude's Change Output

Problem: Claude frequently produces errors in the XML output of code changes, specifically by mis-emitting the closing tag </modified> as </original>. This causes the parser to fail when encountering the change, as the expected XML structure is violated. These errors particularly occur in change sections that use <original> and <modified> tags to denote changes in code files. The issue needs automatic correction when detected.

Analysis: The root cause of this problem is likely a bug in the process that transforms and writes the change information into XML format, which mistakenly uses </original> instead of </modified> to close a modified section. XML parsers rely heavily on correctly paired tags to understand and navigate documents, and such mispairings lead to parsing errors. Automated recognition and correction of this issue would involve verifying the paired nature of these tags when a change section is built or outputted and automatically substituting </original> with </modified> where necessary. This can be accomplished by post-processing the XML output to identify and correct incorrectly paired tags.

Proposed Changes:

  1. Change in Output Generation Logic:

    • Review and modify the logic responsible for generating the XML structure of changes. Ensure that the mechanism that closes tags adheres strictly to the opening tags for correctness.
  2. Post-Processing Functionality:

    • Implement a post-processing step in the code routine that handles Claude’s change outputs. This step should:
      • Scan for sections of the change document where the </original> tag follows a <modified> opening tag.
      • Replace such mis-emitted </original> tags with </modified> to correct the structure.
  3. Integration and Testing:

    • Integrate this fix into the workflow that produces and utilizes Claude's outputs. Ensure that corrected outputs do not carry over improperly structured XML.
    • Develop tests to simulate scenarios where the </original> tag is wrongly placed. These tests should validate that the error is detected and corrected appropriately.

By addressing the tag mismatch issue through both preventive measures in output generation and corrective measures in post-processing, the fault in the output XML can be systematically resolved.