apache / arrow

Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing
https://arrow.apache.org/
Apache License 2.0

[C++/Python] Provide assertion helpers in the style of pandas.testing.assert_frame_equal #19042

Open asfimport opened 6 years ago

asfimport commented 6 years ago

Pandas provides helper functions for writing tests against its data structures that output an explanatory error message when an assertion fails. In contrast, Arrow currently supports only boolean equals comparisons. We should add helper functions that also report in detail where the comparison failed.
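For reference, a minimal sketch of the pandas behaviour described above. It uses pandas.testing.assert_frame_equal on two small frames that differ in one value; the frames and variable names are made up for illustration, and the exact wording of the message may vary between pandas versions.

```python
import pandas as pd
from pandas.testing import assert_frame_equal

# Two illustrative frames that differ only in the last value of column "a".
left = pd.DataFrame({"a": [1, 2, 3]})
right = pd.DataFrame({"a": [1, 2, 4]})

message = None
try:
    assert_frame_equal(left, right)
except AssertionError as exc:
    # The message names the offending column and shows the differing values.
    message = str(exc)

print(message)
```

By comparison, a bare assert on a boolean equals() call can only report that the assertion failed.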

Reporter: Uwe Korn / @xhochy

Note: This issue was originally created as ARROW-2647. Please see the migration documentation for further details.

asfimport commented 4 years ago

Joris Van den Bossche / @jorisvandenbossche: I also regularly run into the problem that an assert result.equals(expected) does not give much information about why the objects differ, so +1 on adding something like this.

asfimport commented 4 years ago

Joris Van den Bossche / @jorisvandenbossche: For a failing test I am having right now, I just wrote:


import pyarrow as pa


def assert_table_equal(left, right, check_metadata=False):
    if left.equals(right, check_metadata=check_metadata):
        return

    if not left.schema.equals(right.schema):
        raise AssertionError(
            "Schema not equal\nLeft:\n{0}\nRight:\n{1}".format(
                left.schema, right.schema
            )
        )

    if check_metadata:
        if not left.schema.equals(right.schema, check_metadata=True):
            if not left.schema.metadata == right.schema.metadata:
                raise AssertionError(
                    "Metadata not equal\nLeft:\n{0}\nRight:\n{1}".format(
                        left.schema.metadata, right.schema.metadata
                    )
                )
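        # Also compare field-level metadata, reporting the offending column
        for col in left.schema.names:
            if not left.schema.field(col).equals(
                right.schema.field(col), check_metadata=True
            ):
                raise AssertionError(
                    "Field metadata not equal for column '{0}'".format(col)
                )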

    for col in left.column_names:
        a_left = pa.concat_arrays(left.column(col).chunks)
        a_right = pa.concat_arrays(right.column(col).chunks)
        if not a_left.equals(a_right):
            raise AssertionError(
                "Column '{0}' not equal:\n{1}".format(col, a_left.diff(a_right))
            )

    raise AssertionError("Tables not equal for unknown reason")