MrPowers / chispa

PySpark test helper methods with beautiful error messages
https://mrpowers.github.io/chispa/
MIT License

chispa 1.0 release #93

Open MrPowers opened 7 months ago

MrPowers commented 7 months ago

It would be nice to develop chispa so we can make a 1.0 release.

We might even want to expose a different interface. Something like this:

from dataclasses import dataclass

@dataclass
class MyFormats:
    mismatched_rows = ["light_yellow"]
    matched_rows = ["cyan", "bold"]
    mismatched_cells = ["purple"]
    matched_cells = ["blue"]

my_chispa = Chispa(formats=MyFormats())

my_chispa.assert_df_equality(actual_df, expected_df)

The user could inject the my_chispa object in their tests as follows:

import pytest

@pytest.fixture()
def my_chispa():
    return Chispa(formats=MyFormats())

def test_shows_assert_basic_rows_equality(my_chispa):
    ...
    my_chispa.assert_basic_rows_equality(df1.collect(), df2.collect())

It's worth contemplating at least.
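
To make the idea concrete, here is a minimal sketch of how an object that carries its own formats could behave. Everything in it is an assumption for illustration: `SketchChispa` and `RowsNotEqualError` are hypothetical names, plain tuples stand in for PySpark rows, and the formats are only echoed as text instead of being rendered as terminal colors. It is not chispa's implementation.

```python
from dataclasses import dataclass, field
from itertools import zip_longest


@dataclass
class MyFormats:
    # default_factory gives every instance its own list
    mismatched_rows: list = field(default_factory=lambda: ["light_yellow"])
    matched_rows: list = field(default_factory=lambda: ["cyan", "bold"])


class RowsNotEqualError(AssertionError):
    """Hypothetical error raised when two row lists differ."""


class SketchChispa:
    """Hypothetical stand-in for the proposed object, not chispa's class."""

    def __init__(self, formats=None):
        self.formats = formats if formats is not None else MyFormats()

    def assert_basic_rows_equality(self, rows1, rows2):
        if rows1 == rows2:
            return
        lines = []
        # zip_longest pads the shorter list with None so extra rows show up
        for i, (r1, r2) in enumerate(zip_longest(rows1, rows2)):
            fmt = (
                self.formats.matched_rows
                if r1 == r2
                else self.formats.mismatched_rows
            )
            # A real implementation would colorize; here we only tag the row
            lines.append(f"{i}: {r1!r} | {r2!r} [{','.join(fmt)}]")
        raise RowsNotEqualError("\n" + "\n".join(lines))
```

In a test suite, an instance like this would be returned from a pytest fixture, so each project configures its formats in one place and every test receives the same object.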

MrPowers commented 2 months ago

Let's brainstorm some of the "big issues" with chispa:

Here are some project goals:

For chispa 1.0, it might be better to build new interfaces rather than modify the existing interfaces. But I'd rather not make chispa 1.0 backward incompatible. Let's align on vision & interfaces.

SemyonSinchenko commented 2 months ago

For chispa 1.0, it might be better to build new interfaces rather than modify the existing interfaces. But I'd rather not make chispa 1.0 backward incompatible. Let's align on vision & interfaces.

Why not add a new API and keep the old one, only raising DeprecationWarnings from it? Or even just create a chispa.v2 API.
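
A rough sketch of the deprecation route, using only the standard library. The names `NewApi`, `assert_rows_equal`, and `assert_rows_equal_legacy` are made up for illustration and are not chispa functions; the point is only that the old entry point keeps working, delegates to the new one, and warns.

```python
import functools
import warnings


def deprecated(replacement):
    """Wrap a function so each call emits a DeprecationWarning."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            warnings.warn(
                f"{func.__name__} is deprecated; use {replacement} instead",
                DeprecationWarning,
                stacklevel=2,  # attribute the warning to the caller
            )
            return func(*args, **kwargs)

        return wrapper

    return decorator


class NewApi:
    """Hypothetical new-style entry point."""

    def assert_rows_equal(self, rows1, rows2):
        assert rows1 == rows2, f"{rows1!r} != {rows2!r}"


@deprecated("NewApi().assert_rows_equal")
def assert_rows_equal_legacy(rows1, rows2):
    # The old function stays importable and delegates to the new API
    NewApi().assert_rows_equal(rows1, rows2)
```

Existing test suites keep passing under this approach, and users only see the warning if their warning filters show DeprecationWarning (pytest does by default).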

MrPowers commented 2 months ago

Yep, I already started building that new interface with Chispa(formats=MyFormats()). We may want to expose the public API via Chispa going forward. I think we just need to figure out exactly the public interface that we want to expose to end users. The public interface should meet all the project goals, should be flexible enough to allow for customizations, and should be easy to run with the defaults.

fpgmaas commented 2 months ago

user can't customize formatting

I already started building that new interface with Chispa(formats=MyFormats()). [...]

@MrPowers For a proposed new way of configuring formatting, see https://github.com/MrPowers/chispa/pull/127, which would change the user-facing configuration to, e.g.:

Chispa(
    formats=FormattingConfig(
        mismatched_rows={"color": "light_yellow"}
    )
)
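
For readers who want a feel for the dict-based style, below is a small sketch of a config object that accepts dicts and validates them up front. It is not the code from the pull request: `SketchFormat`, `SketchFormattingConfig`, the `style` key, the allowed color set, and the default colors are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Optional

# Illustrative set only; the real list of supported colors may differ
ALLOWED_COLORS = {"light_yellow", "cyan", "purple", "blue", "red"}


@dataclass
class SketchFormat:
    color: Optional[str] = None
    style: list = field(default_factory=list)

    @classmethod
    def from_dict(cls, spec):
        # Reject misspelled keys early instead of silently ignoring them
        unknown = set(spec) - {"color", "style"}
        if unknown:
            raise ValueError(f"Unknown format keys: {sorted(unknown)}")
        color = spec.get("color")
        if color is not None and color not in ALLOWED_COLORS:
            raise ValueError(f"Unsupported color: {color!r}")
        return cls(color=color, style=list(spec.get("style", [])))


class SketchFormattingConfig:
    """Hypothetical config holder; the defaults are placeholders."""

    def __init__(self, mismatched_rows=None, matched_rows=None):
        self.mismatched_rows = SketchFormat.from_dict(
            mismatched_rows or {"color": "red"}
        )
        self.matched_rows = SketchFormat.from_dict(
            matched_rows or {"color": "blue"}
        )
```

Validating in the constructor means a typo such as `{"colour": "blue"}` fails when the config is built, not later in the middle of a failing test.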

fpgmaas commented 2 months ago

I think the best way to move forward is to simply create separate issues for the following topics:

- bad for wide table DataFrame comparisons
- doesn't handle some column types well
- probably doesn't handle some edge cases well (e.g. array columns with NaN values)
- user can't customize formatting
- some bad abstractions (e.g. the underline_cells argument)
- users can't disable terminal characters (sometimes users want to use this in a notebook and don't want any terminal formatting output)

That way we can discuss them separately. We add them to the milestone for the 1.0 release, ship features and changes one by one by incrementing the minor version, and release 1.0 once all desired changes and features are finished.
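
As background for the NaN edge case in the list of topics, here is a small plain-Python sketch of why array values containing NaN are awkward to compare. `nan_safe_equal` is a hypothetical helper written for this example, not a chispa function, and plain lists and tuples stand in for collected rows.

```python
import math


def nan_safe_equal(a, b):
    """Compare two values, treating NaN as equal to NaN.

    Lists and tuples are compared element-wise and recursively.
    """
    if isinstance(a, float) and isinstance(b, float):
        # NaN never equals itself under ==, so check it explicitly
        return a == b or (math.isnan(a) and math.isnan(b))
    if isinstance(a, (list, tuple)) and isinstance(b, (list, tuple)):
        return len(a) == len(b) and all(
            nan_safe_equal(x, y) for x, y in zip(a, b)
        )
    return a == b


# Two rows that look identical but each hold their own NaN value
row1 = (1, [1.0, float("nan")])
row2 = (1, [1.0, float("nan")])
same = nan_safe_equal(row1, row2)
```

A naive equality check on such rows can report a mismatch even though both sides display the same values, which is the kind of behavior a dedicated issue could pin down.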