alteryx / woodwork

Woodwork is a Python library that provides robust methods for managing and communicating data typing information.
https://woodwork.alteryx.com
BSD 3-Clause "New" or "Revised" License
145 stars, 20 forks

Add `TableSchema.diff` method to understand the difference between two TableSchema objects #1670

Open tamargrey opened 1 year ago

tamargrey commented 1 year ago

When passing around Woodwork dataframes, it is easy to lose track of some of the Woodwork typing information, such as feature origins or metadata. Because the table schema repr only shows column names, logical types, and semantic tags, it is hard to tell whether other typing info has changed without going through every relevant field and comparing them directly. It would be great if there were a Woodwork method to make this easier.

Code Example

schema_1.diff(schema_2)

We would need to come up with a design for the output. The simplest option is to display every field that is not equal and output both full values, leaving it up to the user to determine what exactly is different. A more involved option would be to isolate the difference and display only that.
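To make the two output options concrete, here is a rough sketch that uses plain dictionaries as stand-ins for schemas. Nothing here is Woodwork API: `diff_fields`, `isolate_diff`, the `MISSING` marker, and the field names in the sample data are all hypothetical and only illustrate what each design might return.

```python
from typing import Any, Dict, Tuple

# Hypothetical marker for a key that exists on only one side.
MISSING = "<missing>"


def diff_fields(left: Dict[str, Any], right: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Simple option: report each unequal top-level field with both full values."""
    result = {}
    for key in sorted(set(left) | set(right)):
        l_val = left.get(key, MISSING)
        r_val = right.get(key, MISSING)
        if l_val != r_val:
            result[key] = (l_val, r_val)
    return result


def isolate_diff(left: Any, right: Any) -> Any:
    """More involved option: recurse into nested dicts to isolate what changed."""
    if isinstance(left, dict) and isinstance(right, dict):
        nested = {}
        for key in sorted(set(left) | set(right)):
            l_val = left.get(key, MISSING)
            r_val = right.get(key, MISSING)
            if l_val != r_val:
                nested[key] = isolate_diff(l_val, r_val)
        return nested
    # Leaf values (or mismatched types): report the pair as-is.
    return (left, right)


# Made-up sample data; only the nested metadata "version" differs.
schema_1 = {"name": "customers", "metadata": {"source": "csv", "version": 1}}
schema_2 = {"name": "customers", "metadata": {"source": "csv", "version": 2}}

simple = diff_fields(schema_1, schema_2)
detailed = isolate_diff(schema_1, schema_2)
print(simple)
print(detailed)
```

The simple version hands back the whole `metadata` value from each side, while the recursive version narrows the report down to the single nested key that changed.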

For consistency's sake, we should use this function to implement the TableSchema.__eq__ method, which will make sure that these two always stay in sync.
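One way the equality check could be derived from the diff is sketched below. `ToySchema` is a hypothetical stand-in, not Woodwork's `TableSchema`, and its three fields are chosen only for illustration; the point is that `__eq__` returns `True` exactly when `diff` reports nothing.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Tuple


@dataclass(eq=False)  # eq=False so the hand-written __eq__ below is the only one
class ToySchema:
    """Hypothetical stand-in for a table schema; not the Woodwork class."""

    name: str
    logical_types: Dict[str, str] = field(default_factory=dict)
    metadata: Dict[str, Any] = field(default_factory=dict)

    def diff(self, other: "ToySchema") -> Dict[str, Tuple[Any, Any]]:
        """Return {attribute: (self_value, other_value)} for unequal attributes."""
        result = {}
        for attr in ("name", "logical_types", "metadata"):
            mine, theirs = getattr(self, attr), getattr(other, attr)
            if mine != theirs:
                result[attr] = (mine, theirs)
        return result

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, ToySchema):
            return NotImplemented
        # Equal exactly when the diff is empty, so the two cannot drift apart.
        return not self.diff(other)


a = ToySchema("customers", {"id": "Integer"}, {"source": "csv"})
b = ToySchema("customers", {"id": "Integer"}, {"source": "csv"})
c = ToySchema("customers", {"id": "Integer"}, {"source": "parquet"})

print(a == b)
print(a == c)
print(a.diff(c))
```

Because any attribute added to the list that `diff` walks is automatically covered by `__eq__`, there is only one place to update when a new piece of typing info is introduced.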

gsheni commented 1 year ago

@tamargrey What is the urgency of this issue? What is the benefit for EvalML?