Drifted Tabular Data Generation

Hi there,

I've been exploring data drift detection and want to test how well evidently measures how much a given dataset has drifted. My main concern right now is how to generate drifted data in the first place, and how much to skew it, so that I can check whether evidently detects the amount of drift I applied.

So let's say I have a tabular dataframe like this, where I want to drift only the Age feature.

[Image: adult df]

What are the ways to artificially create a drifted dataset from a given dataset?

What I've been doing is splitting the data into two extreme ranges (e.g. one set with Age < 50 and one set with Age >= 50), and then mixing the two sets more and more to create "less" drift. But for tabular data, would something simpler do the trick, such as adding a uniform offset to all the Ages in one dataset? Or adding random noise to all the Ages, with the noise following some normal distribution? What other standard techniques could be used to apply drift in this manner, to a degree that can be varied?
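To make the three options concrete, here is a rough sketch of what I have in mind, using only pandas and NumPy. The function names (`shift_drift`, `noise_drift`, `mix_drift`), their parameters, and the synthetic `reference` frame are all made up for illustration; the random ages are just a stand-in for the real Age column, and none of this is part of evidently.

```python
import numpy as np
import pandas as pd


def shift_drift(df, column, delta):
    """Add a constant offset to every value (delta = 0 means no drift)."""
    out = df.copy()
    out[column] = out[column] + delta
    return out


def noise_drift(df, column, sigma, seed=0):
    """Add zero-mean Gaussian noise; sigma controls how strong it is."""
    rng = np.random.default_rng(seed)
    out = df.copy()
    out[column] = out[column] + rng.normal(0.0, sigma, size=len(out))
    return out


def mix_drift(df, column, threshold, fraction, n, seed=0):
    """Draw n rows, with `fraction` of them taken from column >= threshold."""
    high = df[df[column] >= threshold]
    low = df[df[column] < threshold]
    n_high = int(round(fraction * n))
    parts = [
        high.sample(n=n_high, replace=True, random_state=seed),
        low.sample(n=n - n_high, replace=True, random_state=seed),
    ]
    return pd.concat(parts, ignore_index=True)


# Hypothetical stand-in data: uniformly random ages, not the real dataset.
rng = np.random.default_rng(42)
reference = pd.DataFrame({"Age": rng.integers(17, 91, size=1000)})

# Sweep the severity knob and look at how far the mean moves.
for delta in (0, 5, 10):
    current = shift_drift(reference, "Age", delta)
    print(delta, current["Age"].mean() - reference["Age"].mean())
```

Each function has a single knob (`delta`, `sigma`, or `fraction`) that I could sweep and compare against the drift score that comes back. One thing I notice is that the constant offset moves the mean, while the zero-mean noise mostly widens the spread, so the two probably exercise a detector in different ways.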

Thank you!

evidentlyai / evidently

Drifted Tabular Data Generation #328