alan-turing-institute / learning-machines-drift

A Python package for monitoring dataset drift in secure environments

Add a class which filters logged data #6

Closed: OscartGiles closed this issue 1 year ago

OscartGiles commented 2 years ago

We have the following classes:

It would be good to have a new class that sits between Registry and HypothesisTest and lets you query and filter the data held in the Registry:

Examples might include:
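As a rough illustration of the kind of filtering described above, here is a minimal sketch. It assumes the logged data can be treated as a list of dictionary records; the `DataFilter` class, its `where` and `apply` methods, and the `age` and `group` fields are hypothetical and not part of the package's API.

```python
from typing import Any, Callable, Dict, List, Optional

Record = Dict[str, Any]
Predicate = Callable[[Record], bool]


class DataFilter:
    """Hypothetical filter layer between a registry and a hypothesis test."""

    def __init__(self, predicates: Optional[List[Predicate]] = None) -> None:
        self.predicates: List[Predicate] = list(predicates or [])

    def where(self, predicate: Predicate) -> "DataFilter":
        # Return a new filter so conditions can be chained without mutation.
        return DataFilter(self.predicates + [predicate])

    def apply(self, records: List[Record]) -> List[Record]:
        # Keep only the records that satisfy every predicate.
        return [r for r in records if all(p(r) for p in self.predicates)]


# Made-up records standing in for data read from the registry.
logged = [
    {"age": 25, "group": "a"},
    {"age": 40, "group": "b"},
    {"age": 61, "group": "a"},
    {"age": 33, "group": "b"},
]

older_b = DataFilter().where(lambda r: r["age"] > 30).where(lambda r: r["group"] == "b")
filtered = older_b.apply(logged)
```

Returning a new `DataFilter` from `where` keeps each filter immutable, so one base filter can be reused to build several variants.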

You may also want to do split-apply-combine operations, for example:
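A split-apply-combine operation could be sketched as follows, again assuming list-of-dictionary records. The helper name `split_apply_combine` and the example fields are hypothetical. If the logged data were held in a pandas DataFrame, `groupby` would play the same role.

```python
from collections import defaultdict
from statistics import mean
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


def split_apply_combine(
    records: List[Record],
    key: str,
    apply: Callable[[List[Record]], Any],
) -> Dict[Any, Any]:
    # Split: bucket the records by the value of `key`.
    groups: Dict[Any, List[Record]] = defaultdict(list)
    for record in records:
        groups[record[key]].append(record)
    # Apply and combine: run `apply` on each bucket and collect the results.
    return {name: apply(rows) for name, rows in groups.items()}


# Made-up records standing in for data read from the registry.
logged = [
    {"age": 25, "group": "a"},
    {"age": 40, "group": "b"},
    {"age": 61, "group": "a"},
    {"age": 33, "group": "b"},
]

mean_age = split_apply_combine(
    logged, "group", lambda rows: mean(r["age"] for r in rows)
)
```

In the same way, `apply` could run a hypothesis test on each group instead of computing a mean.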

sgreenbury commented 1 year ago

Consider maintaining two copies of the dataset inside Monitor:

  1. loaded datasets from backend
  2. filtered datasets

For large datasets, this avoids re-reading from the backend every time a different filter is applied, which reduces the total read time.
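A minimal sketch of the two-copy idea, under assumptions: `CachingMonitor`, `load_data`, `filter`, and the `load_fn` callable are hypothetical names, and the real Monitor class may be structured differently.

```python
from typing import Any, Callable, Dict, List, Optional

Record = Dict[str, Any]


class CachingMonitor:
    """Hypothetical monitor that keeps loaded and filtered copies of the data."""

    def __init__(self, load_fn: Callable[[], List[Record]]) -> None:
        self._load_fn = load_fn                        # reads from the backend
        self._loaded: Optional[List[Record]] = None    # copy 1: as loaded
        self._filtered: Optional[List[Record]] = None  # copy 2: after filtering

    def load_data(self) -> List[Record]:
        # Read from the backend only on first use, then reuse the cached copy.
        if self._loaded is None:
            self._loaded = self._load_fn()
        return self._loaded

    def filter(self, predicate: Callable[[Record], bool]) -> List[Record]:
        # Each new filter starts from the cached load and never re-reads.
        self._filtered = [r for r in self.load_data() if predicate(r)]
        return self._filtered


backend_reads = 0


def fake_backend() -> List[Record]:
    # Stand-in for an expensive backend read; counts how often it is called.
    global backend_reads
    backend_reads += 1
    return [{"age": 25}, {"age": 40}, {"age": 61}]


monitor = CachingMonitor(fake_backend)
over_30 = monitor.filter(lambda r: r["age"] > 30)
over_50 = monitor.filter(lambda r: r["age"] > 50)
```

Here two different filters trigger a single backend read. The trade-off is the extra memory needed to hold both copies.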