adaptive-machine-learning / CapyMOA

Enhanced machine learning library tailored for data streams, featuring a Python API integrated with MOA backend support. This unique combination empowers users to leverage a wide array of existing algorithms efficiently while fostering the development of new methodologies in both Python and Java.
BSD 3-Clause "New" or "Revised" License

fix: ensuring deterministic behavior for driftstream #179

Closed hmgomes closed 1 week ago

hmgomes commented 1 month ago

Previously, DriftStream reused its internal MOA ConceptDriftStream object. As a result, a stream defined through the high-level DriftStream API (i.e. with a list of Concepts and Drifts) diverged from a ConceptDriftStream generated from MOA or through the MOA CLI.

## Checking that the instances are the same
from capymoa.stream.drift import DriftStream, AbruptDrift, GradualDrift
from capymoa.stream.generator import SEA
from moa.streams import ConceptDriftStream
import numpy as np

def _different(x1: np.ndarray, x2: np.ndarray, debug=False):
    """Return True if the two feature vectors differ."""
    different = not np.array_equal(x1, x2)
    if debug and different:
        print("Instance from stream_sea2drift (inst_capy.x):", x1)
        print("Instance from stream_sea2drift_MOA (inst_moa.x):", x2)
        # Report where the two vectors differ and the values at those positions
        diff_indices = np.where(x1 != x2)
        print("Differences at indices:", diff_indices)
        print("Values in inst_capy.x:", x1[diff_indices])
        print("Values in inst_moa.x:", x2[diff_indices])

    return different

stream_sea2drift = DriftStream(stream=[SEA(function=1), 
                                AbruptDrift(position=50), 
                                SEA(function=3), 
                                GradualDrift(position=100, width=20), 
                                SEA(function=2)])

print(f'~~~~~~ DriftStream is accessible through the object ~~~~~~:\n {stream_sea2drift}')

stream_sea2drift_MOA = DriftStream(moa_stream=ConceptDriftStream(), 
                               CLI='-s (ConceptDriftStream -s generators.SEAGenerator -d (generators.SEAGenerator -f 3) -p 50 -w 0) \
                               -d (generators.SEAGenerator -f 2) -w 20 -p 100 -r 1 -a 0.0')

# Compare the first 149 instances of both streams, one by one
for i in range(1, 150):
    inst_capy = stream_sea2drift.next_instance()
    inst_moa = stream_sea2drift_MOA.next_instance()

    if _different(inst_capy.x, inst_moa.x, debug=False):
        print(f"Error: Instances do not match. num_instance: {i}")
        raise ValueError("Execution stopped due to mismatch in instances.")

    print(f"results match. num_instance: {i}")

hmgomes commented 1 month ago

Might as well use that example code as the basis for an automated test. Still need to investigate the behaviour when a DriftStream object is restarted. For example, using it several times with prequential_evaluation will cause it to be restarted.
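
A rough sketch of what such a test could look like, covering both the instance-by-instance comparison and the restart case. Everything here is illustrative: `ToySeededStream`, `first_mismatch` and `replays_after_restart` are hypothetical names, not part of CapyMOA or MOA. The toy stream only stands in for a real stream so the helpers can run without a JVM. The helpers assume the stream exposes `next_instance()` (as in the example above) and a `restart()` method.

```python
import random
from types import SimpleNamespace


class ToySeededStream:
    """Hypothetical stand-in for a stream; NOT part of CapyMOA or MOA."""

    def __init__(self, seed=1, n_features=3):
        self._seed = seed
        self._n_features = n_features
        self.restart()

    def restart(self):
        # Re-create the generator so that a restart replays the same sequence.
        self._rng = random.Random(self._seed)

    def next_instance(self):
        # Mimic an instance object that carries its feature vector in `.x`.
        x = [self._rng.random() for _ in range(self._n_features)]
        return SimpleNamespace(x=x)


def first_mismatch(stream_a, stream_b, n_instances):
    """Return the 1-based index of the first differing instance, or None."""
    for i in range(1, n_instances + 1):
        xa = list(stream_a.next_instance().x)
        xb = list(stream_b.next_instance().x)
        if xa != xb:
            return i
    return None


def replays_after_restart(stream, n_instances):
    """Return True if the stream emits the same instances after a restart."""
    stream.restart()
    first_pass = [list(stream.next_instance().x) for _ in range(n_instances)]
    stream.restart()
    second_pass = [list(stream.next_instance().x) for _ in range(n_instances)]
    return first_pass == second_pass
```

In an actual test, the two DriftStream objects from the example would take the place of the toy stream, e.g. `assert first_mismatch(stream_sea2drift, stream_sea2drift_MOA, 149) is None`, followed by `assert replays_after_restart(stream_sea2drift, 149)` to cover the restart case.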