adaptive-machine-learning / CapyMOA

Enhanced machine learning library tailored for data streams, featuring a Python API integrated with MOA backend support. This unique combination empowers users to leverage a wide array of existing algorithms efficiently while fostering the development of new methodologies in both Python and Java.
BSD 3-Clause "New" or "Revised" License

Correct and simplify plotting function #116

Open hmgomes opened 5 months ago

hmgomes commented 5 months ago

plot_predictions_vs_ground_truth only works with stream_from_file() when a CSV is used, as below. If we use an ARFF file or the datasets functionality (which downloads an ARFF version of the data), we get an error that most likely originates in the prequential_evaluation function and becomes evident in the plot_predictions_vs_ground_truth function. Once this is corrected, we will need to update the tutorial 01_evaluation.ipynb.

from capymoa.evaluation import prequential_evaluation
from capymoa.evaluation.visualization import plot_predictions_vs_ground_truth
from capymoa.regressor import KNNRegressor, AdaptiveRandomForestRegressor
from capymoa.stream import stream_from_file

stream = stream_from_file(path_to_csv_or_arff="../data/fried.csv", enforce_regression=True)
kNN_learner = KNNRegressor(schema=stream.get_schema(), k=5)
ARF_learner = AdaptiveRandomForestRegressor(schema=stream.get_schema(), ensemble_size=10)

# When store_predictions and store_y are set, the results also include every prediction and every ground-truth y.
# This is useful for debugging and for outputting the predictions elsewhere.
kNN_results = prequential_evaluation(stream=stream, learner=kNN_learner, window_size=5000, store_predictions=True, store_y=True)
# We don't need to store the ground-truth for every experiment, since it is always the same for the same stream
ARF_results = prequential_evaluation(stream=stream, learner=ARF_learner, window_size=5000, store_predictions=True)

# Plot only 200 predictions (see plot_interval)
plot_predictions_vs_ground_truth(kNN_results, ARF_results, ground_truth=kNN_results['ground_truth_y'], plot_interval=(0, 200))
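
To narrow down whether the problem starts in prequential_evaluation rather than in the plotting code, it may help to inspect the stored outputs before plotting. The following is a minimal sketch in plain Python, not part of CapyMOA: check_stored_outputs is a hypothetical helper that takes two plain sequences and reports length mismatches and non-numeric or non-finite entries.

```python
import math
import numbers
from typing import Any, List, Sequence


def _is_finite_number(value: Any) -> bool:
    """True for real numbers (Python or NumPy scalars) that are not NaN/inf."""
    if not isinstance(value, numbers.Real):
        return False
    return math.isfinite(value)


def check_stored_outputs(predictions: Sequence[Any],
                         ground_truth: Sequence[Any]) -> List[str]:
    """Hypothetical helper: list problems that would break a pred-vs-truth plot."""
    problems = []

    # A plot of predictions against ground truth needs one target per prediction.
    if len(predictions) != len(ground_truth):
        problems.append(
            f"length mismatch: {len(predictions)} predictions "
            f"vs {len(ground_truth)} targets"
        )

    # Regression outputs should be finite real numbers (no None, str, NaN, inf).
    for name, values in (("predictions", predictions),
                         ("ground_truth", ground_truth)):
        bad = [i for i, v in enumerate(values) if not _is_finite_number(v)]
        if bad:
            problems.append(
                f"{len(bad)} non-numeric or non-finite values in {name} "
                f"(first at index {bad[0]})"
            )

    return problems
```

With the snippet above, the ground truth would be kNN_results['ground_truth_y']. The stored predictions are assumed to be available under a similar key such as 'predictions'; that key name is an assumption and should be checked against the installed version. Running the check on results from a CSV stream and from an ARFF stream and comparing the reports should show whether the stored values already differ before plot_predictions_vs_ground_truth is called.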

hmgomes commented 1 month ago

Hi @YibinSun, I believe this might have been solved. Can you please double-check so we can close this issue?