OpsPAI / ADSketch

Min-Max Scaling in Online Adaptive Learning #1

Open cabinz opened 1 week ago

cabinz commented 1 week ago

Hi there,

First of all, thank you for sharing your work here! It's been incredibly insightful.

I have a question regarding the use of min-max scaling in the online adaptive learning stage of the algorithm.

In the paper, it is mentioned that "min-max scaling" is used instead of the "z-normalization" from the original MASS algorithm. I noticed that in the codebase, sklearn.preprocessing.MinMaxScaler is employed for this purpose. Specifically, I see the following line applying MinMaxScaler on online test data, where the scaler is fit on the original anomaly-free training samples:

# motif_operations.online_anomaly_detection()
_, online_scaled_test_metrics = scale_two_metrics(train_metric_values, online_test_metric_values)

However, in real-world scenarios, new data samples continuously stream in, and they are mostly labeled as new nominal points (i.e., anomaly-free, similar to the training data). My concern is:

I am relatively new to the field of anomaly detection, so I apologize if my question seems basic. I appreciate your time and any insights you can offer!

Thank you!

zbchern commented 1 week ago

Hi there,

Thank you for your kind words!

You raise a good question regarding the use of min-max scaling in the online adaptive learning stage.

Most of the time, the min-max boundaries derived from the offline training data should suffice, as normal data typically falls within or near these boundaries without deviating significantly.
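To make the boundary behaviour concrete, here is a minimal sketch with made-up numbers (the arrays are hypothetical and this is not the repository's `scale_two_metrics` implementation). It fits `MinMaxScaler` on training values only and then transforms online values. With the default settings, online values inside the training range map into [0, 1], while a value above the training maximum maps above 1.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical data: the training values span [10, 20].
train = np.array([[10.0], [15.0], [20.0]])
# Hypothetical online values: two inside the training range, one above it.
online = np.array([[12.0], [20.0], [25.0]])

# Defaults: feature_range=(0, 1), clip=False.
scaler = MinMaxScaler()
scaler.fit(train)  # min/max are taken from the training data only

scaled_online = scaler.transform(online)
print(scaled_online.ravel())  # roughly 0.2, 1.0 and 1.5
```

A value slightly outside the training range therefore lands just outside [0, 1] rather than raising an error. Passing `clip=True` to `MinMaxScaler` would instead clamp such values to the feature range.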

However, if concept drift does occur, the old boundaries could degrade detection performance. A simple yet effective solution would be to periodically update the min-max values based on a sliding window of recent data, so that the scaler adapts to the most recent data distribution and the impact of concept drift is reduced.
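A rough sketch of the sliding-window idea, assuming a single univariate metric. The class name `SlidingWindowMinMaxScaler`, its methods, and the `window_size` parameter are hypothetical and not part of ADSketch or scikit-learn.

```python
from collections import deque

import numpy as np


class SlidingWindowMinMaxScaler:
    """Hypothetical helper: min-max scaling with bounds taken from recent data."""

    def __init__(self, initial_values, window_size=1000):
        # Seed the window with (the tail of) the offline training data.
        values = np.asarray(initial_values, dtype=float).ravel()
        self.window = deque(values, maxlen=window_size)

    def update(self, new_values):
        # Append recent points; the deque silently drops the oldest ones.
        self.window.extend(np.asarray(new_values, dtype=float).ravel())

    def transform(self, values):
        # Scale with the min/max of whatever is currently in the window.
        lo, hi = min(self.window), max(self.window)
        values = np.asarray(values, dtype=float)
        if hi == lo:
            # Degenerate window (all values equal): avoid division by zero.
            return np.zeros_like(values)
        return (values - lo) / (hi - lo)


# Usage with made-up numbers and a tiny window.
scaler = SlidingWindowMinMaxScaler([10.0, 15.0, 20.0], window_size=3)
before = scaler.transform([25.0])  # bounds are still [10, 20]
scaler.update([22.0, 25.0])        # window now holds 20, 22, 25
after = scaler.transform([25.0])   # bounds have moved to [20, 25]
```

One design choice to keep in mind: it is probably safer to call `update` only with points the detector has judged normal, since feeding anomalous values into the window would stretch the bounds and could mask later anomalies.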

Additionally, there are studies discussing online normalization techniques.

I hope this helps.

Thanks!