arundo / adtk

A Python toolkit for rule-based/unsupervised anomaly detection in time series
https://adtk.readthedocs.io
Mozilla Public License 2.0

Optimized the logic applying univariate model to DataFrame #67

Closed. tailaiw closed 4 years ago

tailaiw commented 4 years ago

This PR fixes the issue mentioned in #65.

When a univariate model is applied to a DataFrame, it is supposed to follow the logic below, which was not clearly implemented before:

  1. If the model is trained on a Series and then applied to a DataFrame, it is applied to every column independently.
  2. If the model is trained on a DataFrame, every column is trained independently. Internally, the model is replicated into n separate models, each trained on one of the n columns.
     2.1. If the model is then applied to a Series, an error is raised with a message stating that this is not allowed.
     2.2. If the model is then applied to a DataFrame, the n trained internal models are applied to their corresponding columns. Matching is done by column name, so the input DataFrame must have the same set of column names as the DataFrame used for training. Otherwise, an error is raised with a message listing the inconsistent column names.
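
For illustration, the dispatch rules above can be sketched roughly as follows. This is a minimal sketch and not the actual adtk implementation: `ThresholdModel` and `UnivariateWrapper` are hypothetical names made up for this example, and the error messages are placeholders. It also assumes `fit` has been called before `predict`.

```python
import copy

import pandas as pd


class ThresholdModel:
    """Hypothetical univariate model: flags values above the training maximum."""

    def fit(self, series):
        self.max_ = series.max()
        return self

    def predict(self, series):
        return series > self.max_


class UnivariateWrapper:
    """Hypothetical wrapper that applies a univariate model to Series or DataFrame."""

    def __init__(self, model):
        self._model = model
        self._models = None  # {column name: model} when trained on a DataFrame

    def fit(self, data):
        if isinstance(data, pd.Series):
            # Case 1: a single model trained on a single Series.
            self._model.fit(data)
            self._models = None
        elif isinstance(data, pd.DataFrame):
            # Case 2: replicate the model, one independent copy per column.
            self._models = {
                col: copy.deepcopy(self._model).fit(data[col])
                for col in data.columns
            }
        else:
            raise TypeError("Input must be a pandas Series or DataFrame.")
        return self

    def predict(self, data):
        if self._models is None:
            # Trained on a Series.
            if isinstance(data, pd.Series):
                return self._model.predict(data)
            # Case 1: apply the same model to every column independently.
            return data.apply(self._model.predict)
        if isinstance(data, pd.Series):
            # Case 2.1: not allowed.
            raise ValueError(
                "A model trained on a DataFrame cannot be applied to a Series."
            )
        # Case 2.2: match internal models to columns by name.
        inconsistent = set(data.columns) ^ set(self._models)
        if inconsistent:
            raise ValueError(
                f"Inconsistent column names: {sorted(inconsistent, key=str)}"
            )
        return pd.DataFrame(
            {col: self._models[col].predict(data[col]) for col in data.columns}
        )
```

Because matching is by name, the column order of the input DataFrame does not need to match the order used for training; only the set of names does.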