Closed armgilles closed 3 years ago
ML training must not take a station into account while it is out of service (status
= 0).
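A minimal sketch of that filtering step, assuming the activity data is a pandas DataFrame with a `status` column where 0 means out of service. The sample rows and the `transactions_out` column name are hypothetical, and `drop_out_of_service` is not part of vcub_keeper:

```python
import pandas as pd


def drop_out_of_service(ts_activity: pd.DataFrame) -> pd.DataFrame:
    """Keep only rows recorded while the station was in service (status != 0)."""
    return ts_activity[ts_activity["status"] != 0].copy()


# Hypothetical sample: two of the four rows were recorded while the station was down.
sample = pd.DataFrame(
    {
        "station_id": [109, 109, 109, 109],
        "status": [1, 0, 0, 1],
        "transactions_out": [3, 0, 0, 2],
    }
)

train_set = drop_out_of_service(sample)
print(train_set)
```

Filtering before training keeps long idle periods caused by an outage from being learned as normal behaviour.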
Some stations have very little activity (few bike pick-ups):
PROFILE_STATION_RULE = {'high': 36,    # 6 hours
                        'medium': 54,  # 9 hours
                        'low': 66,     # 11 hours
                        }
which makes it possible to determine the contamination (#23) of a station according to its profile
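One possible reading of this rule, sketched under stated assumptions: the thresholds count 10-minute steps (inferred from 36 steps = 6 hours), a step is flagged when the run of consecutive steps without a pick-up reaches the threshold, and the contamination is the share of flagged steps. The helper names below are hypothetical, not vcub_keeper functions:

```python
STEP_MINUTES = 10  # assumption: inferred from 36 steps == 6 hours

PROFILE_STATION_RULE = {"high": 36, "medium": 54, "low": 66}


def threshold_hours(profile: str) -> float:
    """Convert a profile threshold from steps to hours."""
    return PROFILE_STATION_RULE[profile] * STEP_MINUTES / 60


def exceeds_rule(profile: str, consecutive_no_out: int) -> bool:
    """True when the run of steps without a pick-up reaches the profile threshold."""
    return consecutive_no_out >= PROFILE_STATION_RULE[profile]


def estimate_contamination(profile: str, consecutive_counts: list) -> float:
    """Share of steps flagged by the rule (assumed definition of contamination)."""
    if not consecutive_counts:
        return 0.0
    flagged = sum(exceeds_rule(profile, c) for c in consecutive_counts)
    return flagged / len(consecutive_counts)


print(threshold_hours("high"))
print(estimate_contamination("high", [0, 10, 36, 40]))
```

A less active station gets a longer threshold, so a quiet night at a `low` profile station is not flagged as quickly as the same silence at a `high` profile one.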
from vcub_keeper.config import *
from vcub_keeper.reader.reader import *
from vcub_keeper.reader.reader_utils import filter_periode
from vcub_keeper.visualisation import *
from vcub_keeper.transform.features_factory import *
from vcub_keeper.ml.cluster import train_cluster_station, predict_anomalies_station
from vcub_keeper.ml.cluster_utils import load_model, export_model
# Read the activity file
ts_activity = read_time_serie_activity()
# Build some features
ts_activity = get_transactions_in(ts_activity)
ts_activity = get_transactions_out(ts_activity)
ts_activity = get_transactions_all(ts_activity)
ts_activity = get_consecutive_no_transactions_out(ts_activity)
# Set a station ID
station_id = 109
To train the clustering model for one station:
clf = train_cluster_station(ts_activity, station_id=station_id)
# Export model
export_model(clf, station_id=station_id)
To predict anomalies:
clf = load_model(station_id=station_id)
station_pred = predict_anomalies_station(data=ts_activity, clf=clf, station_id=station_id)
# New column `anomaly`: 1 is OK, -1 is an anomaly
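The 1 / -1 labels match the convention of scikit-learn's `IsolationForest.predict`. Below is a small standalone sketch on synthetic data, not the vcub_keeper pipeline. The values and the `contamination` setting are made up for illustration:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic data: 200 ordinary observations plus one extreme value at the end.
rng = np.random.default_rng(0)
normal = rng.normal(loc=5.0, scale=1.0, size=(200, 1))
X = np.vstack([normal, [[60.0]]])

# contamination is the expected share of anomalies in the training data.
clf = IsolationForest(contamination=0.01, random_state=0).fit(X)

# predict returns 1 for inliers and -1 for outliers.
labels = clf.predict(X)
print(labels[-1])
```

The `contamination` parameter sets the score threshold, which is why a per-station estimate of it matters for stations with very different activity levels.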
Use a meta-model on top of the anomaly detection algorithm (Isolation Forest): https://scikit-lego.netlify.app/meta.html#OutlierClassifier
Ideas:
ML
Feature reduction:
Process:
Isolation Forest: