Some brainstorming done with scikit-learn as inspiration:
Dimensionality Reduction (reduce redundant data, e.g. which sources correlate strongly with each other?); see the correlation/PCA sketch after this list
Normalization of Data (by z-score standardization, min-max scaling, ...); see the scaling sketch after this list
Other Preprocessing Methods: map scalar data into bins, one-hot encode categorical data; see the binning/encoding sketch after this list
Trend Identification: linear regression, ARIMA; see the forecasting sketch after this list
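A minimal sketch of the correlation check for the dimensionality-reduction item, plus a PCA alternative from scikit-learn. The function name `find_redundant_sources` and the 0.95 threshold are placeholders, not an agreed interface.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA


def find_redundant_sources(df: pd.DataFrame, threshold: float = 0.95):
    """Return (column_a, column_b, correlation) for strongly correlated column pairs."""
    corr = df.corr().abs()
    # Keep only the upper triangle so every pair is reported once
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    return [
        (a, b, float(upper.loc[a, b]))
        for a in upper.index
        for b in upper.columns
        if pd.notna(upper.loc[a, b]) and upper.loc[a, b] >= threshold
    ]


if __name__ == "__main__":
    df = pd.DataFrame({
        "sensor_a": [1.0, 2.0, 3.0, 4.0],
        "sensor_b": [2.1, 4.0, 6.2, 8.1],  # almost a scaled copy of sensor_a
        "sensor_c": [5.0, 1.0, 4.0, 2.0],
    })
    print(find_redundant_sources(df))
    # Alternative: let PCA keep only enough components to explain 95% of the variance
    print(PCA(n_components=0.95).fit_transform(df.values).shape)
```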
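For the normalization item, a sketch that switches between z-score standardization and min-max scaling with scikit-learn; the helper name `normalize_columns` is made up.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler, StandardScaler


def normalize_columns(df: pd.DataFrame, method: str = "zscore") -> pd.DataFrame:
    """Scale every numeric column: 'zscore' uses StandardScaler, anything else MinMaxScaler."""
    scaler = StandardScaler() if method == "zscore" else MinMaxScaler()
    return pd.DataFrame(scaler.fit_transform(df), columns=df.columns, index=df.index)


if __name__ == "__main__":
    df = pd.DataFrame({"pressure": [1.0, 2.0, 3.0], "flow": [10.0, 20.0, 40.0]})
    print(normalize_columns(df, "zscore"))
    print(normalize_columns(df, "minmax"))
```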
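For the binning and one-hot item, a sketch using scikit-learn's KBinsDiscretizer and OneHotEncoder; the column names are invented test data, and `sparse_output` requires scikit-learn >= 1.2 (older releases use `sparse`).

```python
import pandas as pd
from sklearn.preprocessing import KBinsDiscretizer, OneHotEncoder

# Map a scalar sensor value into 3 equal-width bins labelled 0, 1, 2
values = pd.DataFrame({"temperature": [18.2, 19.5, 21.0, 35.7, 40.1]})
binner = KBinsDiscretizer(n_bins=3, encode="ordinal", strategy="uniform")
print(binner.fit_transform(values))

# One-hot encode a categorical status flag into indicator columns
status = pd.DataFrame({"status": ["OK", "WARN", "OK", "FAIL", "OK"]})
encoder = OneHotEncoder(sparse_output=False, handle_unknown="ignore")
print(encoder.fit_transform(status))
```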
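For the trend-identification item, a sketch that estimates a linear-regression slope over the sample index and produces an ARIMA forecast with statsmodels; the ARIMA order (1, 1, 1) is an arbitrary placeholder and would need tuning on the real data.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from statsmodels.tsa.arima.model import ARIMA


def linear_trend_slope(series: pd.Series) -> float:
    """Fit a straight line over the sample index and return the slope per step."""
    x = np.arange(len(series)).reshape(-1, 1)
    return float(LinearRegression().fit(x, series.values).coef_[0])


def arima_forecast(series: pd.Series, steps: int = 5) -> pd.Series:
    """Fit ARIMA(1, 1, 1) and forecast the next `steps` values."""
    return ARIMA(series, order=(1, 1, 1)).fit().forecast(steps=steps)


if __name__ == "__main__":
    series = pd.Series([10.0, 10.4, 11.1, 11.5, 12.2, 12.8, 13.1, 13.9])
    print("slope per step:", linear_trend_slope(series))
    print(arima_forecast(series))
```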
Other notes:
When we implement these functions, in which format should we work with the data? Convert everything into a pandas DataFrame and then back to the original format? (See the conversion sketch at the end of these notes.)
Explore the test data provided by Shell and brainstorm ideas for RTDIP components that ensure better data quality or identify trends/anomalies.
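Regarding the format question above, a rough sketch of the conversion pattern, assuming the component receives a PySpark DataFrame (to be confirmed against the actual RTDIP component interfaces). toPandas() collects everything onto the driver, so this only works for data that fits in memory; a mapInPandas-based approach would be the scalable alternative.

```python
import pandas as pd
from pyspark.sql import DataFrame, SparkSession


def apply_with_pandas(spark: SparkSession, df: DataFrame, transform) -> DataFrame:
    """Run a pandas-based `transform` on a Spark DataFrame and return a Spark DataFrame."""
    pdf = df.toPandas()                    # Spark -> pandas (collects to the driver)
    result: pd.DataFrame = transform(pdf)  # pandas / scikit-learn processing step
    return spark.createDataFrame(result)   # pandas -> Spark
```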