Some brainstorming done with scikit-learn as inspiration:
Dimensionality Reduction (reduce redundant data, e.g. which sources correlate strongly with each other?); see the correlation/PCA sketch after this list
Normalization of Data (by z-score standardization, min-max scaling, ...); see the scaling sketch after this list
Other Preprocessing Methods: map scalar data into bins, one-hot encode categorical data; see the binning/encoding sketch after this list
Trend Identification: linear regression, ARIMA; see the forecasting sketch after this list
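A minimal sketch of the correlation check for the dimensionality-reduction item, plus a PCA alternative from scikit-learn. The function name `find_redundant_sources` and the 0.95 threshold are placeholders, not an agreed interface.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA


def find_redundant_sources(df: pd.DataFrame, threshold: float = 0.95):
    """Return (column_a, column_b, correlation) for strongly correlated column pairs."""
    corr = df.corr().abs()
    # Keep only the upper triangle so every pair is reported once
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    return [
        (a, b, float(upper.loc[a, b]))
        for a in upper.index
        for b in upper.columns
        if pd.notna(upper.loc[a, b]) and upper.loc[a, b] >= threshold
    ]


if __name__ == "__main__":
    df = pd.DataFrame({
        "sensor_a": [1.0, 2.0, 3.0, 4.0],
        "sensor_b": [2.1, 4.0, 6.2, 8.1],  # almost a scaled copy of sensor_a
        "sensor_c": [5.0, 1.0, 4.0, 2.0],
    })
    print(find_redundant_sources(df))
    # Alternative: let PCA keep only enough components to explain 95% of the variance
    print(PCA(n_components=0.95).fit_transform(df.values).shape)
```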
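For the normalization item, a sketch that switches between z-score standardization and min-max scaling with scikit-learn; the helper name `normalize_columns` is made up.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler, StandardScaler


def normalize_columns(df: pd.DataFrame, method: str = "zscore") -> pd.DataFrame:
    """Scale every numeric column: 'zscore' uses StandardScaler, anything else MinMaxScaler."""
    scaler = StandardScaler() if method == "zscore" else MinMaxScaler()
    return pd.DataFrame(scaler.fit_transform(df), columns=df.columns, index=df.index)


if __name__ == "__main__":
    df = pd.DataFrame({"pressure": [1.0, 2.0, 3.0], "flow": [10.0, 20.0, 40.0]})
    print(normalize_columns(df, "zscore"))
    print(normalize_columns(df, "minmax"))
```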
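For the binning and one-hot item, a sketch using scikit-learn's KBinsDiscretizer and OneHotEncoder; the column names are invented test data, and `sparse_output` requires scikit-learn >= 1.2 (older releases use `sparse`).

```python
import pandas as pd
from sklearn.preprocessing import KBinsDiscretizer, OneHotEncoder

# Map a scalar sensor value into 3 equal-width bins labelled 0, 1, 2
values = pd.DataFrame({"temperature": [18.2, 19.5, 21.0, 35.7, 40.1]})
binner = KBinsDiscretizer(n_bins=3, encode="ordinal", strategy="uniform")
print(binner.fit_transform(values))

# One-hot encode a categorical status flag into indicator columns
status = pd.DataFrame({"status": ["OK", "WARN", "OK", "FAIL", "OK"]})
encoder = OneHotEncoder(sparse_output=False, handle_unknown="ignore")
print(encoder.fit_transform(status))
```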
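For the trend-identification item, a sketch that estimates a linear-regression slope over the sample index and produces an ARIMA forecast with statsmodels; the ARIMA order (1, 1, 1) is an arbitrary placeholder and would need tuning on the real data.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from statsmodels.tsa.arima.model import ARIMA


def linear_trend_slope(series: pd.Series) -> float:
    """Fit a straight line over the sample index and return the slope per step."""
    x = np.arange(len(series)).reshape(-1, 1)
    return float(LinearRegression().fit(x, series.values).coef_[0])


def arima_forecast(series: pd.Series, steps: int = 5) -> pd.Series:
    """Fit ARIMA(1, 1, 1) and forecast the next `steps` values."""
    return ARIMA(series, order=(1, 1, 1)).fit().forecast(steps=steps)


if __name__ == "__main__":
    series = pd.Series([10.0, 10.4, 11.1, 11.5, 12.2, 12.8, 13.1, 13.9])
    print("slope per step:", linear_trend_slope(series))
    print(arima_forecast(series))
```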
Other notes:
When we implement these functions, in which format should we work with the data? Convert everything into a pandas DataFrame and then back to the original format? (See the conversion sketch at the end of these notes.)
Explore the test data provided by Shell and brainstorm ideas for RTDIP components that ensure better data quality or identify trends/anomalies.
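Regarding the format question above, a rough sketch of the conversion pattern, assuming the component receives a PySpark DataFrame (to be confirmed against the actual RTDIP component interfaces). toPandas() collects everything onto the driver, so this only works for data that fits in memory; a mapInPandas-based approach would be the scalable alternative.

```python
import pandas as pd
from pyspark.sql import DataFrame, SparkSession


def apply_with_pandas(spark: SparkSession, df: DataFrame, transform) -> DataFrame:
    """Run a pandas-based `transform` on a Spark DataFrame and return a Spark DataFrame."""
    pdf = df.toPandas()                    # Spark -> pandas (collects to the driver)
    result: pd.DataFrame = transform(pdf)  # pandas / scikit-learn processing step
    return spark.createDataFrame(result)   # pandas -> Spark
```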