M2 - A3 - Extract relevant features from the collected data that can impact solar panel performance, such as solar irradiance, temperature, humidity, panel orientation, and tilt angle #3
Objective:
Perform feature engineering on the dataset, building on the Exploratory Data Analysis (EDA) of power generation and weather dynamics. The goal is to derive features that expose the relationships and dependencies between power output and environmental factors, so that the machine learning models used for solar power generation forecasting have more predictive signal to work with.
Tasks:
Data Loading:
Load the historical power generation and weather data preprocessed in the previous notebook.
Temporal Feature Engineering:
Introduce additional features derived from the timestamp, such as the season, month, and hour used in the one-hot encoding task, to capture temporal patterns.
Visualize temporal patterns using boxplots to understand the hourly and monthly distributions of power generation.
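A minimal sketch of the timestamp-derived features, assuming the data is a pandas DataFrame with a DatetimeIndex. The function name, the output column names (`hour`, `month`, `season`), and the Northern Hemisphere meteorological season mapping are illustrative assumptions, not part of the ticket.

```python
import pandas as pd

# Assumption: Northern Hemisphere meteorological seasons.
SEASON_BY_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}

def add_temporal_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with hour, month, and season columns.

    Assumes df is indexed by a DatetimeIndex.
    """
    out = df.copy()
    out["hour"] = out.index.hour
    out["month"] = out.index.month
    out["season"] = out["month"].map(SEASON_BY_MONTH)
    return out
```

With matplotlib installed, a call such as `out.boxplot(column="power", by="hour")` would produce the hourly boxplot; `power` is a hypothetical name for the target column.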
One-Hot Encoding:
Use one-hot encoding to represent categorical variables (season, month, hour) for better compatibility with machine learning algorithms.
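One way to do this is with `pandas.get_dummies`. The sketch below assumes the `season`, `month`, and `hour` columns already exist in the DataFrame; the helper name is hypothetical.

```python
import pandas as pd

def one_hot_encode(df: pd.DataFrame,
                   columns=("season", "month", "hour")) -> pd.DataFrame:
    """Replace each listed categorical column with 0/1 indicator columns.

    New columns are named "<column>_<value>", e.g. "season_winter".
    """
    return pd.get_dummies(df, columns=list(columns), dtype=int)
```

Tree-based models can often use the integer `month` and `hour` columns directly, so whether to encode them is a modeling choice rather than a requirement.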
Lag-Based Feature Generation:
Create lag-based features to incorporate historical information at varying intervals (1 hour, 2 hours, 4 hours, 24 hours, 30 days).
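The lags can be sketched as time-based shifts, which avoids assuming a particular sampling rate. The example assumes a unique, sorted DatetimeIndex; the function name, the `DEFAULT_LAGS` mapping, and the output column naming are illustrative.

```python
import pandas as pd

# Lag intervals listed in the task; the labels are used in column names.
DEFAULT_LAGS = {
    "1h": pd.Timedelta(hours=1),
    "2h": pd.Timedelta(hours=2),
    "4h": pd.Timedelta(hours=4),
    "24h": pd.Timedelta(hours=24),
    "30d": pd.Timedelta(days=30),
}

def add_lag_features(df: pd.DataFrame, target: str,
                     lags=None) -> pd.DataFrame:
    """Add one column per lag holding the target value observed that long ago.

    Rows with no observation exactly one lag earlier receive NaN.
    """
    lags = DEFAULT_LAGS if lags is None else lags
    out = df.copy()
    for label, offset in lags.items():
        shifted = out[target].copy()
        # Move each observation forward in time by the lag, then look it up
        # at the original timestamps.
        shifted.index = shifted.index + offset
        out[f"{target}_lag_{label}"] = shifted.reindex(out.index)
    return out
```

Rows at the start of the series have no history for the longer lags, so the 30-day lag column is NaN for the first 30 days of data and those rows need to be dropped or imputed before modeling.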
Rolling Window Features for Temporal Trends:
Generate rolling window-based features (mean, standard deviation, maximum, exponentially weighted moving average) for target and environmental variables over different time windows (24 hours, 48 hours, 30 days).
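A hedged sketch of the rolling statistics using pandas time-based windows. It assumes a sorted DatetimeIndex. Excluding the current row from each window (`closed="left"`, and `shift(1)` for the EWMA) is an assumption made here to avoid leaking the value being predicted; the ticket does not specify it. The EWMA span is counted in rows, so the default of 24 corresponds to 24 hours only for hourly data.

```python
import pandas as pd

# Window lengths listed in the task; the labels are used in column names.
DEFAULT_WINDOWS = {
    "24h": pd.Timedelta(hours=24),
    "48h": pd.Timedelta(hours=48),
    "30d": pd.Timedelta(days=30),
}

def add_rolling_features(df: pd.DataFrame, column: str, windows=None,
                         ewm_span_rows: int = 24) -> pd.DataFrame:
    """Add rolling mean, std, and max per window, plus one EWMA column."""
    windows = DEFAULT_WINDOWS if windows is None else windows
    out = df.copy()
    for label, window in windows.items():
        # closed="left" covers [t - window, t), so row t itself is excluded.
        roll = out[column].rolling(window, closed="left")
        out[f"{column}_roll_mean_{label}"] = roll.mean()
        out[f"{column}_roll_std_{label}"] = roll.std()
        out[f"{column}_roll_max_{label}"] = roll.max()
    # Exponentially weighted mean over past rows only.
    out[f"{column}_ewm"] = (
        out[column].shift(1).ewm(span=ewm_span_rows, adjust=False).mean()
    )
    return out
```

The same function can be applied to each environmental variable in turn by changing `column`.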
Additional Contextual Features:
Introduce contextual features such as hours since last rain, days since installation, wind chill index, and solar zenith angle to provide further insights into environmental conditions.
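The four contextual features could be sketched as below. All function and parameter names are hypothetical. The wind chill uses the standard North American formula (air temperature in °C, wind speed in km/h), which is defined for temperatures at or below 10 °C and wind speeds above 4.8 km/h. The zenith angle uses a simplified declination and hour-angle approximation that treats timestamps as local solar time and ignores the equation of time and longitude correction, so it is a rough estimate rather than an ephemeris-grade value.

```python
import numpy as np
import pandas as pd

def hours_since_last_rain(rain: pd.Series) -> pd.Series:
    """Hours since the most recent row with rain > 0 (NaN before the first rain)."""
    timestamps = pd.Series(rain.index, index=rain.index)
    last_rain_time = timestamps.where(rain > 0).ffill()
    return (timestamps - last_rain_time).dt.total_seconds() / 3600.0

def days_since_installation(index: pd.DatetimeIndex, installed_on) -> pd.Series:
    """Fractional days between each timestamp and the installation date."""
    delta = index - pd.Timestamp(installed_on)
    return pd.Series(delta.total_seconds() / 86400.0, index=index)

def wind_chill_celsius(temp_c, wind_kmh):
    """Wind chill index for temperature in deg C and wind speed in km/h."""
    v = np.power(wind_kmh, 0.16)
    return 13.12 + 0.6215 * temp_c - 11.37 * v + 0.3965 * temp_c * v

def solar_zenith_deg(index: pd.DatetimeIndex, latitude_deg: float) -> pd.Series:
    """Approximate solar zenith angle in degrees (timestamps as local solar time)."""
    day_of_year = index.dayofyear.to_numpy()
    solar_hour = (index.hour + index.minute / 60.0).to_numpy()
    # Approximate declination as a sinusoid of the day of year.
    declination = np.radians(23.45) * np.sin(2 * np.pi * (284 + day_of_year) / 365.0)
    # The sun moves 15 degrees of hour angle per hour; zero at solar noon.
    hour_angle = np.radians(15.0 * (solar_hour - 12.0))
    lat = np.radians(latitude_deg)
    cos_zenith = (np.sin(lat) * np.sin(declination)
                  + np.cos(lat) * np.cos(declination) * np.cos(hour_angle))
    zenith = np.degrees(np.arccos(np.clip(cos_zenith, -1.0, 1.0)))
    return pd.Series(zenith, index=index)
```

For production use, a dedicated solar position library would likely be more accurate than the zenith approximation above.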
Identifying and Handling Outliers:
Apply heuristic-based methods and statistical approaches to identify and remove outliers:
No-power-with-radiation heuristic: no power output while radiation is present.
Low-voltage heuristic: low voltage while the device temperature is non-zero.
Statistical outlier detection using the Interquartile Range (IQR) method.
Visualize the distribution of power generation before and after outlier removal to assess the impact.
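The three outlier rules can be sketched as a single boolean mask. The column names (`power`, `radiation`, `voltage`, `device_temperature`) and the thresholds are placeholders that would need to be replaced with the dataset's actual names and sensible limits. The 1.5 multiplier is the conventional IQR fence, not a value taken from the ticket.

```python
import pandas as pd

def flag_outliers(df: pd.DataFrame,
                  power: str = "power",
                  radiation: str = "radiation",
                  voltage: str = "voltage",
                  device_temp: str = "device_temperature",
                  radiation_min: float = 50.0,   # placeholder threshold
                  voltage_min: float = 1.0,      # placeholder threshold
                  iqr_factor: float = 1.5) -> pd.Series:
    """Return a boolean Series that is True for rows flagged as outliers."""
    # Heuristic 1: no power although there is meaningful radiation.
    no_power_with_radiation = (df[power] <= 0) & (df[radiation] > radiation_min)
    # Heuristic 2: low voltage although the device reports a non-zero temperature.
    low_voltage_with_temp = (df[voltage] < voltage_min) & (df[device_temp] != 0)
    # Statistical rule: values outside the IQR fences of the power column.
    q1, q3 = df[power].quantile([0.25, 0.75])
    iqr = q3 - q1
    outside_iqr = ((df[power] < q1 - iqr_factor * iqr)
                   | (df[power] > q3 + iqr_factor * iqr))
    return no_power_with_radiation | low_voltage_with_temp | outside_iqr
```

Filtering with `df[~flag_outliers(df)]` keeps the unflagged rows. Applied to a full day of data, the IQR rule and the low-voltage rule may flag legitimate night-time readings, so it may be worth restricting them to daylight hours.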
Save Feature-Engineered Dataset:
Save the feature-engineered dataset for use in subsequent stages of the pipeline.
Conclusion:
Summarize the feature engineering process and its impact on dataset robustness and predictive modeling capabilities.
Deliverables:
Jupyter Notebook or Python script detailing the feature engineering process and its implementation.
Visualizations (e.g., boxplots, distribution plots) illustrating temporal patterns, outlier detection, and impact assessment.
Feature-engineered dataset saved in Parquet format for further analysis and model development.
Conclusion section summarizing the feature engineering process and its implications for solar power generation forecasting.