Renovus-Tech / solarec-python

GNU Affero General Public License v3.0

M2 - A3 - Extract relevant features from the collected data that can impact solar panel performance, such as solar irradiance, temperature, humidity, panel orientation, and tilt angle #3

Closed: renovus closed this issue 5 months ago

renovus commented 10 months ago

Ticket Description:

Objective: Following the Exploratory Data Analysis (EDA) of power generation and weather data, perform feature engineering on the dataset. The goal is to derive features that expose the relationships and dependencies between power output and environmental factors, so that downstream machine learning models can forecast solar power generation more accurately.

Tasks:

  1. Data Loading:

    • Load the historical power generation and weather data preprocessed in the previous notebook.
  2. Temporal Feature Engineering:

    • Introduce additional features derived from the timestamp to capture temporal patterns:
      • Hour, Month, Day.
      • Seasonal categorization (Winter, Spring, Summer, Autumn).
    • Visualize temporal patterns using boxplots to understand the hourly and monthly distributions of power generation.
  3. One-Hot Encoding:

    • Use one-hot encoding to represent categorical variables (season, month, hour) for better compatibility with machine learning algorithms.
  4. Lag-Based Feature Generation:

    • Create lag-based features to incorporate historical information at varying intervals (1 hour, 2 hours, 4 hours, 24 hours, 30 days).
  5. Rolling Window Features for Temporal Trends:

    • Generate rolling window-based features (mean, standard deviation, maximum, exponentially weighted moving average) for target and environmental variables over different time windows (24 hours, 48 hours, 30 days).
  6. Additional Contextual Features:

    • Introduce contextual features such as hours since last rain, days since installation, wind chill index, and solar zenith angle to provide further insights into environmental conditions.
  7. Identifying and Handling Outliers:

    • Apply heuristic-based methods and statistical approaches to identify and remove outliers:
      • No-power-with-radiation heuristic: readings that report no power while solar radiation is present.
      • Low-voltage heuristic: readings that report low voltage while the device temperature is non-zero.
      • Statistical outlier detection using the Interquartile Range (IQR) method.
    • Visualize the distribution of power generation before and after outlier removal to assess the effect.
  8. Save Feature-Engineered Dataset:

    • Save the feature-engineered dataset for use in subsequent stages of the pipeline.
  9. Conclusion:

    • Summarize the feature engineering process and its impact on dataset robustness and predictive modeling capabilities.
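
Tasks 2 to 5 could be sketched with pandas roughly as follows. This is a minimal illustration, not the notebook's actual code. It assumes an hourly, gap-free DatetimeIndex and a target column named `power` (a hypothetical name). The season mapping uses meteorological seasons with a hemisphere switch, since the ticket does not state the site location. Computing rolling statistics on the series shifted by one step, so the current value does not leak into its own features, is also an assumption.

```python
import numpy as np
import pandas as pd

SEASONS = np.array(["Winter", "Spring", "Summer", "Autumn"])
LAGS_H = (1, 2, 4, 24, 24 * 30)   # 1 h, 2 h, 4 h, 24 h, 30 days (in hourly rows)
WINDOWS_H = (24, 48, 24 * 30)     # 24 h, 48 h, 30 days (in hourly rows)


def add_features(df, target="power", southern_hemisphere=False):
    """Return a copy of df with temporal, lag, rolling and one-hot features.

    Assumes df has a regular hourly DatetimeIndex; `target` is a
    hypothetical column name for the generated power.
    """
    out = df.copy()
    idx = out.index

    # Temporal features derived from the timestamp.
    out["hour"] = idx.hour
    out["month"] = idx.month
    out["day"] = idx.day

    # Meteorological seasons: Dec-Feb -> 0, Mar-May -> 1, Jun-Aug -> 2, Sep-Nov -> 3.
    season_idx = np.asarray((idx.month % 12) // 3)
    if southern_hemisphere:
        season_idx = (season_idx + 2) % 4  # seasons are offset by half a year
    out["season"] = SEASONS[season_idx]

    # Lag features: with hourly rows, shifting by n rows looks back n hours.
    for lag in LAGS_H:
        out[f"{target}_lag_{lag}h"] = out[target].shift(lag)

    # Rolling statistics over past values only (assumption: exclude current row).
    past = out[target].shift(1)
    for window in WINDOWS_H:
        roll = past.rolling(window, min_periods=1)
        out[f"{target}_roll_mean_{window}h"] = roll.mean()
        out[f"{target}_roll_std_{window}h"] = roll.std()
        out[f"{target}_roll_max_{window}h"] = roll.max()
        out[f"{target}_ewm_{window}h"] = past.ewm(span=window, adjust=False).mean()

    # One-hot encode the categorical columns (the originals are replaced).
    return pd.get_dummies(out, columns=["season", "month", "hour"], dtype=int)
```

Because `get_dummies` replaces the `season`, `month` and `hour` columns, the boxplots in task 2 would need to be drawn before the encoding step. The same lag and rolling loops can be repeated for environmental columns such as irradiance or temperature.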

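The outlier rules in task 7 might look like the sketch below. The column names (`power`, `irradiance`, `voltage`, `device_temperature`) and the thresholds are hypothetical placeholders, since the ticket does not specify them. The IQR rule uses the conventional 1.5 multiplier.

```python
import pandas as pd


def flag_outliers(
    df,
    power="power",
    irradiance="irradiance",
    voltage="voltage",
    device_temp="device_temperature",
    irradiance_min=50.0,  # hypothetical threshold for "radiation is present"
    voltage_min=1.0,      # hypothetical threshold for "low voltage"
    iqr_factor=1.5,
):
    """Return a boolean DataFrame with one column per outlier rule."""
    # Heuristic 1: no power although radiation is present.
    no_power_with_radiation = (df[power] <= 0) & (df[irradiance] > irradiance_min)

    # Heuristic 2: low voltage while the device temperature is non-zero.
    low_voltage_with_temp = (df[voltage] < voltage_min) & (df[device_temp] != 0)

    # Statistical rule: power outside [Q1 - k*IQR, Q3 + k*IQR].
    q1, q3 = df[power].quantile([0.25, 0.75])
    iqr = q3 - q1
    iqr_outlier = (df[power] < q1 - iqr_factor * iqr) | (
        df[power] > q3 + iqr_factor * iqr
    )

    return pd.DataFrame(
        {
            "no_power_with_radiation": no_power_with_radiation,
            "low_voltage_with_temp": low_voltage_with_temp,
            "iqr_outlier": iqr_outlier,
        }
    )


def remove_outliers(df, **kwargs):
    """Drop every row flagged by at least one rule."""
    flags = flag_outliers(df, **kwargs)
    return df[~flags.any(axis=1)]
```

Keeping the flags separate from the removal step makes it possible to plot the power distribution before and after filtering and to see which rule removed each row. On real data the IQR rule may need to be restricted to daylight hours, since night-time zeros would otherwise dominate the quartiles.
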
Deliverables:

fcggamou commented 5 months ago

Added notebook: https://github.com/Renovus-Tech/solarec-python/blob/main/app/ml/notebooks/anomaly_detection/2%20-%20Feature%20Engineering.ipynb