SverreNystad / power-predictor

Using Machine Learning for time series forecasting of photovoltaic measurement for solar systems based on weather features

Handle missing data. #18

Closed SverreNystad closed 8 months ago

SverreNystad commented 9 months ago

Strategies to Handle Missing Values:

  1. Removal:
    • Remove rows with missing values, especially when the number of such rows is minimal.
  2. Imputation:
    • Replace missing values with statistical measures like mean, median, or mode.
    • Use more advanced techniques like model-based imputation or k-NN imputation.
  3. Interpolation:
    • For time-series data, interpolate missing values based on neighboring non-missing values.
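
A minimal sketch of the three strategies above using pandas. The column name `pv_measurement` and the sample values are made up for illustration and are not taken from the project's data:

```python
import numpy as np
import pandas as pd

# Hypothetical hourly series with gaps (illustrative values only).
index = pd.date_range("2023-01-01", periods=6, freq=pd.Timedelta(hours=1))
df = pd.DataFrame(
    {"pv_measurement": [1.0, np.nan, 3.0, np.nan, np.nan, 6.0]},
    index=index,
)

# 1. Removal: drop every row that contains a missing value.
dropped = df.dropna()

# 2. Imputation: replace missing values with the column mean.
mean_filled = df.fillna(df.mean())

# 3. Interpolation: fill gaps linearly in time from the neighboring
#    non-missing values (requires a DatetimeIndex).
interpolated = df.interpolate(method="time")
```

For a time series, interpolation keeps the local trend, whereas mean imputation flattens every gap to the same value, which is why it is usually the more natural choice when gaps are short.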

Proposed Plan:

  1. Quantify Missing Data:
    • Calculate the proportion of missing values in each column to decide whether to remove or impute them.
  2. Impute or Remove:
    • For columns with a minimal amount of missing values, we can consider removing the rows with missing values or imputing them with statistical measures.
    • For columns with a substantial amount of missing values, we can consider advanced imputation methods or, if necessary, removal of the feature.
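
The plan could be sketched roughly as follows. The helper `handle_missing`, the 50% cutoff `max_missing`, and the example frame are all assumptions for illustration; the issue does not state which threshold or imputation method was used:

```python
import numpy as np
import pandas as pd


def handle_missing(df: pd.DataFrame, max_missing: float = 0.5):
    """Hypothetical helper: quantify missing data, then drop or impute."""
    # Step 1: proportion of missing values per column.
    missing_ratio = df.isna().mean()

    # Step 2a: drop features whose missing proportion exceeds the cutoff.
    sparse_cols = missing_ratio[missing_ratio > max_missing].index
    kept = df.drop(columns=sparse_cols)

    # Step 2b: impute the remaining gaps with each column's median.
    cleaned = kept.fillna(kept.median(numeric_only=True))
    return missing_ratio, cleaned


# Illustrative data: "a" has 25% missing, "b" has 75%, "c" has none.
example = pd.DataFrame(
    {
        "a": [1.0, np.nan, 3.0, 4.0],
        "b": [np.nan, np.nan, np.nan, 1.0],
        "c": [1.0, 2.0, 3.0, 4.0],
    }
)
ratios, cleaned = handle_missing(example)
```

With the assumed cutoff, column `b` is removed and the single gap in `a` is filled with the median of its observed values.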
SverreNystad commented 8 months ago

We removed bad values and dropped columns that had a high percentage of missing data.