Omdena-NIC-Nepal / machine-learning-linear-regression-ai-dreamers

omdena-nic-nepal-classroom-1f2b87-machine_learning_linear_regression-Machine_Learning_Linear_Regress created by GitHub Classroom

Data Preprocessing for Data_Preprocessing.ipynb #7

Closed by urs-santoshh 1 month ago

urs-santoshh commented 1 month ago

Description:

We need to perform data preprocessing for the dataset in the notebooks/Data_Preprocessing.ipynb notebook. This includes handling missing values and outliers, encoding categorical variables, normalizing or standardizing numerical features, and splitting the data into training and testing sets. The corresponding script should be updated in scripts/data_preprocessing.py.

Tasks:

  1. Handle missing values and outliers:

    • Implement strategies for handling missing values, such as imputation or removal.
    • Address outliers using appropriate methods, such as transformation or removal.
  2. Encode categorical variables:

    • Convert categorical features into numerical values using techniques such as one-hot encoding or label encoding.
  3. Normalize/standardize numerical features:

    • Apply normalization or standardization to numerical features to ensure they are on a similar scale.
  4. Split the data:

    • Divide the dataset into training and testing sets using an appropriate ratio (e.g., 80/20 or 70/30).
  5. Update preprocessing script:

    • Ensure that the steps above are reflected in the scripts/data_preprocessing.py script for reproducibility.
  6. Document process and findings:

    • Provide a summary of the preprocessing steps taken and any issues encountered.
    • Ensure that the notebook is well-documented with comments explaining the steps and rationale.
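
As a starting point for tasks 1 to 4, here is a minimal sketch in pandas. It is only an illustration under assumptions: the function name `preprocess`, its parameters, and the specific choices (median imputation, clipping at 1.5 × IQR, one-hot encoding, z-score standardization, 80/20 split) are hypothetical and not taken from the actual notebook or script. It splits before computing any statistics, so the test set does not influence the imputation, clipping, or scaling values.

```python
import pandas as pd


def preprocess(df, target, test_frac=0.2, seed=42):
    """Hypothetical helper: returns (X_train, X_test, y_train, y_test)."""
    # Rows without a target value cannot be used for supervised training.
    df = df.dropna(subset=[target]).reset_index(drop=True)
    X, y = df.drop(columns=[target]), df[target]

    num_cols = X.select_dtypes(include="number").columns.tolist()
    cat_cols = [c for c in X.columns if c not in num_cols]

    # Task 2: one-hot encode categorical columns. dummy_na=True adds a
    # separate indicator column for missing categories.
    if cat_cols:
        X = pd.get_dummies(X, columns=cat_cols, dummy_na=True, dtype=float)

    # Task 4: random train/test split (80/20 by default).
    test_idx = X.sample(frac=test_frac, random_state=seed).index
    X_train, X_test = X.drop(index=test_idx).copy(), X.loc[test_idx].copy()
    y_train, y_test = y.drop(index=test_idx), y.loc[test_idx]

    for col in num_cols:
        # Task 1: impute with the training median, then clip values that fall
        # outside 1.5 * IQR of the training data.
        median = X_train[col].median()
        q1, q3 = X_train[col].quantile(0.25), X_train[col].quantile(0.75)
        lower, upper = q1 - 1.5 * (q3 - q1), q3 + 1.5 * (q3 - q1)
        for part in (X_train, X_test):
            part[col] = part[col].fillna(median).clip(lower, upper)

        # Task 3: standardize using the training mean and standard deviation.
        mean, std = X_train[col].mean(), X_train[col].std()
        std = std if std > 0 else 1.0  # guard against constant columns
        for part in (X_train, X_test):
            part[col] = (part[col] - mean) / std

    return X_train, X_test, y_train, y_test
```

Calling `preprocess(df, "price")` on a DataFrame with a `price` column (an assumed column name) would return the four pieces ready for model fitting. scikit-learn's `train_test_split` and `StandardScaler` are the more usual tools for the last two steps and could replace the manual versions here.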

Assignment:

Assign this issue to @Ashish-pixel333. The assignee is responsible for completing the tasks outlined above, updating the preprocessing script, and documenting their process in the notebooks/Data_Preprocessing.ipynb notebook.

Deadline:

[Specify Deadline]

urs-santoshh commented 1 month ago

Let's keep the discussion related to data preprocessing in this thread.

Ashish-pixel333 commented 1 month ago

I will be doing the data preprocessing, so any issues relating to this part will be discussed here.