CIGLR-ai-lab / GreatLakes-TempSensors

Collaborative repository for optimizing the placement of temperature sensors in the Great Lakes using the DeepSensor machine learning framework. The project aims to improve the quantitative understanding of surface temperature variability to support better environmental monitoring and decision-making.
MIT License

Refine Training by Adding Additional Years to the Dataset #36

Closed DaniJonesOcean closed 1 month ago

DaniJonesOcean commented 2 months ago

Task Description:

Enhance the DeepSensor model's training process by incorporating additional years of data into the training dataset. This refinement is expected to improve the model's performance and its ability to generalize and capture temporal variability more effectively.

Checklist:

  1. Identify Additional Data:

    • [ ] Determine the additional years of observational data to be included in the training dataset (your call - maybe start small and go from there?)
  2. Data Preprocessing:

    • [ ] Preprocess the additional years of data to align with the existing training dataset format.
    • [ ] Verify the quality and consistency of the dataset, ensuring that it is compatible with the DeepSensor model's input requirements. (This is pretty much already done - it's just about adding more years)
  3. Update Training Dataset:

    • [ ] Integrate the additional years of data into the existing training dataset.
    • [ ] Ensure the dataset is correctly formatted and ready for use in model training.
  4. Retrain the DeepSensor Model:

    • [ ] Configure the DeepSensor environment to use the updated training dataset with the additional years of data.
    • [ ] Retrain the model and monitor the training process, documenting any improvements or challenges observed.
    • [ ] Evaluate the model's performance after incorporating the additional data and compare it to the baseline model to assess the impact.
  5. Documentation and Results:

    • [ ] Create a Jupyter notebook to document the process of adding additional years to the training dataset, including data preprocessing, integration, training, and results analysis.
    • [ ] Save the retrained model and relevant output files.

DaniJonesOcean commented 1 month ago

@eredding02 Likewise, feel free to add some info here about your extended training dataset and close the issue when you're happy with it : )

eredding02 commented 1 month ago

@DaniJonesOcean I extended the training set to use data from 2007-2019. The 2006, 2022, and 2023 GLSEA3 data each had at least one missing date, so I excluded those years from training and validation. For validation I used 200 randomly selected dates from 2020-2021, and training ran for 40 epochs.
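
For anyone reproducing the split, here is a rough sketch of the selection logic described above: drop any year with a missing date, hold out 2020-2021, and draw 200 random validation dates from the held-out years. It is not the code used in the notebook. The function names, the seed, and the three gap dates are placeholders invented for illustration (the thread does not say which dates were actually missing), and a real run would build `available` from the time coordinate of the GLSEA3 files.

```python
import random
from datetime import date, timedelta


def days_in_year(year):
    """Return every calendar day of `year` as a list of date objects."""
    day = date(year, 1, 1)
    days = []
    while day.year == year:
        days.append(day)
        day += timedelta(days=1)
    return days


def complete_years(available, years):
    """Keep only the years for which every calendar day is in `available`."""
    return [y for y in years if all(d in available for d in days_in_year(y))]


def sample_validation_dates(years, n, seed=0):
    """Draw `n` distinct dates, without replacement, from the given years."""
    pool = [d for y in years for d in days_in_year(y)]
    return sorted(random.Random(seed).sample(pool, n))


# Toy stand-in for the dates present in the dataset. The three gaps below
# are HYPOTHETICAL placeholders, not the real missing dates.
placeholder_gaps = {date(2006, 1, 1), date(2022, 6, 15), date(2023, 12, 31)}
available = {
    d for y in range(2006, 2024) for d in days_in_year(y)
} - placeholder_gaps

val_years = [2020, 2021]
usable_years = complete_years(available, range(2006, 2024))
train_years = [y for y in usable_years if y not in val_years]
val_dates = sample_validation_dates(val_years, n=200)

print(train_years[0], train_years[-1], len(val_dates))
```

Fixing the seed keeps the validation dates identical between the baseline and the retrained model, so the comparison is on the same held-out days.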
