CIGLR-ai-lab / GreatLakes-TempSensors

Collaborative repository for optimizing the placement of temperature sensors in the Great Lakes using the DeepSensor machine learning framework. The aim is to improve the quantitative understanding of surface temperature variability for better environmental monitoring and decision-making.
MIT License

Collaborative design of a DeepSensor task for SST analysis #19

Closed DaniJonesOcean closed 2 months ago

DaniJonesOcean commented 3 months ago

Issue Description

We aim to design a DeepSensor Task that will be a building block for our SST analysis. Our Task needs to accurately encompass the context and target sets involving SST data. At this stage, we have not fully specified the auxiliary datasets that should be incorporated into the Task - we can develop this collaboratively.

We can use the DeepSensor documentation on Tasks as guidance:

Objectives

Steps

Task 1: Review and Discuss Auxiliary Dataset Options

Task 2: Prepare SST Context and Target Data

Task 3: Setup TaskLoader with Unspecified Auxiliary Data

# Pseudocode for initializing `TaskLoader`
from deepsensor.data import TaskLoader

# Placeholders: these must be replaced with real data before this will run
sst_data = None  # To be replaced with the SST target data
context_data_placeholder = None  # To be replaced with actual context data
auxiliary_data_placeholder = None  # To be replaced with actual auxiliary data

task_loader = TaskLoader(
    context=context_data_placeholder,
    target=sst_data,
    # TaskLoader has no `aux` keyword; auxiliary data is passed via
    # `aux_at_targets` (and/or `aux_at_contexts`)
    aux_at_targets=auxiliary_data_placeholder,
)

Task 4: Delegate Sub-Tasks as Needed

Task 5: Visualize Designed Task

# Visualization script; replace 'task' and 'task_loader' with actual objects
import deepsensor.plot
import matplotlib.pyplot as plt

fig = deepsensor.plot.task(task, task_loader)
plt.show()

Deliverables

This issue will foster collaboration and strategic thinking as we refine our approach to SST sensor placement using DeepSensor.

eredding02 commented 2 months ago

@DaniJonesOcean While designing Tasks, I discovered that DeepSensor does not allow NaN values in the target set due to backpropagation. To work around this, I have replaced NaN values with a small negative value. I have included the masked GLSEA3 data and a land mask as context data sets. We may also include bathymetry as an auxiliary dataset, mirroring the DeepSensor paper's use of elevation. The target data set is the masked GLSEA3 data: in the few training runs I have attempted, I found much better results when using the same dataset for context and target, even though DeepSensor can handle NaN values in the context set. All data points in the provided data sets are used, rather than a uniform random sample.

Using deepsensor.plot.task we can see that there are now values over land, unlike in the original GLSEA3 data. We can also see that we are trying to predict the same sort of SST data provided in the context set. As a side note, the land mask in the plot looks like it has values greater than 1, but that is not the case before data processing.
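The masking step described above could look roughly like the following. This is a minimal NumPy sketch, not the code actually used in the repository: the function name `mask_land`, the fill value of -0.01, and the toy grid are all assumptions made for illustration (the comment only says "a small negative value").

```python
import numpy as np

# Hypothetical fill value; the thread only specifies "a small negative value".
LAND_FILL_VALUE = -0.01


def mask_land(sst, fill_value=LAND_FILL_VALUE):
    """Return (filled SST, binary land mask) for a gridded SST array.

    Cells that are NaN in the input (assumed here to be land) are replaced
    by `fill_value`. The land mask is 1.0 where the input was NaN, else 0.0.
    """
    sst = np.asarray(sst, dtype=float)
    land = np.isnan(sst)
    filled = np.where(land, fill_value, sst)
    return filled, land.astype(float)


# Toy 3x3 grid in degrees Celsius; NaN marks land cells.
toy_sst = np.array([
    [np.nan, 4.2, 5.1],
    [3.8, 4.5, np.nan],
    [np.nan, np.nan, 6.0],
])

filled_sst, land_mask = mask_land(toy_sst)
```

Here `filled_sst` contains no NaNs and could serve as both context and target, while `land_mask` could be supplied as an extra context variable so the model can tell filled land cells apart from real water temperatures. A negative fill value only works as a sentinel if the real SST values stay non-negative, which is an assumption about the data's units.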

[Image: Screenshot 2024-06-24 at 10 26 54 AM]
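One possible explanation for the land mask showing values above 1 in the plot is that it was normalized along with the other variables. The sketch below assumes a mean/standard-deviation normalization, which the thread does not confirm; the `standardize` helper and the toy mask are hypothetical.

```python
import numpy as np


def standardize(values):
    """Subtract the mean and divide by the (population) standard deviation."""
    values = np.asarray(values, dtype=float)
    return (values - values.mean()) / values.std()


# Toy binary land mask: 1.0 = land, 0.0 = water (20% land in this example).
land_mask = np.array([1.0, 0.0, 0.0, 0.0, 0.0])

normalized = standardize(land_mask)
# Mean is 0.2 and std is 0.4, so land maps to about 2.0 and water to about -0.5.
```

Under that assumption, a mask that is strictly 0 or 1 before processing can legitimately appear with values above 1 afterwards, and the effect grows as the land fraction shrinks.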
DaniJonesOcean commented 2 months ago

Apologies for the delay in commenting! This all looks good to me, and you've flagged up some things for us to be aware of going forward. I'm not very concerned at the moment; this seems mostly positive/neutral at present. Thanks again!