Collaborative design of a DeepSensor task for SST analysis

DaniJonesOcean commented 3 months ago

Issue Description

We aim to design a DeepSensor Task that will be a building block for our SST analysis. Our Task needs to accurately encompass the context and target sets involving SST data. At this stage, we have not fully specified the auxiliary datasets that should be incorporated into the Task - we can develop this collaboratively.

We can use the DeepSensor documentation on Tasks as guidance.

Objectives

[ ] Define the context and target sets for the SST DeepSensor Task.
[ ] Identify and discuss which auxiliary datasets could enhance our Task and how they might be used.
[ ] Generate a Task object with the agreed-upon context, target, and auxiliary datasets.
[ ] Visualize the Task for preliminary assessments.

Steps

Task 1: Review and Discuss Auxiliary Dataset Options

[ ] Compile a short list of potential auxiliary datasets (e.g. lake depth).
[ ] Discuss the relevance and availability of each dataset and how it might integrate into the Task.

Task 2: Prepare SST Context and Target Data

[ ] Ensure that SST data is correctly processed, handling NaN values and normalizing as appropriate using DeepSensor's data processing utilities (e.g. `DataProcessor`).
[ ] Establish the primary SST variable as the target set for the Task.
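
As a rough illustration of the NaN handling and normalization step above, here is a minimal plain-NumPy sketch. It is not DeepSensor's own API; the function name `standardise_ignoring_nan` and the tiny example grid are hypothetical, and it assumes NaN marks land cells in the SST field.

```python
import numpy as np


def standardise_ignoring_nan(sst):
    """Return (normalized array, mean, std) using statistics over non-NaN cells only."""
    mean = np.nanmean(sst)  # mean over water cells only
    std = np.nanstd(sst)    # population standard deviation over water cells only
    return (sst - mean) / std, mean, std


# Hypothetical 2x3 SST grid in degrees C; NaN marks land cells (an assumption).
sst = np.array([[4.0, 6.0, np.nan],
                [8.0, np.nan, 10.0]])

sst_norm, sst_mean, sst_std = standardise_ignoring_nan(sst)
```

The NaN cells stay NaN after this step, so they still need to be masked or filled before the array is used as a target set.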

Task 3: Set Up TaskLoader with Unspecified Auxiliary Data

[ ] Initialize TaskLoader, using SST as the target.

# Pseudo code for initializing `TaskLoader`; `sst_data` is assumed to be the
# preprocessed SST data from Task 2
from deepsensor.data import TaskLoader

# Set placeholders for context and auxiliary data to be determined
context_data_placeholder = None  # To be replaced with actual context data
auxiliary_data_placeholder = None  # To be replaced with actual auxiliary data

task_loader = TaskLoader(
    context=context_data_placeholder,
    target=sst_data,
    # Auxiliary data sampled at target locations (`aux_at_contexts` is the alternative)
    aux_at_targets=auxiliary_data_placeholder,
)

Task 4: Delegate Sub-Tasks as Needed

[ ] Assign research tasks and acquire auxiliary datasets.
[ ] Create sub-tasks for preprocessing of auxiliary datasets to fit DeepSensor requirements.

Task 5: Visualize Designed Task

[ ] Create visualization scripts for the `Task`.

# Visualization script; replace 'task' and 'task_loader' with actual objects
import deepsensor.plot
import matplotlib.pyplot as plt

fig = deepsensor.plot.task(task, task_loader)
plt.show()

Deliverables

A short list of auxiliary datasets and justifications for their inclusion.
A collaborative decision on the final choice of auxiliary datasets for the Task.
A Task object that integrates our SST data with the chosen context and auxiliary datasets.
Visualizations demonstrating the integration and potential insights from the Task.

This issue will foster collaboration and strategic thinking as we refine our approach to SST sensor placement using DeepSensor.

eredding02 commented 2 months ago

@DaniJonesOcean While designing Tasks, I discovered that DeepSensor does not allow NaN values in the target set, since a NaN target yields a NaN loss, which breaks backpropagation. To work around this, I replaced NaN values with a small negative value. I have included the masked GLSEA3 data and a land mask as context datasets. We may also include bathymetry as an auxiliary dataset, mirroring the DeepSensor paper’s use of elevation. The target dataset is the masked GLSEA3 data: in the few training runs I have attempted, I found much better results when using the same dataset for context and target, even though DeepSensor can handle NaN values in the context set. All data points in the provided datasets are used, rather than a uniform random sample. Using deepsensor.plot.task, we can see that there are now values over land, unlike in the original GLSEA3 data. We can also see that we are trying to predict the same kind of SST data that is provided in the context set. As a side note, the land mask appears to have values greater than 1 in the plot, but that is not the case before data processing.
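
A minimal sketch of the masking described above, in plain NumPy rather than DeepSensor's API. The sentinel `FILL_VALUE = -0.01`, the helper name `mask_land`, and the example grid are all hypothetical; the comment only specifies "a small negative value", and the actual GLSEA3 preprocessing may differ.

```python
import numpy as np

FILL_VALUE = -0.01  # hypothetical sentinel; the exact value used is not stated


def mask_land(sst, fill_value=FILL_VALUE):
    """Replace NaN (land) cells with a sentinel and return a matching land mask."""
    is_land = np.isnan(sst)
    land_mask = is_land.astype(np.float32)         # 1 over land, 0 over water
    sst_filled = np.where(is_land, fill_value, sst)  # NaN-free copy of the SST field
    return sst_filled, land_mask


# Hypothetical 2x2 SST grid in degrees C; NaN marks land cells (an assumption).
sst = np.array([[4.0, np.nan],
                [np.nan, 10.0]])

sst_filled, land_mask = mask_land(sst)
```

Supplying the land mask alongside the filled SST field gives the model a way to distinguish sentinel values over land from genuine water temperatures.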

DaniJonesOcean commented 2 months ago

Apologies for the delay in commenting! This all looks good to me, and you've flagged up some things for us to be aware of going forward. I'm not very concerned at the moment; this seems mostly positive/neutral at present. Thanks again!

CIGLR-ai-lab / GreatLakes-TempSensors