questions/notes for 23/03 meeting

pedrorohde commented 1 year ago

benchmark
- performance
- model and features
- patches: how to extract them
which period to take when splitting across years/seasons (may reduce total number of examples)
- currently december-november because of data
- ideally split at lowest level for deciduous (~winter)
features
- validate current ones
- more ideas
time series smoothing
- fit them to a function
- which function(s)?
- see scipy.optimize.curve_fit
- other methods
barycenters
- make sense?
- see tslearn.barycenters
extract features from the images (possibly too much work)
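
One possible answer to "which function(s)?" is a first-order annual harmonic, sketched below with `scipy.optimize.curve_fit`. The model, the 365.25-day period, the synthetic data, and the names `harmonic`, `days`, and `ndvi` are all assumptions for illustration, not something decided in these notes. A double logistic would be another candidate.

```python
import numpy as np
from scipy.optimize import curve_fit


def harmonic(t, mean, a, b):
    """First-order annual harmonic: mean + a*cos(w*t) + b*sin(w*t)."""
    w = 2.0 * np.pi / 365.25  # assumed period of one year, t in days
    return mean + a * np.cos(w * t) + b * np.sin(w * t)


# Synthetic, irregularly sampled series standing in for cloud-free dates.
rng = np.random.default_rng(0)
days = np.sort(rng.choice(365, size=40, replace=False)).astype(float)
true_params = (0.6, -0.2, 0.05)
ndvi = harmonic(days, *true_params) + rng.normal(0.0, 0.02, size=days.size)

# Fit the three parameters, then evaluate the model on a regular daily grid.
params, _ = curve_fit(harmonic, days, ndvi, p0=(0.5, 0.0, 0.0))
grid = np.arange(365, dtype=float)
smoothed = harmonic(grid, *params)
```

The fitted parameters could themselves serve as features, and `smoothed` gives a regularly sampled series even when the acquisition dates differ between parcels.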

pedrorohde commented 1 year ago

our approach so far:

- extract time series for each index for each parcel (averaging over parcel, but also std and median), ignoring clouds
- indices: ndvi, gndvi, ndmi, evi, avi
- parcels have an irregular shape => average over something else? e.g. patches
- split the time series by season => needed? does inter-season information help?
- extract features from time series: winter/summer difference, average derivative, day of max, amplitude (max-min) => more features? e.g. papers simply use index values at specific dates
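
A minimal sketch of the four features listed above, using NumPy only. The function name `extract_features`, the month boundaries for winter and summer, and the reading of "average derivative" as mean absolute slope are assumptions, not the project's actual code.

```python
import numpy as np


def extract_features(days, values):
    """Summarise one season of one index for one parcel.

    days:   day-of-year of each cloud-free observation (sorted)
    values: index value (e.g. parcel-mean NDVI) on those days
    """
    days = np.asarray(days, dtype=float)
    values = np.asarray(values, dtype=float)

    # Assumed boundaries: winter = Dec-Feb, summer = Jun-Aug.
    winter = (days >= 335) | (days < 60)
    summer = (days >= 152) & (days < 244)

    return {
        "summer_minus_winter": values[summer].mean() - values[winter].mean(),
        # Assumed reading of "average derivative": mean absolute slope.
        "mean_abs_derivative": np.abs(np.gradient(values, days)).mean(),
        "day_of_max": float(days[np.argmax(values)]),
        "amplitude": values.max() - values.min(),
    }


# Synthetic example: a smooth seasonal curve peaking on day 200.
days = np.arange(0, 365, 5)
values = 0.5 + 0.3 * np.cos(2 * np.pi * (days - 200) / 365)
features = extract_features(days, values)
```

The sketch assumes cloudy dates were already dropped and that both the winter and summer windows contain at least one observation.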

results (over hold-out test set):
- over 90% accuracy for deciduous x evergreen
- ~80-85% accuracy for species (but imbalance in classes)

example confusion matrix:

array([[ 17,   0,   4,   0,   0,   3,   0,   0,   0,   4,   0,   0,   0],
   [  0,   2,   0,   1,   0,   2,   0,   0,   0,   7,   0,   0,   0],
   [  3,   0,   3,   0,   0,   5,   0,   0,   3,   1,   0,   0,   0],
   [  0,   0,   1,   4,   0,  18,   0,   0,   0,   1,   0,   0,   0],
   [  0,   0,   0,   0,   0,  13,   0,   0,   0,   0,   0,   0,   0],
   [  0,   0,   1,   1,   0, 505,   0,   0,   0,   0,   0,   0,   0],
   [  0,   0,   0,   0,   0,   1,   0,   0,   0,   0,   0,   0,   0],
   [  0,   0,   0,   0,   0,   0,   0,   1,   0,   0,   0,   0,   0],
   [  0,   0,   1,   0,   0,   2,   0,   0,   0,   1,   0,   0,   0],
   [  4,   2,   1,   3,   0,   3,   0,   0,   0,  65,   7,   0,   0],
   [  0,   0,   0,   0,   0,   1,   0,   0,   0,  14,   1,   0,   0],
   [  0,   0,   0,   0,   0,   2,   0,   0,   0,   5,   1,   0,   0],
   [  1,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0]])

=> what metrics to use
=> how to deal with imbalance
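
As one way to look at the metrics question, the sketch below computes per-class recall, per-class precision, and balanced accuracy (mean per-class recall) from the confusion matrix above. It assumes the scikit-learn convention that rows are true classes and columns are predicted classes, which the notes do not state.

```python
import numpy as np

# Confusion matrix copied from the results above.
cm = np.array([
    [17, 0, 4, 0, 0, 3, 0, 0, 0, 4, 0, 0, 0],
    [0, 2, 0, 1, 0, 2, 0, 0, 0, 7, 0, 0, 0],
    [3, 0, 3, 0, 0, 5, 0, 0, 3, 1, 0, 0, 0],
    [0, 0, 1, 4, 0, 18, 0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 13, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 505, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 2, 0, 0, 0, 1, 0, 0, 0],
    [4, 2, 1, 3, 0, 3, 0, 0, 0, 65, 7, 0, 0],
    [0, 0, 0, 0, 0, 1, 0, 0, 0, 14, 1, 0, 0],
    [0, 0, 0, 0, 0, 2, 0, 0, 0, 5, 1, 0, 0],
    [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
])

diag = np.diag(cm).astype(float)
support = cm.sum(axis=1)    # samples per true class (assumed: rows = true)
predicted = cm.sum(axis=0)  # samples per predicted class

recall = diag / support     # every class has at least one test sample
# Some classes are never predicted; report precision 0 there.
precision = np.divide(diag, predicted, out=np.zeros_like(diag), where=predicted > 0)

accuracy = diag.sum() / cm.sum()
balanced_accuracy = recall.mean()
```

Under that row/column assumption, overall accuracy is about 0.84 while balanced accuracy is only about 0.30, because one class holds 507 of the 715 test samples. That gap suggests reporting per-class recall or balanced accuracy alongside plain accuracy.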

pedrorohde commented 1 year ago

- split problem in two parts: deciduous vs evergreen, then species
- index values alone are generally not enough

GarnierAdrian / MVA-rs-project
