NINAnor / ecosystemCondition

This repository is for documenting the design and calculation of indicators for ecosystem condition in Norway
https://ninanor.github.io/ecosystemCondition/
Creative Commons Attribution 4.0 International

Issues around using broad ordinal scales #100

Closed: anders-kolstad closed this issue 1 year ago

anders-kolstad commented 1 year ago

Marianne Evju: I think we should discuss how nearly meaningless it is to have step ranges as large as the ones we have here. "The large range of the steps, particularly for the 7SE and 7TK variables, poses a problem for assessing ecological condition at the polygon scale. We tentatively convert the ordinal values to a continuous scale based on the mean value of each step, but underline that this is ecologically problematic."

See also #99 and #89

anders-kolstad commented 1 year ago

Added this: "Here, I convert these ordinal variables to a continuous scale based on the midpoint of each step's range, but I acknowledge that important assumptions are being made, for example that the true mean value for step 3 of 7TK or 7SE is the midpoint of the range from 0.5 to 1 (i.e. 0.75). This is probably not correct. The large range of the steps, particularly for the 7SE and 7TK variables, poses a problem for assessing ecological condition at the polygon scale, but less so at aggregated scales. Hence, the indicator should not be presented or interpreted at the polygon scale."
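A minimal sketch of the midpoint conversion described above, written in Python for illustration (the repository itself may use another language). Only the range for step 3 (0.5 to 1) comes from this thread; the ranges for steps 1 and 2 in the hypothetical `STEP_RANGES` table are placeholders, and the function names are illustrative rather than taken from the repository.

```python
from __future__ import annotations

# Hypothetical step table. Only step 3 (1/2 to 1) is stated in the thread;
# steps 1 and 2 are placeholder ranges for illustration.
STEP_RANGES = {
    1: (0.0, 0.05),
    2: (0.05, 0.5),
    3: (0.5, 1.0),
}


def step_to_midpoint(step: int, ranges: dict[int, tuple[float, float]] = STEP_RANGES) -> float:
    """Return the midpoint of the range that an ordinal step covers."""
    lower, upper = ranges[step]
    return (lower + upper) / 2


def convert_steps(steps: list[int], ranges: dict[int, tuple[float, float]] = STEP_RANGES) -> list[float]:
    """Convert a sequence of ordinal steps to continuous midpoint values."""
    return [step_to_midpoint(s, ranges) for s in steps]


if __name__ == "__main__":
    print(step_to_midpoint(3))  # 0.75
    print(convert_steps([1, 2, 3]))
```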

In other words, I don't think the coarseness of the scale affects the indicator much, since the indicator values are area-weighted means taken across a minimum of 150 polygons, so much of the imprecision, and also much of the bias, is averaged out (law of large numbers).
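The averaging argument can be checked with a small simulation. This sketch assumes, hypothetically, that true values are spread uniformly within each step, so that the midpoint is unbiased; the step ranges and polygon areas are invented for illustration. Under that assumption, the error of the area-weighted mean across 150 polygons is much smaller than the typical error for a single polygon. If the true values within a step are skewed instead, the midpoint error has a non-zero mean, and that systematic part does not shrink as more polygons are added.

```python
import random


def area_weighted_mean(values, areas):
    """Mean of polygon-level values, weighted by polygon area."""
    total_area = sum(areas)
    return sum(v * a for v, a in zip(values, areas)) / total_area


def simulate(n_polygons=150, seed=1):
    """Return (mean per-polygon error, error of the area-weighted mean).

    ASSUMPTION: true values are uniform within each step, so the midpoint
    conversion is unbiased. Step ranges and areas are hypothetical.
    """
    rng = random.Random(seed)
    step_ranges = [(0.0, 0.05), (0.05, 0.5), (0.5, 1.0)]
    true_vals, midpoints, areas = [], [], []
    for _ in range(n_polygons):
        lower, upper = rng.choice(step_ranges)
        true_vals.append(rng.uniform(lower, upper))  # the unknown "true" value
        midpoints.append((lower + upper) / 2)        # what the conversion yields
        areas.append(rng.uniform(0.1, 10.0))         # polygon area
    polygon_error = sum(abs(t - m) for t, m in zip(true_vals, midpoints)) / n_polygons
    aggregate_error = abs(
        area_weighted_mean(true_vals, areas) - area_weighted_mean(midpoints, areas)
    )
    return polygon_error, aggregate_error


if __name__ == "__main__":
    per_polygon, aggregate = simulate()
    print(f"mean per-polygon error: {per_polygon:.3f}")
    print(f"error of area-weighted mean: {aggregate:.3f}")
```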

marianneevju commented 1 year ago

I'm not sure I agree here. I find it hard to see how an indicator that makes little sense at the polygon level can actually make sense at a large scale. The assumption you are making is that the true distribution of wear in polygons with wear class 2 represents the whole range from 6 to 50% wear. We actually don't know that, and given the low proportion of polygons with wear > 50%, isn't there reason to believe it is not correct? In any case, we must point out the shortcomings in the underlying data, not just assume they are unimportant at large scales.

anders-kolstad commented 1 year ago

It might be possible to use a more informed center value for these broad categories. Since we know the shape of the distribution of the categorical variable, and it is clearly not Gaussian, we could estimate or model a better fit, for example a gamma distribution. This would shift the center values towards the left, i.e. towards smaller numbers.
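One way to get a more informed center value is to fit a gamma distribution to the counts per step by maximum likelihood (treating each step as an interval), and then use the conditional mean of the fitted distribution within each step as its center. This sketch assumes the values are proportions truncated at 1, uses a coarse grid search instead of a proper optimizer so it needs only the standard library, and uses invented counts and step ranges apart from the 0.5 to 1 range; all function names are hypothetical.

```python
import math


def reg_lower_gamma(s, x, tol=1e-12, max_iter=10_000):
    """Regularized lower incomplete gamma function P(s, x), via its power series."""
    if x <= 0:
        return 0.0
    term = 1.0 / s
    total = term
    for n in range(1, max_iter):
        term *= x / (s + n)
        total += term
        if term < tol * total:
            break
    return min(1.0, math.exp(s * math.log(x) - x - math.lgamma(s)) * total)


def gamma_cdf(x, shape, scale):
    """CDF of a gamma(shape, scale) distribution."""
    return reg_lower_gamma(shape, x / scale)


def log_grid(lo, hi, n):
    """n points evenly spaced on a log scale between lo and hi."""
    step = (math.log(hi) - math.log(lo)) / (n - 1)
    return [math.exp(math.log(lo) + i * step) for i in range(n)]


def fit_gamma_to_steps(step_ranges, counts, upper=1.0, n_grid=40):
    """Grid-search maximum likelihood fit of a gamma truncated at `upper`
    to the number of observations in each step (interval-censored data)."""
    best = (-math.inf, None, None)
    for shape in log_grid(0.1, 5.0, n_grid):
        for scale in log_grid(0.01, 5.0, n_grid):
            total = gamma_cdf(upper, shape, scale)
            loglik = 0.0
            for (a, b), c in zip(step_ranges, counts):
                p = (gamma_cdf(b, shape, scale) - gamma_cdf(a, shape, scale)) / total
                if p <= 0:
                    loglik = -math.inf
                    break
                loglik += c * math.log(p)
            if loglik > best[0]:
                best = (loglik, shape, scale)
    return best[1], best[2]


def conditional_mean(a, b, shape, scale):
    """E[X | a <= X < b] for X ~ gamma(shape, scale), using the identity
    x * f(x; k, theta) = k * theta * f(x; k + 1, theta)."""
    num = gamma_cdf(b, shape + 1, scale) - gamma_cdf(a, shape + 1, scale)
    den = gamma_cdf(b, shape, scale) - gamma_cdf(a, shape, scale)
    return shape * scale * num / den


if __name__ == "__main__":
    steps = [(0.0, 0.05), (0.05, 0.5), (0.5, 1.0)]  # hypothetical, except 0.5-1
    counts = [120, 60, 10]                          # hypothetical polygon counts
    k, theta = fit_gamma_to_steps(steps, counts)
    for a, b in steps:
        center = conditional_mean(a, b, k, theta)
        print(f"{a}-{b}: midpoint {(a + b) / 2:.3f}, fitted center {center:.3f}")
```

With counts this skewed, the fitted density decreases across the upper steps, so the fitted centers fall below the midpoints. A beta distribution may be a more natural choice for proportions bounded by 0 and 1; the gamma is used here only because it is the example suggested above.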

In any case, I agree that we need to communicate the weaknesses of the data and how the data could be improved with relatively little effort, but I think the data are still quite usable.

anders-kolstad commented 1 year ago

I did a little test, and since the data are so skewed, using the lower limit of each range is more accurate than using the center value, so I changed it. For example, the 1/2 to 1 range is now converted to 50%. This is not perfect either, but it is better than before.
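A test like this could be sketched as follows, assuming a set of polygons for which the true continuous values are known (for example from more detailed measurements). The hypothetical `aggregate_errors` function compares the error of the (area-weighted) mean when each step is converted to its midpoint versus its lower limit. The data in the example are invented, which rule wins depends on how skewed the true values are within each step, and this is not necessarily how the original test was done.

```python
def aggregate_errors(true_values, steps, step_ranges, areas=None):
    """Absolute error of the (area-weighted) mean of converted values, per conversion rule.

    true_values: known continuous value for each polygon (hypothetical here)
    steps: the ordinal step recorded for each polygon
    step_ranges: mapping from step to its (lower, upper) limits
    """
    if areas is None:
        areas = [1.0] * len(true_values)
    total_area = sum(areas)

    def weighted_mean(values):
        return sum(v * a for v, a in zip(values, areas)) / total_area

    true_mean = weighted_mean(true_values)
    rules = {
        "midpoint": lambda lower, upper: (lower + upper) / 2,
        "lower_limit": lambda lower, upper: lower,
    }
    errors = {}
    for name, rule in rules.items():
        converted = [rule(*step_ranges[s]) for s in steps]
        errors[name] = abs(weighted_mean(converted) - true_mean)
    return errors


if __name__ == "__main__":
    # Hypothetical polygons in step 3 (0.5 to 1) with true values near the lower limit.
    true_values = [0.50, 0.52, 0.55, 0.60]
    steps = [3, 3, 3, 3]
    print(aggregate_errors(true_values, steps, {3: (0.5, 1.0)}))
```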