These are just questions that occurred to me as I read over the full report. You don't need to address them now, but they might come up in the questions during the exam.
You observed that water is easy to classify and is represented by many pixels, which might have given a false impression of how accurate the model is. How could you reduce the impact of the water pixels when fitting the model? Or, how could you force the model to put more weight on correctly classifying the non-water classes?
You mention repeating the random sampling of polygons into test and training sets until the breakdown of pixels "looked" OK. Is this still a random sample?
There is no way to verify the final prediction with the data you have. How would you design additional data collection to verify the final predictions?
Is there a way to set up the forest so that the OOB samples would give error measures closer to those on the test set? I'm wondering if there is a way to use the strata argument.
Do you have any ideas on how to use the importance measure results to build a model with better predictive power?
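To make the water-pixel question concrete, here is a minimal sketch of how a dominant, easy class can inflate overall accuracy, and of one possible way to weight classes inversely to their frequency. It is only an illustration, not a suggested answer: the pixel counts, the "forest" and "urban" class names, and the predictions are all made up, and the weighting formula n / (k * n_c) is just one common convention.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n / (k * n_c), so that rare classes count more."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * cnt) for cls, cnt in counts.items()}

def overall_and_balanced_accuracy(truth, predicted):
    """Return (overall accuracy, mean per-class recall)."""
    total = Counter(truth)
    correct = Counter()
    for t, p in zip(truth, predicted):
        if t == p:
            correct[t] += 1
    overall = sum(correct.values()) / len(truth)
    balanced = sum(correct[c] / total[c] for c in total) / len(total)
    return overall, balanced

# Hypothetical pixel labels (assumption): 90 water, 6 forest, 4 urban.
truth = ["water"] * 90 + ["forest"] * 6 + ["urban"] * 4
# Hypothetical predictions (assumption): all water correct, half of the
# forest pixels correct, every urban pixel mislabelled as water.
predicted = ["water"] * 90 + ["forest"] * 3 + ["water"] * 3 + ["water"] * 4

weights = balanced_class_weights(truth)
overall, balanced = overall_and_balanced_accuracy(truth, predicted)
print(f"overall accuracy:  {overall:.2f}")
print(f"balanced accuracy: {balanced:.2f}")
print({cls: round(w, 2) for cls, w in weights.items()})
```

In this made-up case the overall accuracy looks high because water dominates the pixel count, while the mean per-class recall is much lower. Weights of this kind could be passed to a classifier that accepts per-class weights, or used to guide how heavily the water class is down-sampled.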