SCBI-ForestGEO / 2023census

Repository for the 2023 recensus of the SCBI ForestGEO plot
Creative Commons Attribution 4.0 International

DBH checks #24

Open ValentineHerr opened 1 year ago

ValentineHerr commented 1 year ago

We need to align our thresholds between the app and the GitHub Actions, so @jess-shue will try to add the following statements to the app's DBH checks (on top of the absolute checks of -0.5 cm for negative growth and 4 cm for positive growth).

dbh_previous*0.75 > dbh_current & !is.na(dbh_previous) & !is.na(dbh_current) # suspiciousNegativeGrowth
dbh_previous*1.92 < dbh_current & !is.na(dbh_previous) & !is.na(dbh_current) # suspiciousPositiveGrowth
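
For illustration, here is a minimal sketch of how the two relative checks above might sit alongside the absolute ones. The function name `flag_dbh` and the `absoluteNegative` / `absolutePositive` column names are hypothetical, and it assumes the absolute thresholds apply to the change `dbh_current - dbh_previous` in cm; the app's actual implementation may differ.

```r
# Hypothetical helper combining the absolute and relative DBH checks.
# Assumption: the absolute limits (-0.5 cm, +4 cm) apply to the change
# dbh_current - dbh_previous, with both DBH values in cm.
flag_dbh <- function(dbh_previous, dbh_current) {
  both_measured <- !is.na(dbh_previous) & !is.na(dbh_current)
  growth <- dbh_current - dbh_previous

  data.frame(
    absoluteNegative = both_measured & growth < -0.5,
    absolutePositive = both_measured & growth > 4,
    suspiciousNegativeGrowth = both_measured & dbh_previous * 0.75 > dbh_current,
    suspiciousPositiveGrowth = both_measured & dbh_previous * 1.92 < dbh_current
  )
}

# Toy values, not census data.
flags <- flag_dbh(
  dbh_previous = c(10, 2, 30, NA),
  dbh_current  = c(10.5, 4, 22, 5)
)
print(flags)
```

In this toy input, the second tree (2 cm to 4 cm) passes the absolute check but trips the relative one, which is the small-tree case the relative thresholds are meant to catch.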

teixeirak commented 1 year ago

@ValentineHerr and @jess-shue, I don't think there's any point in programming these into both the GIS app and GitHub Actions. I think what makes the most sense is:

1- GIS app flags absolute and possibly also relative anomalies in what has to be a relatively simple formulation that will not capture all anomalies
2- GitHub Actions can be programmed to flag anomalies defined by size- and species-specific functions derived from past data. This would be ideal but is a bit complicated and not essential.

@jess-shue and @mitreds, how important do you think number 2 would be in catching errors?
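
One possible shape for option 2 above, sketched under stated assumptions: growth limits are taken as empirical quantiles of past growth within each species and size class, and a new measurement is flagged when it falls outside them. The column names (`sp`, `tag`, `dbh_previous`, `dbh_current`), the size-class breaks, and the 1%/99% quantiles are all illustrative choices, not something agreed in this thread.

```r
# Hypothetical sketch: species- and size-specific growth limits from past data.
# Size-class breaks (cm) and quantile levels are illustrative assumptions.
size_breaks <- c(0, 5, 10, 30, Inf)

growth_limits <- function(past, probs = c(0.01, 0.99)) {
  past$growth <- past$dbh_current - past$dbh_previous
  past$size_class <- cut(past$dbh_previous, size_breaks, right = FALSE)
  lower <- aggregate(growth ~ sp + size_class, data = past,
                     FUN = quantile, probs = probs[1], names = FALSE)
  upper <- aggregate(growth ~ sp + size_class, data = past,
                     FUN = quantile, probs = probs[2], names = FALSE)
  names(lower)[3] <- "lower"
  names(upper)[3] <- "upper"
  merge(lower, upper, by = c("sp", "size_class"))
}

flag_growth <- function(current, limits) {
  current$growth <- current$dbh_current - current$dbh_previous
  current$size_class <- cut(current$dbh_previous, size_breaks, right = FALSE)
  # Left join: trees with no matching history keep NA limits and are not flagged.
  out <- merge(current, limits, by = c("sp", "size_class"), all.x = TRUE)
  out$flag <- !is.na(out$growth) & !is.na(out$lower) &
    (out$growth < out$lower | out$growth > out$upper)
  out
}

# Toy data, not census data: ten past trees of one made-up species.
past <- data.frame(
  sp = "spA",
  dbh_previous = seq(5.5, 9.5, length.out = 10),
  stringsAsFactors = FALSE
)
past$dbh_current <- past$dbh_previous + seq(0.1, 1.0, by = 0.1)

current <- data.frame(
  tag = c("t1", "t2", "t3"),
  sp = c("spA", "spA", "spB"),
  dbh_previous = c(7, 7, 7),
  dbh_current = c(7.5, 10, 20),
  stringsAsFactors = FALSE
)

limits <- growth_limits(past)
flagged <- flag_growth(current, limits)
print(flagged[, c("tag", "sp", "growth", "lower", "upper", "flag")])
```

Note that a species or size class with no past records gets no limits here and is silently left unflagged (tree `t3` in the toy input), so a simple fallback check would still be needed for those cases.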

ValentineHerr commented 1 year ago

I agree that there is no use in having the same check in both.

But I think it is good to add to the app's checks, because the GitHub Actions are already flagging suspicious DBH increases that the app's system did not flag. We don't want to end up with too many trees to go back and check.

The flags are mostly for small trees that doubled in size, which, I admit, probably won't be flagged by a size- and species-specific function. But fine-tuning that function will take time, and the more we have the field crew add the "I double checked in the field" code, the less we will need them to go back to the field later to re-find the tree and check the measurement.

Also, thinking out loud, I am wondering about the implications of having such a fine mesh to detect DBH errors when previous censuses didn't have one. The variability of measurement error will be reduced. Would that matter for future analyses?

teixeirak commented 1 year ago

I completely agree that it's best to program checks into the app whenever possible. It's always easiest to catch the errors while they're at the tree.

I agree that reducing the measurement error will be a change from previous censuses, but I think it's purely a positive change. The question is whether it would become too burdensome for the crew to go back and check suspicious measurements. I think it will take a bit of trial and error to optimize the criteria for flagging potential errors, and there are definitely some philosophical calls there (what level of error is tolerable? what's the optimal tradeoff between error-free data and greater efficiency?).

teixeirak commented 12 months ago

I just spoke with Madeleine Udell (Stanford) about turning this issue into a project for data science MS students at Stanford (https://docs.google.com/document/d/18uXErzdAAf8DYM67JZVykz6wyXPhNS1ObTy4cLFTIqs/edit). The idea would be to use unsupervised machine learning to detect outliers.