CeresBarros opened this issue 4 years ago.
In addition, the metadata for `P(sim)$imputeBadAgeModel` states:
"Model and formula used for imputing ages that are either missing or do not match well with Biomass or Cover. Specifically, if Biomass or Cover is 0, but age is not, then age will be imputed. Similarly, if Age is 0 and either Biomass or Cover is not, then age will be imputed."
However, the subsetting of "bad age" data in `LandR::makeAndCleanInitialCohortData` uses:
```r
cohortDataMissingAge <- cohortData[, hasBadAge :=
  #(age == 0 & cover > 0)#| # ok because cover can be >0 with biomass = 0
  (age > 0 & cover == 0) |
  is.na(age) #|
  #(B > 0 & age == 0) |
  #(B == 0 & age > 0)
][hasBadAge == TRUE]#, by = "pixelIndex"]
```
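To make the mismatch concrete, here is a small sketch that applies only the active (uncommented) part of the rule to made-up data. The table, species codes and values are hypothetical; only the column names and the condition come from the snippet above. Note that the row with `age == 0` and `cover > 0` is not flagged, and neither is the row where age, cover and `B` are all 0.

```r
library(data.table)

# Hypothetical toy cohort table: column names follow the quoted code,
# values are invented for illustration only.
cohortData <- data.table(
  pixelIndex  = 1:5,
  speciesCode = c("sp1", "sp2", "sp3", "sp4", "sp5"),
  age   = c(50L, 40L, 0L, NA, 0L),
  cover = c(30L, 0L, 10L, 20L, 0L),
  B     = c(5000L, 0L, 0L, 1000L, 0L)
)

# Same condition as the uncommented part of the quoted code:
# age > 0 while cover is 0, or age missing.
cohortData[, hasBadAge := (age > 0 & cover == 0) | is.na(age)]

# Only pixelIndex 2 (age > 0, cover 0) and 4 (age NA) are flagged;
# pixelIndex 3 (age 0, cover 10) is not.
cohortData[hasBadAge == TRUE, .(pixelIndex, age, cover)]
```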
So it seems to me that "Similarly, if Age is 0 and either Biomass or Cover is not, then age will be imputed" is not accurate, since the `(age == 0 & cover > 0)` condition is commented out.
@CeresBarros has this been resolved with all the various changes over the last few months?
No :/
`P(sim)$imputeBadAgeModel` now agrees with the code in `LandR::makeAndCleanInitialCohortData`:
"Model and formula used for imputing ages that are either missing or do not match well with biomass or cover. Specifically, if biomass or cover is 0, but age is not, or if age is missing (`NA`), then age will be imputed."
Note that age is zeroed where total biomass is 0 in `LandR:::.createCohortData`, which is run before `makeAndCleanInitialCohortData`.
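A minimal sketch of what that zeroing step could look like, assuming "total biomass" means the sum of `B` over all cohorts in a pixel. This is not the `LandR:::.createCohortData` implementation; the toy table and the `totalB` column are hypothetical. It only illustrates why pixels with no biomass at all never reach the bad-age check with `age > 0`, while a cohort with `B == 0` in a pixel that does have biomass keeps its age.

```r
library(data.table)

# Hypothetical toy data: two pixels with two species each.
cohortData <- data.table(
  pixelIndex  = c(1L, 1L, 2L, 2L),
  speciesCode = c("sp1", "sp2", "sp1", "sp2"),
  age = c(30L, 30L, 45L, 45L),
  B   = c(0L, 0L, 1200L, 0L)
)

# Illustrative only: per-pixel total biomass (assumed meaning of "total biomass").
cohortData[, totalB := sum(B), by = pixelIndex]

# Zero ages in pixels whose total biomass is 0 (pixel 1 only).
cohortData[totalB == 0, age := 0L]

# Pixel 2, sp2 still has B == 0 with age 45, because its pixel total is > 0.
cohortData[, .(pixelIndex, speciesCode, B, age)]
```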
However, I'm still puzzled by the age data that is used to fit the model.
Digging deeper:
At some point before fitting the model, the `cohortDataMissingAgeUnique` object is stripped of all data except the unique combinations of `initialEcoregionCode` and `speciesCode`:
```r
cohortDataMissingAgeUnique <- unique(cohortDataMissingAge,
  by = c("initialEcoregionCode", "speciesCode")
)[
  , .(initialEcoregionCode, speciesCode)
]
```
After this, the data is added back to these combinations from the original `cohortData`:
```r
cohortDataMissingAgeUnique <- cohortDataMissingAgeUnique[
  cohortData,
  on = c("initialEcoregionCode", "speciesCode"), nomatch = 0
]
cohortDataMissingAgeUnique <- cohortDataMissingAgeUnique[!is.na(cohortDataMissingAgeUnique$age)]
```
However, since "bad" age rows were not removed from `cohortData`, they are being added back (except for `NA` ages, which are excluded, see above). So it seems to me that bad ages, i.e. `(age > 0 & cover == 0)`, are being used to fit the model that will later impute (overwrite) these same ages.
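The following sketch reproduces the two steps above on made-up data and compares them with one possible alternative that joins back only the rows not flagged by `hasBadAge`. It assumes the `hasBadAge` column, which `:=` adds to `cohortData` by reference, is still present when the join runs; the toy data and the names `combos`, `fitDataCurrent` and `fitDataGoodOnly` are hypothetical and not part of LandR.

```r
library(data.table)

# Hypothetical toy data: combo e1/sp1 has one good age (60),
# one bad age (35, cover 0) and one missing age; e2/sp1 has only a good age.
cohortData <- data.table(
  pixelIndex = 1:4,
  initialEcoregionCode = c("e1", "e1", "e1", "e2"),
  speciesCode = c("sp1", "sp1", "sp1", "sp1"),
  age   = c(60L, 35L, NA, 20L),
  cover = c(40L, 0L, 25L, 15L)
)

# Flag bad ages by reference (adds hasBadAge to cohortData), then subset.
cohortDataMissingAge <- cohortData[, hasBadAge :=
  (age > 0 & cover == 0) | is.na(age)][hasBadAge == TRUE]

# Unique ecoregion/species combinations that contain bad ages.
combos <- unique(cohortDataMissingAge,
                 by = c("initialEcoregionCode", "speciesCode"))[
  , .(initialEcoregionCode, speciesCode)]

# Current approach: join back to all of cohortData, drop only NA ages.
fitDataCurrent <- combos[cohortData,
                         on = c("initialEcoregionCode", "speciesCode"),
                         nomatch = 0][!is.na(age)]

# Possible alternative: join back only rows whose age is not flagged.
fitDataGoodOnly <- combos[cohortData[hasBadAge == FALSE],
                          on = c("initialEcoregionCode", "speciesCode"),
                          nomatch = 0]

fitDataCurrent$age   # 60 35: the bad age 35 is in the fitting data
fitDataGoodOnly$age  # 60: only the good age remains
```

Excluding bad ages leaves fewer rows per `initialEcoregionCode`/`speciesCode` combination (here one instead of two), which may matter for combinations with few data points.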
@eliotmcintire, since you wrote this, I guess you're the best person to ask: is there a reason why it is done like this? Were there perhaps not enough data points per `initialEcoregionCode`/`speciesCode` combination if the bad ages were excluded from fitting?
I don't recall, sorry. I should have written more comments. I am better about that now...
(Reply sent by email in response to Ceres Barros's comment of Wed., Oct. 19, 2022, 10:24 p.m., https://github.com/PredictiveEcology/Biomass_borealDataPrep/issues/48#issuecomment-1284952965)
No worries. We'll have to revisit it soon then and make a decision (with comments ;) ).
In `LandR::makeAndCleanInitialCohortData`, used in Biomass_borealDataPrep, why is the model to impute bad ages being fit with the data subset that has the bad ages, instead of the subset that has good ages?