ddalthorp / GenEst

R package development for a generalized mortality estimator

Numerically named second split factors in summary #391

Closed by mintzj 6 years ago

mintzj commented 6 years ago

Factor levels in the summary table are named strangely when I split on two observation variables. Instead of the level names of the second variable, the table shows what looks like the mean of the cell, or something close to (but not exactly) the median. So far I have seen this running Solar Power Tower and Mock with version 0.0.11.5. It almost looks intentional, or maybe numeric values from a column overwrote the labels that were supposed to be there?

  1. Data Input: (All Mock example files)

  2. Analyses > General Inputs: Number of Iterations = 1000, Confidence Level = 0.9, Size Class = Size

  3. Analyses > Searcher Efficiency: Observations = Search1, Search2, Search3, Search4; Predictor Variables = Season; Fix k = No

  4. Run Searcher Efficiency Model:
     S: p ~ 1; k ~ Season
     L: p ~ Season; k ~ Season
     M: p ~ 1; k ~ 1
     XL: p ~ 1; k ~ 1

  5. Analyses > Carcass Persistence: Last Time Present = LastPresentDecimalDays, First Time Absent = FirstAbsentDecimalDays, Predictor Variables = Season, Distributions = (All)

  6. Run Carcass Persistence Model:
     S: exponential; l ~ Season
     L: exponential; l ~ 1
     M: weibull; l ~ 1; s ~ Season
     XL: exponential; l ~ 1

  7. Analyses > Mortality Estimation: Fraction of Facility Surveyed = 1, Date Found = DateFound

  8. Analyses > Mortality Estimation: Estimate

  9. Analyses > Mortality Estimation > Splitting Mortality: Schedule Variable = (none); Observation Variable = Ground Cover, Visibility

The names in the second column are always numeric when I split on two observation variables. The same behavior is visible in an image MH posted on a separate issue (#366) a while back.

image

ddalthorp commented 6 years ago

might be a "magic numbers" issue. I'll look into it.

ddalthorp commented 6 years ago

The table returned on the command line side appears to have the proper labels:

library(GenEst)

# Load the mock example data and pull out its component tables
data(mock)
data_SE <- mock$SE
data_CP <- mock$CP
data_SS <- mock$SS
data_DWP <- mock$DWP
data_CO <- mock$CO

# Searcher efficiency models, one per size class
pkMods <- list(
  S = pkm(p ~ 1, k ~ Season, data = data_SE[data_SE$Size == "S", ]),
  L = pkm(p ~ Season, k ~ Season, data = data_SE[data_SE$Size == "L", ]),
  M = pkm(p ~ 1, k ~ 1, data = data_SE[data_SE$Size == "M", ]),
  XL = pkm(p ~ 1, k ~ 1, data = data_SE[data_SE$Size == "XL", ])
)

# Carcass persistence models, one per size class
left <- "LastPresentDecimalDays"
right <- "FirstAbsentDecimalDays"
cpMods <- list(
  S = cpm(l ~ Season, data = data_CP[data_CP$Size == "S", ],
    dist = "exponential", left = left, right = right),
  L = cpm(l ~ 1, data = data_CP[data_CP$Size == "L", ],
    dist = "exponential", left = left, right = right),
  M = cpm(l ~ 1, s ~ Season, data = data_CP[data_CP$Size == "M", ],
    dist = "weibull", left = left, right = right),
  XL = cpm(l ~ 1, data = data_CP[data_CP$Size == "XL", ],
    dist = "exponential", left = left, right = right)
)

# Mortality estimate using the fitted models
eM <- estM(data_CO = data_CO, data_SS = data_SS, data_DWP = data_DWP, frac = 1,
  dateFoundCol = "DateFound", model_SE = pkMods, model_CP = cpMods, kFill = NULL,
  unitCol = NULL, datesSearchedCol = NULL,
  sizeclassCol = "Size", DWPCol = NULL, seed_SE = NULL, seed_CP = NULL,
  seed_g = NULL, seed_M = NULL, nsim = 1000, max_intervals = 8)

# Split mortality by two observation variables and summarize
spl <- calcSplits(eM$M, eM$Aj, split_CO = c("GroundCover", "Visibility"),
  data_CO = data_CO)
summary(spl)

@DAPPERstats The GUI appears to be extracting the first column as the names of the second split rather than using row.names(...).
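A minimal sketch of the suspected difference, using a made-up table rather than GenEst's actual summary object or GUI code. The matrix `tab`, its level names, and its values are all hypothetical and only illustrate how taking the first column as labels would produce numeric names:

```r
# Hypothetical stand-in for a split summary table (not real GenEst output):
# row names carry the factor levels, columns carry numeric summaries.
tab <- matrix(
  c(1.2, 3.4, 5.6,
    2.1, 4.3, 6.5),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("LevelA", "LevelB"), c("5%", "median", "95%"))
)

# Suspected bug pattern: values of the first column end up as the labels
wrong_labels <- as.character(tab[, 1])

# Intended behavior: labels are taken from the row names
right_labels <- row.names(tab)

print(wrong_labels)
print(right_labels)
```

If this is what the GUI does, it would explain why the labels look like a summary statistic of each cell instead of level names.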

juniperlsimonis commented 6 years ago

ack, good catch. patched in v 0.0.11.8 #394