Closed: ericmkeen closed this issue 3 months ago
Fortunately, such sightings have not yet been at play for abundance estimation, but it would be good to work through the best approach. One point of clarification: does the average group size assignment in step 2 override the average ESW assignment described in Issue #11?
@ericmkeen - one other clarifying question. For the sightings column `ss_valid`, which indicates whether a valid best estimate is available, you previously mentioned: "The current system is: If the best is not available, the low estimate is used. If the low is not available either, best is coerced to 1. Based on our notes here it sounds like we want to keep this system and simply update the data (or use coded edits) if we have specific sightings we wish to correct." That sounds good, with the clarifying question that if the low estimate or coerced value of 1 is used, does `ss_valid` remain `FALSE`? Is that how those sightings get triggered for step 1 above? Trying to evaluate how this has all come together.
@amandalbradford: to answer your question, "if the low estimate or coerced value of 1 is used, does `ss_valid` remain `FALSE`?": yes, `ss_valid` is `FALSE` in the case when the low estimate needs to be used as the best estimate AND in the case that the best estimate value is coerced to 1.
@amandalbradford: a proposed solution to this issue as well as issue #11 (a similar question about how to handle missing Bft values), issue #8 (similar question about mixed species sightings with no percentages), and issue #9 (similar question about missing Bft values and impacts on group size calibration):
(1) Within the `lta()` function, let's not perform any interpolation of any missing values. If rows have missing data for columns that are being used as covariates in the detection function, then those rows are removed and the function presses on. So, if `LnSsTot` is a candidate covariate, any sighting with `ss_valid==FALSE` is removed from both detection function fitting and abundance estimation; if `Bft` is a covariate, any sighting with `Bft==NA` is removed from both df fitting and abundance estimation. This is even the case for sightings from the focal year of interest for the abundance estimate.
(2) We add a function (working title `lta_checks()`) that lets you quickly check for missing data in sightings from your focal year. The function can use the same input lists `df_settings`, `fit_filters`, and `estimates` that are provided to the `lta()` function. It tells the analyst which sightings have missing data, which allows the analyst to prepare coded edits that fill in gaps within the `cruz` object before they run `lta()`. This gives the analyst full discretion for how to fill in missing values (e.g., interpolation or some other solution of their own choice); the vignette could then provide examples for how to do this for missing group size estimates and missing Bft values.
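To make the two-part proposal concrete, here is a minimal Python sketch of the idea; it is not LTabundR's R code. The sighting fields (`id`, `year`, `Bft`, `ss_valid`), the `MISSING_TESTS` table, and the helper names `drop_incomplete()` and `check_focal_year()` are all hypothetical names chosen for illustration. The real `lta()` and `lta_checks()` would take `df_settings`, `fit_filters`, and `estimates` instead.

```python
import math

# Hypothetical mapping: covariate name -> test for "this sighting lacks usable data".
MISSING_TESTS = {
    "LnSsTot": lambda s: not s["ss_valid"],  # group size is flagged as not valid
    "Bft": lambda s: s["Bft"] is None or math.isnan(s["Bft"]),  # Beaufort missing
}

def missing_covariates(sighting, covariates):
    """List the covariates for which this sighting has no usable value."""
    return [c for c in covariates if MISSING_TESTS[c](sighting)]

def drop_incomplete(sightings, covariates):
    """Part (1): remove incomplete rows instead of interpolating them."""
    return [s for s in sightings if not missing_covariates(s, covariates)]

def check_focal_year(sightings, covariates, focal_year):
    """Part (2): report focal-year sightings that would be dropped, and why."""
    report = {}
    for s in sightings:
        gaps = missing_covariates(s, covariates)
        if s["year"] == focal_year and gaps:
            report[s["id"]] = gaps
    return report

# Made-up example data, for illustration only.
sightings = [
    {"id": "A", "year": 2017, "Bft": 3.0, "ss_valid": True},
    {"id": "B", "year": 2017, "Bft": None, "ss_valid": True},
    {"id": "C", "year": 2020, "Bft": 4.0, "ss_valid": False},
    {"id": "D", "year": 2020, "Bft": 2.0, "ss_valid": True},
]

kept = drop_incomplete(sightings, ["LnSsTot", "Bft"])
report = check_focal_year(sightings, ["LnSsTot", "Bft"], focal_year=2020)
```

In this sketch the check runs before the filter, so the analyst sees which focal-year sightings would be removed and can fill the gaps with coded edits first.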
I think this solution will simplify the `lta()` code (and code in other functions too) and help users feel more in control over how missing data are handled.
What do you think?
@ericmkeen - I agree that your proposed solution is the way to go. We are most worried about treatment of sightings in our focal survey/year, as opposed to "imperfect" sightings from previous surveys/years that could join the sightings pool. I like the check function and allowing the user to specify their own correction method. The user will have to be careful to track the potential for "missing" incomplete sightings, but we can make this clear in the vignette. Thank you!
I have implemented this change. In the process I have improved/streamlined the code for determining whether or not a sighting has `ss_valid = TRUE` or `ss_valid = FALSE`. Changes were made in `process_sightings()`, `group_size()`, and `group_size_calibration()`.
To make sure the method for assigning the `ss_valid` status is clear, here is an outline of the workflow for determining it:
For each sighting, loop through each observer's estimate of group size:
- If the best estimate is valid, it is used (`ss_valid` is still `TRUE` for the observer-sighting).
- If the best estimate is `NA` or less than 0, the low estimate is used as the best estimate and the observer's `ss_valid` becomes `FALSE`.
- If the low estimate is also `NA` or less than 0, the best estimate is coerced to `1` and the observer's `ss_valid` remains `FALSE`.
- Filter the observer estimates to those whose `ss_valid` is `TRUE`. If at least one observer estimate remains, the sighting's overall `ss_valid` status is `TRUE`; if no estimates remain, the overall `ss_valid` status becomes `FALSE` and best size estimate becomes `NA`.
- If the overall `ss_valid` is `TRUE` after this filter, the remaining observer estimates are used to find the geometric mean (or geometric weighted mean) estimate of group size. If that mean estimate is less than 0 or `NA` for any reason, overall `ss_valid` becomes `FALSE` and the geometric (weighted) mean of the raw low estimates is used. If the low estimate has to be used and it is less than 0 or `NA`, the best estimate becomes 1.0 and overall `ss_valid` remains `FALSE`.
- […] `NA` and their `ss_valid` status becomes `FALSE`.
Working on `lta_checks()` now; once all changes are made I will test the workflow with the WHICEAS analysis.
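The per-observer fallback and the sighting-level filter in the outline above can be sketched as follows. This is a hedged Python illustration, not the code in `process_sightings()` or `group_size()`. The function names `usable()`, `observer_estimate()`, and `sighting_estimate()` are hypothetical, and the sketch assumes an unweighted geometric mean. It leaves out the weighted mean, the fallback to the mean of the raw low estimates, and the handling of mixed-species percentages.

```python
import math

def usable(x):
    """An estimate is usable if it is present, not NaN, and not less than 0."""
    return x is not None and not math.isnan(x) and x >= 0

def observer_estimate(best, low):
    """Return (size, ss_valid) for one observer's estimate of one sighting."""
    if usable(best):
        return best, True   # best estimate is used; ss_valid stays TRUE
    if usable(low):
        return low, False   # low estimate stands in for best; ss_valid is FALSE
    return 1.0, False       # nothing usable: coerce to 1; ss_valid is FALSE

def geometric_mean(values):
    """Unweighted geometric mean of a non-empty list of non-negative numbers."""
    return math.prod(values) ** (1 / len(values))

def sighting_estimate(observers):
    """observers: list of (best, low) pairs. Return (size, overall ss_valid)."""
    resolved = [observer_estimate(best, low) for best, low in observers]
    kept = [size for size, valid in resolved if valid]  # keep ss_valid TRUE only
    if not kept:
        return None, False  # no valid estimates remain: size is NA
    return geometric_mean(kept), True

# Made-up example: the second observer has no best estimate, so only the
# first and third observers contribute to the mean.
size, ss_valid = sighting_estimate([(10, 5), (None, 8), (40, None)])
```

Note that in this reading an observer whose low estimate was substituted is flagged `FALSE` and therefore does not contribute to the sighting's mean.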
Hi @ericmkeen - this looks good, but I want to discuss the last bullet. I don't know if we want to totally discount observer estimates for mixed-species groups if they don't include a percentage. Sometimes observers have a total group size, but they don't have a good feel for percentages by species. Since the process has been to average the best estimates, then average the percentages, and then apply the proportions to the estimates, I still think their estimates could be used. The issue here comes when NO observers provide percentages. What do you think?
Thanks @amandalbradford, I think my language was confusing on that last bullet point, so here's an attempt at clarification:
- […] (`ss_valid` will not be changed.)
- […] `NA` and `ss_valid` will become `FALSE` if it is not already.
Does this clarify? Very possible I am still confused!
Thanks @ericmkeen - that's great and reflects how we've handled percentages in the past, while offering a better way for flagging ones without (ABUND used to simply remove them).
Sounds good! I implemented these changes and re-ran the WHICEAS analysis to look for any bugs and discrepancies, and everything looks good. Closing this issue.
ABUND's approach: In the event that a sighting occurs during systematic effort but no valid school size estimate is given for that sighting, the `ABUND` default is to assign a school size of 1. This happens for a handful of sightings in CNP 1986-2020.
LTabundR's approach: In `LTabundR`, we implemented a new approach that we may want to revise. Currently the approach is this: (1) sightings with missing school sizes are flagged and excluded from detection function fitting when LnTotSS is specified as a covariate; (2) during abundance estimation, those sightings are given the average school size for their respective survey. This is probably the wrong way to go. It is problematic to estimate abundance with a detection function that does not include the sightings used to estimate abundance. But it would also be problematic to exclude the sightings from abundance estimation simply because they were missing data needed to be included in the detection function model.
A better option may be somehow interpolating/inferring missing school size before detection function fitting so it can be included.