CornellLabofOrnithology / ebird-best-practices

Best Practices for Using eBird Data
https://CornellLabOfOrnithology.github.io/ebird-best-practices/
Other
32 stars 12 forks source link

impact deciduous broadleaf on detection probability #13

Open camm1gh opened 3 years ago

camm1gh commented 3 years ago

Hello,

When running the example code in chapter 5.3.3. of the online tutorial, I've noticed that the impact of deciduous broadleaf forest on detectability is positive (I've used the ebd_woothr_june_bcr27_zf.csv as input). I follow the interpretation of a denser habitat making it more difficult to detect a species, yet this shows something different.

Could you help figure this out please?

My results:

coef(occ_avg) %>% 
+   enframe() 
# A tibble: 13 x 2
   name                                 value
   <chr>                                <dbl>
 1 psi(Int)                          -1.97   
 2 psi(pland_04_deciduous_broadleaf)  5.76   
 3 psi(pland_05_mixed_forest)         1.28   
 4 psi(pland_13_urban)               -1.97   
 5 p(Int)                            -1.07   
 6 p(duration_minutes)                0.00569
 7 p(pland_05_mixed_forest)           2.91   
 8 p(effort_distance_km)             -0.159  
 9 p(protocol_typeTraveling)          0.496  
10 psi(pland_12_cropland)            -1.21   
11 p(time_observations_started)       0.00639
12 p(number_observers)               -0.0406 
13 p(pland_04_deciduous_broadleaf)    0.0224 

I have the same number of data (48450), reduced data (3724) and sites (988) (chapter 5.2.1), however the summary of the unmarked object shows slightly different numbers (chapter 5.2.3). My results:

summary(occ_um)
unmarkedFrame Object

584 sites
Maximum number of observations per site: 10 
Mean number of observations per site: 3.84 
Sites with at least one detection: 65 

Tabulation of y observations:
   0    1 <NA> 
2115  129 3596 

Site-level covariates:
 n_observations      latitude       longitude      pland_04_deciduous_broadleaf pland_05_mixed_forest pland_12_cropland pland_13_urban  
 Min.   : 2.000   Min.   :29.68   Min.   :-91.40   Min.   :0.00000              Min.   :0.00000       Min.   :0.00000   Min.   :0.0000  
 1st Qu.: 2.000   1st Qu.:31.15   1st Qu.:-85.55   1st Qu.:0.00000              1st Qu.:0.00000       1st Qu.:0.00000   1st Qu.:0.0000  
 Median : 2.000   Median :33.08   Median :-81.35   Median :0.00000              Median :0.00000       Median :0.00000   Median :0.0000  
 Mean   : 3.842   Mean   :33.23   Mean   :-81.98   Mean   :0.05415              Mean   :0.04688       Mean   :0.02501   Mean   :0.1531  
 3rd Qu.: 4.000   3rd Qu.:35.05   3rd Qu.:-78.34   3rd Qu.:0.03226              3rd Qu.:0.00000       3rd Qu.:0.00000   3rd Qu.:0.1935  
 Max.   :10.000   Max.   :37.36   Max.   :-75.46   Max.   :1.00000              Max.   :1.00000       Max.   :1.00000   Max.   :1.0000  

Observation-level covariates:
 time_observations_started duration_minutes effort_distance_km number_observers    protocol_type  pland_04_deciduous_broadleaf
 Min.   : 0.417            Min.   :  1.00   Min.   :0.000      Min.   :1.000    Stationary:1272   Min.   :0.000               
 1st Qu.: 7.662            1st Qu.: 16.00   1st Qu.:0.000      1st Qu.:1.000    Traveling : 972   1st Qu.:0.000               
 Median :10.300            Median : 35.00   Median :0.000      Median :1.000    NA's      :3596   Median :0.000               
 Mean   :11.736            Mean   : 53.16   Mean   :0.729      Mean   :1.192                      Mean   :0.050               
 3rd Qu.:16.225            3rd Qu.: 69.25   3rd Qu.:1.020      3rd Qu.:1.000                      3rd Qu.:0.032               
 Max.   :23.550            Max.   :300.00   Max.   :4.989      Max.   :4.000                      Max.   :1.000               
 NA's   :3596              NA's   :3596     NA's   :3596       NA's   :3596                       NA's   :3596                
 pland_05_mixed_forest
 Min.   :0.000        
 1st Qu.:0.000        
 Median :0.000        
 Mean   :0.037        
 3rd Qu.:0.000        
 Max.   :1.000        
 NA's   :3596

In your online example, the summary for the deciduous broadleaf shows values of either 0 or 1 so first I thought this might be the cause. Your results:

#> Observation-level covariates:
#>  time_observations_started duration_minutes effort_distance_km number_observers protocol_type      pland_04_deciduous_broadleaf
#>  Min.   : 0                Min.   :  1      Min.   :0          Min.   :1        Length:5840        Min.   :0                   
#>  1st Qu.: 8                1st Qu.: 16      1st Qu.:0          1st Qu.:1        Class :character   1st Qu.:0                   
#>  Median :10                Median : 35      Median :0          Median :1        Mode  :character   Median :0                   
#>  Mean   :12                Mean   : 53      Mean   :1          Mean   :1                           Mean   :0                   
#>  3rd Qu.:16                3rd Qu.: 70      3rd Qu.:1          3rd Qu.:1                           3rd Qu.:0                   
#>  Max.   :24                Max.   :300      Max.   :5          Max.   :4                           Max.   :1                   
#>  NA's   :3587              NA's   :3587     NA's   :3587       NA's   :3587                        NA's   :3587 

But when I look at my occupancy model summary, I do see a negative impact. My results:

summary(occ_model)

Call:
occu(formula = ~time_observations_started + duration_minutes + 
    effort_distance_km + number_observers + protocol_type + pland_04_deciduous_broadleaf + 
    pland_05_mixed_forest ~ pland_04_deciduous_broadleaf + pland_05_mixed_forest + 
    pland_12_cropland + pland_13_urban, data = occ_um)

Occupancy (logit-scale):
                             Estimate    SE      z  P(>|z|)
(Intercept)                     -2.01 0.244 -8.229 1.89e-16
pland_04_deciduous_broadleaf     5.81 1.985  2.925 3.44e-03
pland_05_mixed_forest            1.21 0.834  1.452 1.47e-01
pland_12_cropland               -1.10 2.005 -0.551 5.82e-01
pland_13_urban                  -1.94 0.983 -1.972 4.86e-02

Detection (logit-scale):
                             Estimate      SE      z P(>|z|)
(Intercept)                  -1.15405 0.56085 -2.058 0.03962
time_observations_started    -0.00324 0.03000 -0.108 0.91399
duration_minutes              0.00648 0.00316  2.051 0.04027
effort_distance_km           -0.21249 0.13307 -1.597 0.11030
number_observers              0.02893 0.30450  0.095 0.92431
protocol_typeTraveling        0.66896 0.38196  1.751 0.07988
pland_04_deciduous_broadleaf -0.27782 0.63219 -0.439 0.66033
pland_05_mixed_forest         2.73768 1.01597  2.695 0.00705

AIC: 700.9852 
Number of sites: 584
optim convergence code: 0
optim iterations: 56 
Bootstrap iterations: 0 

My guess is that the final positive coefficient has to do something with the model averaging step? If this step changes the interpretation of the results, should it better be skipped in the case where model selection is not indicating that some models are clearly better?

Thanks in advance for your help, Camille

mstrimas commented 3 years ago

Hi Camille, there is a component of stochasticity involved here, so I suspect you're right that something is happening in the model averaging that's causing the difference. I'm the primary maintainer of this repository but not very knowledgable about the technical details of occupancy modeling. @vivirg wrote this section of the Best Practices book, so maybe she can help. Otherwise, the question may be best directed to the unmarked listserv or to someone else with expertise in occupancy modeling.

vivirg commented 3 years ago

Hi Camille, I can look into this tomorrow and see what’s going on. Could you send me your workspace and code to compare? Viviana

On Sep 16, 2021, at 1:42 PM, Matt Strimas-Mackey @.***> wrote:

 Hi Camille, there is a component of stochasticity involved here, so I suspect you're right that something is happening in the model averaging that's causing the difference. I'm the primary maintainer of this repository but not very knowledgable about the technical details of occupancy modeling. @vivirg wrote this section of the Best Practices book, so maybe she can help. Otherwise, the question may be best directed to the unmarked listserv or to someone else with expertise in occupancy modeling.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or unsubscribe. Triage notifications on the go with GitHub Mobile for iOS or Android.

camm1gh commented 3 years ago

Sure Viviana, here are the code (ran till line 186) and workspace. code.zip

Camille