ahhurlbert / aviandietdb

Avian Diet Database Summary Functions

modify dietSummaryByPrey() to account for analyses without the prey observed #1

Open ahhurlbert opened 3 years ago

ahhurlbert commented 3 years ago

Currently, dietSummaryByPrey():

  1. filters the database to records based on a given prey item,
  2. then takes an average across those analyses (for each Diet_Type separately).

The problem is that this average ignores every analysis of the focal bird species in which the specified prey item was not consumed at all.

Take the example of Cedar Waxwings consuming caterpillars:

> dietSummaryByPrey("Lepidoptera", preyLevel = "Order", preyStage = "larva", speciesMean = TRUE, dietType = "Wt_or_Vol")
                                      Common_Name        Family Diet_Type Fraction_Diet   Prey_Name Prey_Level Prey_Stage
1                                   Cedar Waxwing Bombycillidae Wt_or_Vol    0.86500000 Lepidoptera      Order      larva
2                                  Elegant Trogon    Trogonidae Wt_or_Vol    0.82500000 Lepidoptera      Order      larva
3                                Evening Grosbeak  Fringillidae Wt_or_Vol    0.80000000 Lepidoptera      Order      larva
...

Cedar Waxwings appear at the top of the list because a single study contributed two analyses that found 84% and 89% caterpillars in the diet (during a spruce budworm outbreak).

However, speciesSummary() tells a different story:

> speciesSummary("Cedar Waxwing", by = "Order")
$Studies
[1] "Parrish, J. D. 1997. Patterns of Frugivory and Energetic Condition in Nearctic Birds During Autumn Migration. Condor 99: 681-697."                                                                                                            
[2] "Witmer, M. C. 1996. Annual diet of Cedar Waxwing based on U.S. Biological Survey records (1885-1950) compared to diet of American Robin: contrasts in dietary patterns and natural history. Auk 113:414-430."                                 
[3] "Martin, A. C., Zim, H. S., and Nelson, A. L. 1961. American wildlife & plants : a guide to wildlife food habits : the use of trees, shrubs, weeds, and herbs by birds and mammals of the United States. Dover Publications, New York, 500 pp."
[4] "Mitchell, R. T. 1952. Consumption of Spruce Budworms by Birds in a Maine Spruce-Fir Forest. Journal of Forestry 50(5):387-389."                                                                                                               
[5] "Howell, A. 1928. Birds of Alabama. Department of Game and Fisheries of Alabama."                                                                                                                                                              
.
.
.
$analysesPerDietType
   Diet_Type  n
1 Occurrence  2
2  Wt_or_Vol 20

$preySummary
                 Taxon     Prey_Part Wt_or_Vol Occurrence
13             Rosales         fruit    0.2029         NA
17 Unid. Magnoliopsida flower; fruit    0.1990     0.5000
18  Unid. Tracheophyta            NA    0.1760         NA
12             Pinales         fruit    0.1593         NA
9    Lepidoptera larva            NA    0.0865     0.0655
...

Because there were 18 other Wt_or_Vol diet analyses where caterpillars made up 0%, the species summary more accurately suggests that caterpillars make up only 8.65% of the diet.
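The gap between the two numbers is just a difference in denominator. A minimal sketch of the arithmetic, using only the figures quoted above (the variable names are made up for illustration):

```r
# Caterpillar fractions (Wt_or_Vol) in the two analyses that recorded them;
# the other 18 Wt_or_Vol analyses for Cedar Waxwing recorded none.
observed <- c(0.84, 0.89)
n_analyses <- 20

# Average over only the analyses that contain the prey item
# (the number dietSummaryByPrey() currently reports)
mean_prey_only <- mean(observed)

# Average over all analyses, counting the other 18 as 0
# (the number speciesSummary() reports)
mean_all <- sum(observed) / n_analyses

print(c(prey_only = mean_prey_only, all_analyses = mean_all))
```

The first value is 0.865 and the second is 0.0865, matching the two summaries above.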

Dealing with this in dietSummaryByPrey() will potentially be slow, because we will have to run something close to dietSummary() for every bird species that has ever eaten the specified prey...

ahhurlbert commented 3 years ago

Actually, it will not be appropriate to divide by the total number of other diet analyses in all cases, because some of those analyses may have been at a taxonomically coarse resolution (e.g. % animal vs % plant, or % Insecta) where a given prey item could potentially be included but cannot be broken out.

Thus, it seems the options are:

Concrete example: 3 studies (with 4 analyses) have examined Red-eyed Vireo diet by % Occurrence:

If asked for a summary regarding the importance of caterpillars in the diet, the mean % occurrence returned should simply be 56.8%, rather than (56.8 + 0 + 0 + 0)/4 = 14.2%.

ahhurlbert commented 3 years ago

Updated code now calculates a species mean using all studies that identified at least some prey down to the taxonomic level of preyName. Thus, in the example above, Parrish and Blake & Loiselle would not be included in the calculation of the mean.
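A rough sketch of that rule, not the package's actual code. The data frame, its column names (Study, Prey_Class, Prey_Order, Fraction_Diet), its values, and the helper species_mean_for_order() are all hypothetical and only meant to show how the denominator is chosen:

```r
# Hypothetical, simplified diet records: one row per prey item per study.
# Column names and values are illustrative, not the real database schema.
records <- data.frame(
  Study = c("A", "A", "B", "C", "C"),
  Prey_Class = c("Insecta", "Insecta", "Insecta", "Insecta", "Magnoliopsida"),
  Prey_Order = c("Lepidoptera", "Coleoptera", NA, "Coleoptera", "Rosales"),
  Fraction_Diet = c(0.6, 0.4, 1.0, 0.7, 0.3),
  stringsAsFactors = FALSE
)

species_mean_for_order <- function(records, prey_name) {
  # Denominator: studies that identified at least one prey item to Order
  resolved <- unique(records$Study[!is.na(records$Prey_Order)])
  # Fraction of the focal prey in each of those studies (0 where absent)
  per_study <- vapply(resolved, function(s) {
    hit <- records$Study == s &
      !is.na(records$Prey_Order) &
      records$Prey_Order == prey_name
    sum(records$Fraction_Diet[hit])
  }, numeric(1))
  mean(per_study)
}

species_mean_for_order(records, "Lepidoptera")
```

Here study B, which only reports "Insecta", is left out of the denominator, so the mean is (0.6 + 0)/2 = 0.3 rather than 0.2.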

One potential problem would be a case that only identified arthropods down to "Insecta", but identified fruits down to Family. The current code would include that study in the count of studies to divide by even though "Lepidoptera" should not be considered to be 0% in that study.

The most appropriate solution would be to identify the taxonomic entity immediately above preyName, and then include the study in the denominator only if there is at least one record of a prey item that belongs to a group at the level of preyLevel within that higher taxonomic entity. (E.g., if there was an entry for Coleoptera, but not Lepidoptera, then Lepidoptera could properly be interpreted as 0%.)
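One way this stricter rule might look, again as a sketch on made-up data. The columns (Study, Prey_Class, Prey_Order, Fraction_Diet) and the helper studies_in_denominator() are hypothetical, and the parent taxon is passed in by hand rather than looked up:

```r
# Hypothetical records; column names and values are illustrative only.
# Study B resolves plants to Order but leaves arthropods at "Insecta".
records <- data.frame(
  Study = c("A", "A", "B", "B", "C"),
  Prey_Class = c("Insecta", "Insecta", "Insecta", "Magnoliopsida", "Insecta"),
  Prey_Order = c("Lepidoptera", "Coleoptera", NA, "Rosales", "Coleoptera"),
  Fraction_Diet = c(0.6, 0.4, 0.5, 0.5, 1.0),
  stringsAsFactors = FALSE
)

# A study enters the denominator only if it resolved at least one prey item
# to Order *within the parent taxon* of the focal prey.
studies_in_denominator <- function(records, parent_class) {
  ok <- records$Prey_Class == parent_class & !is.na(records$Prey_Order)
  unique(records$Study[ok])
}

# The parent taxon of Lepidoptera is assumed to be the class Insecta
keep <- studies_in_denominator(records, "Insecta")
print(keep)
```

Studies A and C are kept; B is dropped even though it has an Order-level record, because that record is a plant and says nothing about whether Lepidoptera were absent.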