CWWhitney / ethnobotanyR

R package for quantitative ethnobotany
http://htmlpreview.github.io/?https://github.com/CWWhitney/ethnobotanyR/blob/master/vignettes/ethnobotanyr_vignette.html

RI index is calculated wrongly I think #19

Closed rasmus87 closed 6 months ago

rasmus87 commented 3 years ago

The code doesn't seem to calculate the Relative Importance Index as in the paper you cite.

I think the RFCs(max) part of the code is wrong.

Your code says:

RFCs <- RFCdata %>% dplyr::group_by(sp_name) %>%
  dplyr::summarize(RFCs = sum(FCps/(length(unique(informant))))) %>%
  dplyr::arrange(-RFCs)

This divides FCs by the total number of informants, not by max(FCs). In the test dataset, length(unique(informant)) is 20 for every species, even though not every informant mentions every species.

For the test dataset your code gives this:

> ethnobotanyR::RIs(ethnobotanydata)
  sp_name   RIs
1    sp_c 0.925
2    sp_a 0.812
3    sp_d 0.800
4    sp_b 0.738

while the RI should be:

> RFCsmax <- FCs(ethnobotanydata) %>% mutate(RFCsmax = FCs/max(FCs))
> RNUsmax <- NUs(ethnobotanydata) %>% mutate(RNUsmax = NUs/max(NUs))
> 
> RI <- left_join(RFCsmax, RNUsmax, by = "sp_name")
> RI %>% mutate(RIs = (RFCsmax + RNUsmax) / 2) %>% arrange(desc(RIs))
  sp_name FCs   RFCsmax NUs RNUsmax       RIs
1    sp_c  17 1.0000000   8   1.000 1.0000000
2    sp_a  15 0.8823529   7   0.875 0.8786765
3    sp_d  12 0.7058824   8   1.000 0.8529412
4    sp_b  12 0.7058824   7   0.875 0.7904412
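
As a cross-check, here is a minimal base R sketch that recomputes both variants from the FCs and NUs values in the table above. It assumes N = 20 informants, as stated earlier in this issue. The names RI_by_N and RI_by_max are illustrative only and are not part of the package.

```r
# Values copied from the table above; N = 20 is the informant count
# mentioned earlier in this issue (an assumption for this sketch).
sp_name <- c("sp_c", "sp_a", "sp_d", "sp_b")
FCs <- c(17, 15, 12, 12)
NUs <- c(8, 7, 8, 7)
N <- 20

# Number of uses relative to the most versatile species.
RNUsmax <- NUs / max(NUs)

# Variant 1: FCs divided by the number of informants.
RI_by_N <- (FCs / N + RNUsmax) / 2

# Variant 2: FCs divided by max(FCs), as proposed in this issue.
RI_by_max <- (FCs / max(FCs) + RNUsmax) / 2

result <- data.frame(sp_name, RI_by_N, RI_by_max)
print(result)
```

Under these assumptions, the first column comes out as 0.925, 0.8125, 0.8 and 0.7375, which matches the current package output up to rounding. The second column matches the RIs column in the table above.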

I have spent some time trying to understand this discrepancy. If I have misunderstood the paper or your function, please let me know. I am teaching a course next month using your package, and before I confuse my students too much, I would like to know whether I am mistaken or the package has a bug.

If I am right, your code could read:

RFCs <- RFCdata %>% dplyr::group_by(sp_name) %>% dplyr::summarize(FCs = sum(FCps)) %>%
  mutate(RFCs = FCs/max(FCs), FCs = NULL) %>% 
  dplyr::arrange(-RFCs)

CWWhitney commented 3 years ago

Thank you for the useful comment. Nice that you made the test and the suggestion.

This was a bug in both the RIs and the RFCs code.

rasmus87 commented 3 years ago

Thank you for the very quick fix. I think I was a little hasty: the mutate should of course be dplyr::mutate, to avoid mistakes.

Also, RFCs already worked as it should; the "fix" introduced an error. FC should be divided by "N" (all interviewees), not by max(FC). RFCs(max) and RFCs are not closely related, which the paper does not make very clear.
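
To keep the two quantities apart, the definitions as I read them from this thread can be written out as below. The notation is my own shorthand, not necessarily the paper's: N is the total number of informants, FC_s is the number of informants citing species s, and NU_s is its number of uses.

```latex
\begin{align}
  RFC_s &= \frac{FC_s}{N} \\
  RFC_{s(\max)} &= \frac{FC_s}{\max_{t} FC_t} \\
  RNU_{s(\max)} &= \frac{NU_s}{\max_{t} NU_t} \\
  RI_s &= \frac{RFC_{s(\max)} + RNU_{s(\max)}}{2}
\end{align}
```

So RFCs on its own is normalised by N, while only the RFCs(max) term inside RIs is normalised by the largest FC.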

CWWhitney commented 3 years ago

I think I managed to catch the bugs now. Please let me know if you find anything else. CRAN is on vacation now, but I will try to get the latest working code onto CRAN after January 4th (when they are back). I hope this works in time for your course. Please share any course materials etc. It would be nice to see how this is being taught.

rasmus87 commented 3 years ago

Cool, thanks. I will let you know if I catch anything else. I am not going through everything, just calculating some of the indices "by hand" for teaching purposes, and if they don't produce the same results I'll notice. I will send you the materials when they are done.