JohnsonHsieh / iNEXT

R package for interpolation and extrapolation
https://JohnsonHsieh.github.com/iNEXT

Using iNEXT with a list object as in ciliates dataset, but with abundance data #58

Open naurasd opened 3 years ago

naurasd commented 3 years ago

In the example workflow of the iNEXT vignette, the ciliates dataset is analyzed to demonstrate diversity estimation based on raw species incidence data. ciliates is a "List of 3" object made up of 3 habitats with multiple samples each. The dataset I am planning to analyze with iNEXT consists of species abundance data from 4 habitats with a varying number of samples.

This is what the ciliates dataset structure looks like:

List of 3
 $ EtoshaPan          : int [1:365, 1:19] 0 0 0 0 0 0 0 0 0 0 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:365] "Acaryophrya.collaris" ...
  .. ..$ : chr [1:19] "x53" "x54" "x55" "x56" ...
 $ CentralNamibDesert : int [1:365, 1:17] 0 0 0 0 0 1 0 0 0 0 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:365] "Acaryophrya.collaris" ...
  .. ..$ : chr [1:17] "x31" "x32" "x34" "x35" ...
 $ SouthernNamibDesert: int [1:365, 1:15] 0 0 0 0 0 0 0 0 0 0 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:365] "Acaryophrya.collaris" ...
  .. ..$ : chr [1:15] "x9" "x17" "x19" "x20" ...

When I convert my abundance data to raw incidence data and create a "List of 4" object (4 habitats with several samples each), just like the example ciliates dataset, the iNEXT algorithm and the ggiNEXT visualization work smoothly. I am able to choose the number of samples of the most extensively sampled habitat as "endpoint", and the diversity curves of all habitats are extrapolated to this point. This is what our dataset structure looks like (after converting to raw incidence data):

List of 4
 $ Leafs    : num [1:314, 1:29] 0 0 0 0 0 0 1 1 0 0 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:314] "Foram1" "Foram2" "Foram3" "Foram4" ...
  .. ..$ : chr [1:29] "Blatt_3B_1" "Blatt_3B_2" "Blatt_3B_3" "Blatt_3B_4" ...
 $ Sprouts  : num [1:314, 1:20] 0 0 1 1 1 0 1 0 0 0 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:314] "Foram1" "Foram2" "Foram3" "Foram4" ...
  .. ..$ : chr [1:20] "Wurz_3B_1" "Wurz_3B_2" "Wurz_3B_3" "Wurz_3B_4" ...
 $ Redalgae : num [1:314, 1:16] 0 0 1 0 1 0 1 0 0 0 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:314] "Foram1" "Foram2" "Foram3" "Foram4" ...
  .. ..$ : chr [1:16] "ReAl_Co_1" "ReAl_Co_2" "ReAl_Co_3" "ReAl_Co_4" ...
 $ Posidonia: num [1:314, 1:49] 0 0 0 0 0 0 1 1 0 0 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:314] "Foram1" "Foram2" "Foram3" "Foram4" ...
  .. ..$ : chr [1:49] "Blatt_3B_1" "Blatt_3B_2" "Blatt_3B_3" "Blatt_3B_4" ...
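
For reference, the abundance-to-incidence conversion described above could look roughly like the following. This is only a minimal sketch: abund_list and its toy counts are hypothetical stand-ins for the real data (a named list of species-by-sample count matrices), and the iNEXT calls are left commented out so the snippet runs without the package installed.

```r
# Hypothetical stand-in for the real data: two habitats, each a
# species x samples matrix of counts (names and values are made up).
set.seed(1)
abund_list <- list(
  Leafs   = matrix(rpois(5 * 4, lambda = 1), nrow = 5,
                   dimnames = list(paste0("Foram", 1:5), paste0("L", 1:4))),
  Sprouts = matrix(rpois(5 * 3, lambda = 1), nrow = 5,
                   dimnames = list(paste0("Foram", 1:5), paste0("S", 1:3)))
)

# Any count > 0 becomes a detection (1); zeros stay 0.
# Multiplying the logical matrix by 1 keeps dimensions and dimnames.
incidence_list <- lapply(abund_list, function(m) (m > 0) * 1)

# Number of samples in the most extensively sampled habitat,
# used as the common extrapolation endpoint.
endpoint <- max(sapply(incidence_list, ncol))

# Requires the iNEXT package:
# library(iNEXT)
# out <- iNEXT(incidence_list, q = 0, datatype = "incidence_raw",
#              endpoint = endpoint)
# ggiNEXT(out, type = 1)
```

With a list of this shape and datatype = "incidence_raw", the curves are expressed as a function of sampling units rather than individuals.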

However, we would also like to estimate diversity based on our original abundance data. When passing our "List of 4" object containing abundance data to the iNEXT function with datatype = "abundance", the following error appears:

Error in FUN(X[[i]], ...) : invalid data structure

So it seems that, for abundance data, iNEXT does not work with nested list objects consisting of several habitats with multiple samples each, but only with list objects that look like the spider dataset, i.e. one abundance vector per habitat:

List of 2
 $ Girdled: num [1:26] 46 22 17 15 15 9 8 6 6 4 ...
 $ Logged : num [1:37] 88 22 16 15 13 10 8 8 7 7 ...

Why is this the case? Suppose that, for interpolation and extrapolation, I am not interested in diversity as a function of the number of individuals (as shown in the vignette's example workflow for the spider dataset), but as a function of the number of sampling units. Without being able to pass a list object such as ciliates (containing abundance data in this case, of course) to the iNEXT algorithm, I would have to pool all samples within each habitat into a single abundance vector to make it work with iNEXT and to be able to compare the habitats. This way, however, we would lose the information stored in each individual sample.
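
To make the pooling workaround concrete, here is a minimal sketch. The abund_list object and its counts are hypothetical; the idea is simply to sum each species' counts across the samples of a habitat, which yields one abundance vector per habitat, the same shape as the spider dataset. The iNEXT call is commented out so the snippet runs without the package.

```r
# Hypothetical example: two habitats, species x samples count matrices.
abund_list <- list(
  HabitatA = matrix(c(3, 0, 1,
                      0, 2, 0,
                      5, 1, 0), nrow = 3, byrow = TRUE,
                    dimnames = list(c("sp1", "sp2", "sp3"),
                                    c("a1", "a2", "a3"))),
  HabitatB = matrix(c(0, 0,
                      4, 1,
                      2, 2), nrow = 3, byrow = TRUE,
                    dimnames = list(c("sp1", "sp2", "sp3"),
                                    c("b1", "b2")))
)

# Sum each species' counts over all samples of a habitat.
# Result: a named list with one numeric abundance vector per habitat.
pooled <- lapply(abund_list, rowSums)
str(pooled)

# Requires the iNEXT package:
# library(iNEXT)
# out <- iNEXT(pooled, q = 0, datatype = "abundance")
```

The drawback is exactly the one described above: after pooling, the per-sample structure is gone, and the resulting curves are a function of individuals, not of sampling units.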

Has this issue occurred before? Is there a simple solution for this problem I am currently not aware of? What is the reason behind not being able to analyze abundance data stored in a more complex list object such as ciliates with the iNEXT algorithm?

Thanks for the help.

03rcooke commented 1 year ago

Did you get anywhere with this? I've got similar hopes of calculating sample coverage for abundance data but with multiple samples per site. What did you decide to do in the end? Is this possible by digging into the functions inside iNEXT::iNEXT()?

naurasd commented 1 year ago

hey. as there was no feedback from the developers, and I didn't have the skills or capacity at the time to dive into the functions themselves, we just ended up never doing this and stuck to the calculations that were possible with iNEXT at the time.

Here is the paper we did the analysis for: https://www.nature.com/articles/s42003-022-03523-5

and here is the iNEXT analysis we published on protocols.io for the paper. It can be cited via its own DOI (some of the functions and scripts will probably need to be adjusted, as I suppose the package and the online tool have been updated since):

Daraghmeh, N. & El-Khaled, Y. C. iNEXT4steps workflow for biodiversity assessment and comparison. protocols.io 1–5 (2021) https://doi.org/10.17504/protocols.io.bu6fnzbn

03rcooke commented 1 year ago

Thanks @naurasd, that's really helpful!

naurasd commented 1 year ago

Welcome ;-)

eliphalethcarmona commented 3 months ago

I don't know if it is still relevant, but Chiu (2023) published a way to estimate sample coverage and rarefaction and extrapolation curves of species richness based on sample-based abundance data. I hope it is soon implemented in iNEXT and its successors (iNEXT.3D & iNEXT.beta3D).

Chiu, C. H. (2023). Sample coverage estimation, rarefaction, and extrapolation based on sample‐based abundance data. Ecology, 104(8), e4099. https://doi.org/10.1002/ecy.4099.