cran-task-views / ctv

CRAN Task View Initiative

CRAN task view proposal: Paleontology #57

Open · willgearty opened this issue 9 months ago

willgearty commented 9 months ago

Scope

Computational paleontology (or paleobiology) is a thriving field. Gone are the days of just digging up fossils; paleontologists now have the luxury of being able to perform a wide array of complex computational analyses on local and global compendia of fossil occurrence, phylogenetic, and morphological data to study the functional and phylogenetic evolution of organisms, ecosystem function and ecological interactions, paleobiogeographic patterns, and more. Until recently, computational paleontologists have mostly relied on resources designed for evolutionary biologists, ecologists, GIS specialists, and data scientists to accomplish such analyses. However, slowly but surely, resources (including R packages built explicitly for paleontology) are being developed to cater to these paleontological tasks.

This CTV brings together a) a collection of traditional packages that are often used in standard computational paleontological workflows, b) more recent paleontological or paleo-adjacent packages that are commonly used in paleontology, and c) cutting-edge paleo-explicit packages that we believe the paleontological community should adopt. The purpose of this CTV is to provide paleontologists, young and old, with a guide to developing a wide variety of computational paleontological workflows. We have included packages (~50 at the moment) that span both the data acquisition/cleaning and analytical components of such workflows, with analyses covering paleoecology, paleobiogeography, phylogenetics, and more (see sections below).

We have excluded many of the most common packages (e.g., tidyverse, sf) because they are often imported by packages in this CTV and are often covered exhaustively in other CTVs and guides. Further, we have excluded older packages that have been superseded by more robust and/or feature-rich newer packages (e.g., there is a huge number of packages related to ecological niche modeling, or ENM, but we have only included a handful). We also recognize that there are many other packages out there that are relevant to, or explicitly designed for, paleontology (we originally built a list of ~140 packages and whittled it down to the list below). We excluded most of these packages because we, as a group, had little experience with them or because they seemed unfinished or too niche to be useful. However, we'd love to hear from anyone who has suggestions about other packages to include or exclude. Finally, where applicable, we plan to direct users to other CTVs that overlap in scope (see below).

Packages

Data acquisition

mapast, neotoma2, paleobioDB, rgbif, rgplates, ridigbio, chronosphere

Data cleaning

CoordinateCleaner, fossilbrush, palaeoverse

Data visualization

deeptime, ggtern, ggtree, SDAR, StratigrapheR, tidypaleo, geoChronR, rphylopic

Paleoecology

ade4, dismo, ecospace, ENMeval, ENMTools, fossil, fundiversity, vegan

Paleobiogeography and biodiversity

BAT, Compadre, divDyn, divvy, iNEXT, sepkoski

Phylogenetics

caper, diversitree, fbdR, FossilSim, geiger, mvMORPH, paleobuddy, paleotree, phytools, strap

Morphology

geomorph, Claddis, dispRity, morphospace

Time series

paleoTS, evoTS, layeranalyzer

Overlap

The scope of this proposed CTV overlaps considerably with that of other CTVs, including Environmetrics, Phylogenetics, TimeSeries, and Spatial. This stems from the fact that this proposed CTV is subject-oriented rather than methodology-oriented. This doesn't appear to be unprecedented, though, given that there are already CTVs on other subjects (e.g., ChemPhys). Further, this CTV focuses on which of the packages in these other CTVs are used specifically within computational paleontological workflows.

Maintainers

Principal maintainer: @willgearty (also the principal maintainer of the Phylogenetics CTV)

Co-maintainers: @AlfioAlessandroChiarenza, @bethany-j-allen, @ChristopherDavidDean, @KEichenseer, @LewisAJones, and @pedrolgodoy (this is a @palaeoverse project)

zeileis commented 9 months ago

Thanks for the proposal, Will @willgearty, and apologies for the slow response! I've finally had a closer look.

I like the proposal, but I'm not fully convinced yet that the task view will be sufficiently separated from the existing task views. Relatedly, your process of package selection appears to be somewhat subjective, which we try to avoid in task views by adopting clear inclusion/exclusion criteria. In particular, excluding packages that you feel are too old or that you have no experience with is too subjective.

Hence, I would ask you to establish sufficiently clear rules for the inclusion/exclusion of a package, e.g., that it must be explicitly geared towards paleontology or something like that. Rules that would necessitate an individual review process (e.g., to determine whether a package is "useful" or "finished") should be avoided.

Regarding the maintainers: It's great to see an active community proposing a task view. Seven maintainers might still be feasible, but maybe a smaller team would be easier to coordinate? Others could still contribute through issues and PRs. Also, I'm not sure whether the palaeoverse community is already diverse and heterogeneous enough that different paleontological views are reflected in it. Or would it help to bring in one person from outside as well?

I'm also pinging the principal maintainers of the Spatial, SpatioTemporal, and Environmetrics task views here: @rsbivand, @edzer, @gavinsimpson. Maybe you have some thoughts/ideas as well?

willgearty commented 9 months ago

Thanks @zeileis for the helpful comments.

We are certainly open to defining clearer rules for package inclusion/exclusion. I think if we are as exclusive as "explicitly geared towards paleontology", we'll be leaving out lots of commonly used packages (but you are right that it would then be a very clear rule). However, most, if not all, of these excluded packages are already in other task views, so they would at least be covered there.

We'll give a little time for other folks to provide their thoughts/ideas as well, then we'll look into revising accordingly.

tuxette commented 9 months ago

Hi all! I am also unsure, but as I see it, the overlap with Phylogenetics is also non-negligible (though you know that TV better than I do). In short, what is not clear to me is: "Do you have in mind at least some core packages in your list that are very specific to paleontology, rather than packages for related topics that are merely useful for paleontology?" My question is probably quite naive (maybe these are clearly listed in your proposal, but I am not able to identify them). These are the packages that should somehow be put forward in your TV, with packages that have a broader scope but can be useful for the field mentioned afterward. But again, my comment might be completely wrong.

willgearty commented 1 week ago

My deepest apologies (to my co-maintainers and the CTV editors) for the horrible delay in responding to the feedback here. Despite some reservations, we've decided to go for a more conservative approach, as suggested by @zeileis, that includes only packages that are either explicitly designed for paleontology or are explicitly advertised to paleontologists (it appears this is similar to the approach of the Agriculture CTV, for example).

There are many other packages that paleontologists use as part of their workflows, and so, as part of the development of this CTV, we plan to suggest many of these packages to other CTVs where we believe they will be appropriate. We then plan to link out to these CTVs to ensure that users of the Paleontology CTV can find all of the resources that they may need for their highly interdisciplinary work (see below).

@tuxette there aren't a lot of interpackage dependencies in paleontology, so I wouldn't say any packages really stand out as "core" packages. However, if I had to pick a handful based solely on their breadth of use, I would probably say palaeoverse, paleotree, and paleobioDB, but I'm probably biased. I'd be happy to look into download numbers in the future to identify which packages are most widely used before finalizing the list of "core" packages.

Here is an updated proposal for the Paleontology CTV:

Scope

Computational paleontology (or paleobiology) is a thriving field. Gone are the days of just digging up fossils; paleontologists now have the luxury of being able to perform a wide array of complex computational analyses on local and global compendia of fossil occurrence, phylogenetic, and morphological data to study the functional and phylogenetic evolution of organisms, ecosystem function and ecological interactions, paleobiogeographic patterns, and more. Until recently, computational paleontologists have mostly relied on resources designed for evolutionary biologists, ecologists, GIS specialists, and data scientists to accomplish such analyses. However, slowly but surely, resources (including R packages built explicitly for paleontology) are being developed to cater to these paleontological tasks.

This CTV brings together the vast majority of paleontological or paleo-adjacent packages that are in use in paleontology. The purpose of this CTV is to provide paleontologists, young and old, with a guide to developing a wide variety of computational paleontological workflows. We have included packages (~50 at the moment) that span both the data acquisition/cleaning and analytical components of such workflows, with analyses covering paleoecology, paleobiogeography, phylogenetics, and more (see sections below).

We have excluded many of the most common packages (e.g., tidyverse, sf) because they are often imported by packages in this CTV and they are often covered exhaustively in other CTVs and guides. Further, to keep the list manageable, we also do not include packages that are often used in paleontological workflows but are not explicitly designed for or advertised to paleontologists. Where applicable, we plan to direct users to other CTVs that include many of these packages (and also plan to submit recommendations to these CTVs as necessary).

Packages

Data acquisition

chronosphere, folio, neotoma2, paleobioDB, rgbif, rgplates, ridigbio, rmacrostrat, rpaleoclim

Data cleaning

CoordinateCleaner, fossilbrush, palaeoverse

Data visualization

deeptime, GEOmap, rphylopic, SDAR, StratigrapheR, tidypaleo

Paleoecology

analogue, ecospace, fossil, rioja (and Environmetrics CTV)

Paleobiogeography and biodiversity

Compadre, divDyn, divvy, hespdiv, ppgm, sepkoski (and Spatial CTV)

Phylogenetics

CladeDate, fbdR, FossilSim/FossilSimShiny, paleobuddy, paleotree, RRphylo, strap (and Phylogenetics CTV)

Morphology

morphospace (and Phylogenetics CTV)

Time series

adePEM, astrochron, evoTS, paleoTS, RRatepol (and TimeSeries CTV)

Paleoclimate and Earth System variables

Bchron, cRacle, DAIME, geoChronR, isogeochem, pastclim, sedproxy

Overlap

Only 10 of the proposed packages are included in other CTVs (rgbif, analogue, rioja, FossilSim, paleobuddy, paleotree, strap, paleoTS, deeptime, and GEOmap).
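As a quick consistency check on the "~50 packages" and "10 in other CTVs" figures, the updated package lists can be tallied in a few lines. This is only a sketch, written in Python purely for illustration: the section and package names are transcribed from the updated proposal, and counting FossilSim and FossilSimShiny as two separate packages is an assumption.

```python
# Package lists transcribed from the updated proposal, keyed by section.
# Assumption: FossilSim and FossilSimShiny are counted as two packages.
proposal = {
    "Data acquisition": ["chronosphere", "folio", "neotoma2", "paleobioDB", "rgbif",
                         "rgplates", "ridigbio", "rmacrostrat", "rpaleoclim"],
    "Data cleaning": ["CoordinateCleaner", "fossilbrush", "palaeoverse"],
    "Data visualization": ["deeptime", "GEOmap", "rphylopic", "SDAR",
                           "StratigrapheR", "tidypaleo"],
    "Paleoecology": ["analogue", "ecospace", "fossil", "rioja"],
    "Paleobiogeography and biodiversity": ["Compadre", "divDyn", "divvy", "hespdiv",
                                           "ppgm", "sepkoski"],
    "Phylogenetics": ["CladeDate", "fbdR", "FossilSim", "FossilSimShiny",
                      "paleobuddy", "paleotree", "RRphylo", "strap"],
    "Morphology": ["morphospace"],
    "Time series": ["adePEM", "astrochron", "evoTS", "paleoTS", "RRatepol"],
    "Paleoclimate and Earth System variables": ["Bchron", "cRacle", "DAIME", "geoChronR",
                                                "isogeochem", "pastclim", "sedproxy"],
}

# The ten packages the proposal reports as already listed in other CTVs.
in_other_ctvs = {"rgbif", "analogue", "rioja", "FossilSim", "paleobuddy",
                 "paleotree", "strap", "paleoTS", "deeptime", "GEOmap"}

# Flatten all sections into one list, then deduplicate.
all_pkgs = [pkg for pkgs in proposal.values() for pkg in pkgs]
unique_pkgs = set(all_pkgs)

# A package listed under two sections would show up as a duplicate;
# any overlapping package not in the proposal would show up as missing.
duplicates = len(all_pkgs) - len(unique_pkgs)
missing = in_other_ctvs - unique_pkgs
overlap_share = len(in_other_ctvs & unique_pkgs) / len(unique_pkgs)

print(f"{len(unique_pkgs)} packages, {duplicates} duplicates, missing: {sorted(missing)}")
print(f"share already in other CTVs: {overlap_share:.0%}")
```

Run as-is, this reports 49 unique packages with no duplicates and no missing names, so the ten overlapping packages make up roughly 20% of the proposed list.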