PecanProject / pecan

The Predictive Ecosystem Analyzer (PEcAn) is an integrated ecological bioinformatics toolbox.
www.pecanproject.org

Support for NEOTOMA? #492

Open mdietze opened 9 years ago

mdietze commented 9 years ago

Should we try to cross-link to neotomadb.org to be able to pull in paleoecological data for cal/val of longer runs? Should we include NEOTOMA site names in the BETYdb site names, similar to the way we include FLUXNET names, to make it easy to search for sites with paleo data? Could we convince @SimonGoring to work on this?

SimonGoring commented 9 years ago

I'm happy to help you all work on it, or write the component that helps out with this. There is the neotoma package (v1.3.0 is on CRAN), which uses the Neotoma API to actually pull down data. If you're going to include Neotoma sites, we need to recognize that the database is dynamic, so the site list shouldn't be static.

The neotoma package currently has a wrapper to look for all sites within a certain bounding box:

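```r
# Install once from CRAN, then load the package
install.packages('neotoma')
library(neotoma)

# Find pollen datasets inside a lon/lat bounding box: c(west, south, east, north)
run_sites <- get_dataset(loc = c(-130, 12, -110, 49), datasettype = 'pollen')

# Plot the locations of the sites that hold those datasets
plot(get_site(run_sites)[, c('long', 'lat')])
```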

This gives you all pollen sites in (approximately) the western United States and part of Mexico. We have some other geographic IDs you can use as well. The raw data is pulled down using a separate command, and there are some other considerations. Regardless, if you want to do this, let's talk about what you need, how you want the input data to look, and what sort of tests we can write to make sure that future updates of neotoma don't break the PEcAn functionality.
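One way to approach such tests is a small contract check that fails loudly if a future neotoma release changes the shape of the site table. This is a minimal sketch: `validate_neotoma_sites()` is a hypothetical helper, not part of neotoma or PEcAn, it only checks the `long` and `lat` columns used in the example above, and the fixture values are made up so the check can run offline.

```r
# Hypothetical helper (not part of neotoma or PEcAn): check that a site table
# has the long/lat columns PEcAn would rely on and that the coordinates are
# plausible, so a test fails loudly if a future neotoma release changes them.
validate_neotoma_sites <- function(sites) {
  missing_cols <- setdiff(c("long", "lat"), names(sites))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  # Longitude must lie in [-180, 180] and latitude in [-90, 90]; NAs are ignored.
  ok_long <- all(sites$long >= -180 & sites$long <= 180, na.rm = TRUE)
  ok_lat  <- all(sites$lat >= -90 & sites$lat <= 90, na.rm = TRUE)
  if (!ok_long || !ok_lat) stop("Coordinates out of range")
  invisible(TRUE)
}

# Offline fixture standing in for get_site(run_sites); values are made up.
fixture <- data.frame(site.name = c("Site A", "Site B"),
                      long = c(-120.5, -115.2),
                      lat  = c(40.1, 45.3))
validate_neotoma_sites(fixture)
```

In a test suite, the same check could be pointed at a live `get_site()` result, keeping the fixture for runs where the API is unreachable.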

bcow commented 9 years ago

In the meantime, I am adding a few sites to BETYdb. Are there any specific tags I should add for NEOTOMA site names?

mdietze commented 9 years ago

@SimonGoring, where we are right now is that, as Betsy notes, we were in the process of registering the PalEON MIP sites and met drivers within PEcAn [Billy's Lake, Demming Lake, Minden Bog, UNDERC], and I thought it would be good to tag these sites as PalEON-related so we could search for them later. Then I thought it would also be good to check whether there was a standard Neotoma name, so that we could potentially cross-reference. We haven't really thought this through in detail, but there are a couple of different cases I could see happening. The first is when there's a 1:1 match between a PEcAn site and a Neotoma site: we could grab Neotoma data for visualization, validation, or calibration (assuming it was a long run, like the PalEON met). The second would be like the example you give, where we might search within some radius of a site.
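For the second case, a radius search can be done on the client side once site coordinates are in hand. A minimal sketch, assuming a site table with `long` and `lat` columns in decimal degrees (as returned by `get_site()` in the example above) and a spherical Earth; `haversine_km()` and `sites_within_radius()` are hypothetical helpers, and all coordinates are made up.

```r
# Great-circle distance in km using the haversine formula on a sphere with
# mean Earth radius 6371 km. Vectorized over the second point.
haversine_km <- function(lon1, lat1, lon2, lat2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius_km * asin(pmin(1, sqrt(a)))  # pmin guards against rounding above 1
}

# Hypothetical helper: rows of `sites` within `max_km` of (lon, lat),
# with a dist_km column added and sorted nearest first.
sites_within_radius <- function(sites, lon, lat, max_km) {
  d <- haversine_km(lon, lat, sites$long, sites$lat)
  keep <- which(d <= max_km)
  hits <- sites[keep, , drop = FALSE]
  hits$dist_km <- d[keep]
  hits[order(hits$dist_km), , drop = FALSE]
}

# Made-up candidate sites around a made-up PEcAn site at (-89, 46).
candidates <- data.frame(name = c("far", "near", "same"),
                         long = c(-100, -89, -89),
                         lat  = c(46, 47, 46))
sites_within_radius(candidates, lon = -89, lat = 46, max_km = 200)
```

Here `same` (0 km) and `near` (one degree of latitude, about 111 km) are kept and `far` (about 850 km away) is dropped.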

More generally, we're still in the early phases of trying to automate the general process of pulling in external data sets for comparing models to data (Betsy has been much more focused on automating the pulling in and processing of model inputs), but as we get these workflows sorted out, it would be great if, given a general workflow in PEcAn, you'd be up for helping to write a specific instance for Neotoma. No rush; we're probably a few months away from even starting, but I wanted to open the issue while we're thinking about it.

SimonGoring commented 9 years ago

@bcow, where do I find BETYdb? It's probably best to use the collection.handle and dataset id for the sites. In Neotoma a single location can have multiple datasets (pollen, geochronologic, plant macrofossil, mammal fossils, et cetera), but each collection has a unique dataset id, and the collection name is generally associated with a single field campaign, so if the site is revisited it would have a different collection handle.
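A minimal sketch of how such a cross-reference could be stored on the PEcAn side, assuming one row per (BETYdb site, Neotoma dataset) pair so that a site can map to several datasets while each dataset id appears only once. The table layout, column names, the `datasets_for_site()` helper, and all ids and handles are hypothetical placeholders, not BETYdb's schema or real Neotoma records.

```r
# Hypothetical cross-reference table: one row per (BETYdb site, Neotoma dataset).
# All ids and handles below are placeholders, not real records.
xref <- data.frame(
  bety_site_id       = c(1, 1, 2),
  neotoma_dataset_id = c(101, 102, 201),
  collection_handle  = c("HANDLE1", "HANDLE1", "HANDLE2"),
  dataset_type       = c("pollen", "geochronologic", "pollen"),
  stringsAsFactors   = FALSE
)

# Dataset ids must be unique; a site may map to several of them.
stopifnot(!anyDuplicated(xref$neotoma_dataset_id))

# Look up the Neotoma dataset ids for one BETYdb site, optionally by type.
datasets_for_site <- function(xref, site_id, type = NULL) {
  rows <- xref$bety_site_id == site_id
  if (!is.null(type)) rows <- rows & xref$dataset_type == type
  xref$neotoma_dataset_id[rows]
}

datasets_for_site(xref, 1, type = "pollen")
```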

@mdietze, we are thinking about adding a search within a circular radius (or a polygon). We don't support that right now, but it's relatively easy to implement.
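Until a server-side search exists, a polygon filter could also be applied on the client side. A minimal sketch using the standard even-odd ray-casting test, assuming lon/lat are treated as planar coordinates (adequate for small polygons away from the poles and the antimeridian); `point_in_polygon()` is a hypothetical helper and the coordinates are made up.

```r
# Even-odd ray-casting point-in-polygon test on planar lon/lat coordinates.
# poly_lon/poly_lat list the vertices in order; the ring is closed implicitly.
point_in_polygon <- function(px, py, poly_lon, poly_lat) {
  n <- length(poly_lon)
  inside <- FALSE
  j <- n
  for (i in seq_len(n)) {
    xi <- poly_lon[i]; yi <- poly_lat[i]
    xj <- poly_lon[j]; yj <- poly_lat[j]
    # Toggle when a ray from (px, py) toward +x crosses edge (j, i).
    # The first condition guarantees yi != yj, so there is no division by zero.
    if (((yi > py) != (yj > py)) &&
        (px < (xj - xi) * (py - yi) / (yj - yi) + xi)) {
      inside <- !inside
    }
    j <- i
  }
  inside
}

# Made-up rectangular search polygon and candidate sites.
poly_lon <- c(-95, -85, -85, -95)
poly_lat <- c(40, 40, 50, 50)
sites <- data.frame(name = c("inside", "outside"),
                    long = c(-89, -80), lat = c(46, 46))
sites$in_poly <- mapply(point_in_polygon, sites$long, sites$lat,
                        MoreArgs = list(poly_lon = poly_lon, poly_lat = poly_lat))
```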

We're talking about our next Geoinformatics proposal right now, so this might also be something for us to consider on the Neotoma side. If we've got a bit of time, I think we could plan and build some nice cross-analysis tools. It would be nice to be able to pull longer-term PEcAn met results into Neotoma searches as well...

SimonGoring commented 9 years ago

The other point is that if you let me know what you expect in terms of data input to PEcAn, it's pretty easy to reformat the data.

mdietze commented 8 years ago

@SimonGoring, I've been thinking about how to go about developing the Neotoma / PEcAn link in a way that gets us some basic functionality at the command line first, preferably in time for Camp 2016, and then incrementally works toward automation. Is the demo from PalEON Camp 2014 still working, or does that code need updating? https://github.com/PalEON-Project/Camp2014/tree/master/Neotoma_Pollen

What I'm thinking is setting up some PEcAn code to ingest what's coming out of that example, either the data directly (e.g., for visualization) or run through a simple site-level statistical model (which might be used to inform site-level data assimilation; I need to look at that more and talk to @araiho about how we're coming along on generalizing the SDA).

SimonGoring commented 8 years ago

Hi @mdietze, that's great. I just ran the demo and can confirm that it works (at least the non-JAGS portion), but I will update it. The big change is that neotoma is on CRAN now, so that's a minor edit to the code. As far as functionality goes, if all you want is a matrix with site-level info and pollen proportions, then you should be golden.
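For reference, a sample-by-taxon proportion matrix of that kind can be obtained from raw counts by dividing each row by its total. A minimal sketch with made-up counts and taxa; in practice the counts would come from the downloaded Neotoma records, and `to_proportions()` and the `site_A` label are hypothetical.

```r
# Made-up pollen counts: rows are samples, columns are taxa.
counts <- matrix(c(30, 50, 20,
                   10, 60, 30),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("sample1", "sample2"),
                                 c("Pinus", "Quercus", "Poaceae")))

# Convert counts to proportions by dividing each row by its row total.
to_proportions <- function(counts) {
  sweep(counts, 1, rowSums(counts), "/")
}
props <- to_proportions(counts)

# Attach site-level info to get one data frame with one row per sample.
site_table <- data.frame(site = "site_A", sample = rownames(props), props,
                         row.names = NULL)
site_table
```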

mdietze commented 8 years ago

Spent a little time playing with this today. I have a better feel for how we could do the download automatically at a grid-cell scale, as well as map taxa to PFTs automatically. I ran the JAGS demo and can see how to generalize that to a multivariate model so we can get pollen % estimates with a mean vector and error covariance on a PFT-level. What I'm not sure about is how we make the essential STEPPS leap from pollen % to fractional area on the landscape without calibrating a site-scale STEPPS-like model. Something to talk to PalEON folks about at Camp.
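One simple way to get a PFT-level mean vector and error covariance from pollen proportions is to move to log-ratios first, because proportions that sum to 1 have a singular covariance matrix. The sketch below uses the additive log-ratio (alr) transform with plain moment estimates. It is not the JAGS model from the demo; the PFT names and values are made up, and choosing alr (rather than, say, a Dirichlet or multinomial likelihood) is an assumption.

```r
# Made-up PFT-level pollen proportions: rows are samples, columns are PFTs.
pft_props <- matrix(c(0.50, 0.30, 0.20,
                      0.40, 0.35, 0.25,
                      0.55, 0.25, 0.20,
                      0.45, 0.30, 0.25),
                    ncol = 3, byrow = TRUE,
                    dimnames = list(NULL, c("conifer", "hardwood", "grass")))

# Additive log-ratio transform: log of each PFT relative to the last column.
alr <- function(p) {
  k <- ncol(p)
  log(p[, -k, drop = FALSE] / p[, k])
}

z     <- alr(pft_props)
mu    <- colMeans(z)  # mean vector on the log-ratio scale
Sigma <- cov(z)       # error covariance on the log-ratio scale

# Inverse alr: maps a log-ratio vector back to proportions summing to 1.
inv_alr <- function(z) {
  e <- exp(c(z, 0))
  e / sum(e)
}

# Back-transformed mean: the normalized geometric-mean composition.
inv_alr(mu)
```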

SimonGoring commented 8 years ago

I think you've hit the nail on the head here @mdietze.

We can do the mapping to PFTs programmatically, in some sense. There is an existing function in the neotoma package called compile_taxa that compiles taxa to various higher taxonomic levels, generally morphologically. We can (somewhat) easily map taxa onto PFTs if that's helpful.
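A minimal sketch of the taxon-to-PFT step, assuming a hand-written lookup table. The assignments below are illustrative placeholders, not a vetted mapping and not the output of `compile_taxa()`, and `aggregate_to_pft()` is a hypothetical helper. Unmapped taxa are dropped and each row is renormalized, which is one choice among several (they could instead be kept as an "other" class).

```r
# Hypothetical taxon-to-PFT lookup; placeholder assignments only.
taxon_to_pft <- c(Pinus = "conifer", Picea = "conifer",
                  Quercus = "hardwood", Acer = "hardwood",
                  Poaceae = "grass")

# Sum taxon proportions (samples x taxa) into PFT proportions (samples x PFTs).
# Taxa missing from the lookup are dropped and rows are renormalized to 1.
aggregate_to_pft <- function(props, lookup) {
  mapped <- intersect(colnames(props), names(lookup))
  pft <- lookup[mapped]
  # rowsum() sums rows by group, so transpose to put taxa on the rows.
  out <- t(rowsum(t(props[, mapped, drop = FALSE]), group = pft))
  out / rowSums(out)
}

# Made-up taxon proportions; Ambrosia has no PFT in the lookup.
taxon_props <- matrix(c(0.2, 0.1, 0.3, 0.1, 0.2, 0.1,
                        0.1, 0.1, 0.2, 0.2, 0.3, 0.1),
                      nrow = 2, byrow = TRUE,
                      dimnames = list(c("s1", "s2"),
                                      c("Pinus", "Picea", "Quercus",
                                        "Acer", "Poaceae", "Ambrosia")))
aggregate_to_pft(taxon_props, taxon_to_pft)
```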

Having STEPPS operate as an R package is complicated (I think; @andydawson can chime in here), and probably requires a bit more infrastructure, but it would also be enormously rewarding if we could do it.

github-actions[bot] commented 3 years ago

This issue is stale because it has been open 365 days with no activity.