NeotomaDB / Neotoma-API

A Placeholder for the Neotoma API

Summary statistics for taxa and sites #2

Open scottsfarley93 opened 8 years ago

scottsfarley93 commented 8 years ago

It would be helpful to have a summary statistics endpoint for taxa (e.g., how many sites have this taxon? how many datasets?) and for sites (how many taxa are there at this site? how many levels?), without having to download the whole response. Really large responses would be easier to manage if we knew how many records to expect, and reporting summaries would be faster if we didn't have to download the whole response just to check its .length.

What do you think?

scottsfarley93 commented 8 years ago

Also, because not all taxa have the same measures of abundance (pollen is percent, mammals are counts, etc), it would be ideal to know which endpoint to call. That way I can call SampleData for mammal data and apidev.pollen for pollen taxa.

SimonGoring commented 8 years ago

So, maybe you could have two different responses:

SampleData?...&full=FALSE&threshold=1000

SampleData?...&full=TRUE

By default, full is assumed to be FALSE: you get the full response unless the number of records hits a limit (defined by threshold), in which case you get the summary response instead. Setting full to TRUE always returns the full response.

@spatialit doesn't like this :)

He'd rather have either a summary or the full response.

Thoughts?
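As a rough illustration of the threshold idea above, here is a minimal Python sketch of the decision the server would make. The names `full` and `threshold` come from the proposed query parameters; the function name, the response shape, and the choice to switch to a summary exactly at the threshold are assumptions made for the example, not part of any existing Neotoma API.

```python
def choose_response(records, full=False, threshold=1000):
    """Return the full records, or only a summary when the result is large.

    `full` and `threshold` mirror the proposed query parameters. The
    response shape ("type", "count", "records") is hypothetical.
    """
    count = len(records)
    # Assumption: "hitting" the threshold means count >= threshold.
    if full or count < threshold:
        return {"type": "full", "count": count, "records": records}
    # Large result and the caller did not ask for everything.
    return {"type": "summary", "count": count}


small = choose_response(list(range(10)))
large = choose_response(list(range(5000)))
forced = choose_response(list(range(5000)), full=True)
print(small["type"], large["type"], forced["type"])
```

One consequence of this design is that the caller cannot know in advance which shape comes back, so the response would need something like the hypothetical `type` field to tell the two apart.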

scottsfarley93 commented 8 years ago

I think the summary would be helpful as a full separate endpoint, so the aggregation and summarization is explicit, rather than tacking a new method/parameter onto each existing endpoint.

i.e., Summary?...siteID=19 or Summary?...taxonname=Quercus.
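To make the idea of an explicit summary endpoint concrete, the sketch below shows the kind of aggregation it might perform for a taxon or a site. It is written in Python over an in-memory list; the record fields (`siteid`, `datasetid`, `taxonname`, `depth`), the function names, and the sample values are all invented for illustration and do not reflect the actual Neotoma schema.

```python
def summarize_taxon(samples, taxon_name):
    """Count distinct sites and datasets in which a taxon occurs."""
    matching = [s for s in samples if s["taxonname"] == taxon_name]
    return {
        "taxonname": taxon_name,
        "sites": len({s["siteid"] for s in matching}),
        "datasets": len({s["datasetid"] for s in matching}),
    }


def summarize_site(samples, site_id):
    """Count distinct taxa and levels (depths) recorded at a site."""
    matching = [s for s in samples if s["siteid"] == site_id]
    return {
        "siteid": site_id,
        "taxa": len({s["taxonname"] for s in matching}),
        "levels": len({s["depth"] for s in matching}),
    }


# Hypothetical records, made up for this example.
samples = [
    {"siteid": 19, "datasetid": 1, "taxonname": "Quercus", "depth": 10},
    {"siteid": 19, "datasetid": 1, "taxonname": "Pinus", "depth": 10},
    {"siteid": 19, "datasetid": 1, "taxonname": "Quercus", "depth": 20},
    {"siteid": 42, "datasetid": 2, "taxonname": "Quercus", "depth": 5},
]

taxon_summary = summarize_taxon(samples, "Quercus")
site_summary = summarize_site(samples, 19)
print(taxon_summary)
print(site_summary)
```

In a real service the counting would presumably happen in the database rather than in application code, so only the small summary object travels over the wire.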

SimonGoring commented 8 years ago

<sixofone>So instead of a new method for each endpoint we write a new endpoint that includes each method?</halfdozenoftheother> :)

scottsfarley93 commented 8 years ago

But it's not exactly the same, because if it's one endpoint, I can call it once and get everything I need. If it's two endpoints, I need to call each of them separately, resulting in two async calls in my JavaScript.

The information is the same whether it comes from two endpoints or one, but things are easier from a consumption standpoint if it's one.
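The consumption difference can be sketched as follows. This example uses Python's asyncio rather than JavaScript, and a stand-in `fetch` that returns canned data instead of calling a server; the endpoint names `TaxonSummary` and `SiteSummary`, the combined `Summary` response shape, and the numbers are hypothetical.

```python
import asyncio

calls = []  # records every simulated request, to compare round trips


async def fetch(endpoint, **params):
    """Stand-in for an HTTP request; returns canned data, no network."""
    calls.append(endpoint)
    await asyncio.sleep(0)  # yield control, as a real request would
    canned = {
        "TaxonSummary": {"sites": 2},
        "SiteSummary": {"taxa": 5},
        "Summary": {"sites": 2, "taxa": 5},
    }
    return canned[endpoint]


async def with_two_endpoints():
    # Two requests that must both finish before the results are merged.
    taxon, site = await asyncio.gather(
        fetch("TaxonSummary", taxonname="Quercus"),
        fetch("SiteSummary", siteID=19),
    )
    return {**taxon, **site}


async def with_one_endpoint():
    # A single request returns everything at once.
    return await fetch("Summary", taxonname="Quercus", siteID=19)


two = asyncio.run(with_two_endpoints())
calls_for_two = len(calls)
one = asyncio.run(with_one_endpoint())
calls_for_one = len(calls) - calls_for_two
print(two == one, calls_for_two, calls_for_one)
```

Both paths end with the same data; the single endpoint simply removes the need to coordinate and merge two asynchronous results on the client.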