ArctosDB / arctos

Arctos is a museum collections management system
https://arctos.database.museum

Link to Geology API #1642

Closed Jegelewicz closed 3 years ago

Jegelewicz commented 6 years ago

Is your feature request related to a problem? Please describe. Geology!

Describe the solution you'd like Geologic taxonomy that we don't have to manage.

Describe alternatives you've considered We have been working on this for a while, but it would be nice to have someone else manage the details.

Additional context @dperriguey suggests that we link up with Macrostrat.

dperriguey commented 6 years ago

Thank you, @Jegelewicz. I've been confused about how to use this new function in GitHub.

dustymc commented 6 years ago

Issue docs are http://handbook.arctosdb.org/how_to/How-to-Use-Issues-in-Arctos.html - let us know if something doesn't make sense.

Looks like a usable API, although it has some obvious quirks (mostly in cryptic abbreviations).

So what exactly are y'all asking to DO? Link to them? Pull their data into our current structure? Consider their data in search? Ditch our "higher" data altogether, pull when we need it? ????

@KatherineLAnderson

dperriguey commented 6 years ago

I messaged with Shanan E Peters (peters@geology.wisc.edu) at GeoDeepDive, which manages Macrostrat. He said that they are constantly updating Macrostrat and that the API should be clear enough for us to use. If it is not (cryptic abbreviations), then we should contact them and maybe they can help us make sense of the data. I think we should keep our current structure and link to them if possible. Macrostrat does not have everything in it. It only covers North America, and even then it can fall short, though very rarely.

dustymc commented 6 years ago

If the intent is to link, then I need a data access URL instead of the API (although perhaps I can use the API to build the URL - I'd prefer to avoid that if possible, but we have the tools).

The API looks like I need to know some details, and our data is unpredictable.

http://bla bla/whatever/Weno Formation (a formation that includes "Formation")
http://bla bla/whatever/Caloosahatchee (a formation that doesn't include "Formation")
http://bla bla/whatever/Middle+Jurassic (different class of data, I think)
http://bla bla/whatever/Something Not In Their DB

should all do something useful.

Or maybe we just have to be more consistent in our data. (FYI the ability to reliably integrate other stuff on the Internet is a big part of the reason I'm always begging for consistency.)

Real example:

https://macrostrat.org/api/units?strat_name=Weno%20Formation (what we have in our data, but passed as "strat_name", which isn't obvious from our data) fails

https://macrostrat.org/api/units?strat_name=Weno does some stuff, but I have no idea how to interpret the results - there's a "Wenonah" (probably not what we want, although it's what the UI returns in a search for "weno"), "Wenonah Fm" (same as above??), "Pawpaw Fm and Weno Fm" (a name or combination or??), "Weno Limestone", "Weno Limestone and Pawpaw Limestone", ....

The lack of a clear path may well be more my geological ignorance than any problems with data or APIs. Can someone work out the path to what we'd like users to see from a couple examples of data in Arctos?
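The mismatch in the real example above (the full Arctos value fails, the bare name returns something) can be sketched as a small URL builder. This is only an illustration: the route and the `strat_name` parameter come from the URLs in this comment, while `strip_rank` and the `RANK_SUFFIXES` list are hypothetical helpers, and dropping the rank word is an assumed heuristic, not something Macrostrat documents here.

```python
from urllib.parse import urlencode

# Route copied from the example above.
UNITS_URL = "https://macrostrat.org/api/units"

# Assumption: rank words that may trail an Arctos value. Not an official list.
RANK_SUFFIXES = ("Formation", "Fm", "Member", "Group")


def strip_rank(name: str) -> str:
    """Drop one trailing rank word such as "Formation", if present."""
    words = name.split()
    if len(words) > 1 and words[-1].rstrip(".") in RANK_SUFFIXES:
        words = words[:-1]
    return " ".join(words)


def units_url(arctos_value: str) -> str:
    """Build a units query URL from whatever string is stored in Arctos."""
    return UNITS_URL + "?" + urlencode({"strat_name": strip_rank(arctos_value)})


print(units_url("Weno Formation"))
print(units_url("Caloosahatchee"))
```

A heuristic like this only papers over inconsistent data; it does not say which of the returned rows is the one we mean.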

dperriguey commented 6 years ago

I'm asking about your questions now.

I think it would be useful to just do a search in, say, a CSV to look for duplicates between what we have and Macrostrat. If everything in our hierarchy is duplicated in Macrostrat then we can scrap ours altogether and just link. But again, I'm asking about your questions to see what answers I can get.

dustymc commented 6 years ago

scrap ours altogether and just link

I doubt it's that simple - a user searches for "bla," we need to query a hierarchy to find that "subbla" is wholly included in "bla" and return records with that determination etc., and that's not usually practical via API.

That said, "someone's already doing a good job of this, it ain't our problem" is pretty much always my preferred solution. We could just cache the absolute minimum necessary to respond to queries, not allow entering data which aren't in the API (and/or preemptively pull all of their data as our authority), pull from the API for display, etc., etc., etc. I don't think we can scrap our stuff entirely, but we might be able to make users believe that's what we've done.
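A minimal sketch of the "cache the absolute minimum" idea, assuming the cache is nothing more than a child-to-parent map. The `PARENT` table, `is_within`, and `search` are hypothetical names made up for illustration, and the sample terms stand in for "bla" and "subbla"; none of this is existing Arctos code.

```python
# Hypothetical cached hierarchy: child term -> parent term.
PARENT = {
    "Calabrian": "Pleistocene",
    "Pleistocene": "Quaternary",
    "Quaternary": "Cenozoic",
    "Cenozoic": "Phanerozoic",
}


def is_within(term, ancestor):
    """True if term equals ancestor or sits anywhere below it in the cache."""
    while term is not None:
        if term == ancestor:
            return True
        term = PARENT.get(term)  # None once we walk off the top
    return False


def search(records, query):
    """Return records determined to the query term or to anything it contains."""
    return [r for r in records if is_within(r["geology"], query)]


records = [
    {"id": 1, "geology": "Calabrian"},
    {"id": 2, "geology": "Cenozoic"},
    {"id": 3, "geology": "Archean"},
]
print([r["id"] for r in search(records, "Quaternary")])
```

The walk up the parent chain is the part that is awkward to do one API call at a time, which is why some local copy of the hierarchy seems hard to avoid.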

I'm also happy to reciprocate in whatever way we can. (Something like we do for GenBank, perhaps.) A user searching macrostrat might be interested in seeing what's in Cape Deceit (http://arctos.database.museum/SpecimenResults.cfm?geology_attribute_value=Cape%20Deceit%20Formation) or etc. (And that's not wholly altruistic either - those same users might tell us that our Jurassic Mus are unlikely. More ways of seeing data and more users looking at data sorta always leads to better data.)

dperriguey commented 6 years ago

Here is Shanan's response from GeoDeepDive:

A data access URL is exactly what we mean by an API. In our approach, the URL=API call is a parameterized URL string, not a “hard coded” URL string. It’s a philosophy difference, but there’s no other difference when it comes to accessing the information.

Example…
https://macrostrat.org/api/v2/defs/strat_names?strat_name=Weno
https://macrostrat.org/api/v2/defs/strat_names?strat_name_like=Weno
Gives all variants of the strat_name “Weno” that Macrostrat knows about, plus some information about it, using both an exact and fuzzy match. Names are hard because they have multiple different expressions.

We also have a representation of “concept” for many stratigraphic names. This is basically a way to recognize name variants as belonging to the same entity by lumping them within the same concept, which also allows us to summarize name metadata and external links: https://macrostrat.org/api/v2/defs/strat_name_concepts?concept_name=Weno

An alternative worldview for how to build a URL/API might be:
https://macrostrat.org/api/v2/defs/strat_names/Weno
We don’t favor this approach. It leaves ambiguity (do we need a separate “like” route?) and results in the same data being accessed.

The units route is designed to access spatial, temporal, and rock property values of individual rock units intersected by columns and various properties, like nomenclature. The example you gave, parameterized as “strat_name=Weno” (https://macrostrat.org/api/units?strat_name=Weno) actually doesn’t behave properly right now, in my opinion. It is currently behaving as “strat_name_like” which is not ideal. We will change this by adding the parameter “strat_name_like” so that it is consistent with what you get in the /defs/strat_names route.

If you are trying to match to lithostratigraphic names, then I’d recommend getting the specific name first: https://macrostrat.org/api/v2/defs/strat_names?strat_name=Weno

Then get the spatial-rock property aspects that are captured by units: https://macrostrat.org/api/v2/units?strat_name_id=3216

In this specific case, you seem to have found one of the very few unit(s) that have multiple formal names assigned to them. This is not supposed to happen, but it does sometimes in a few of our older sources. The “unit name” field is informal, but useful in this case. The same units are returned for both names: https://macrostrat.org/api/v2/units?strat_name_id=3215

Sorry for the several bugs that your poking around identified. We’ll keep improving it so keep them coming. Hope this helps at least a little.

Shanan

If this is helpful or not, let me know. -D
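The two-step path recommended in the response above (get the specific name first, then ask for units by its id) can be sketched as two URL builders. The routes and parameter names are the ones quoted in the response; the function names are hypothetical, and fetching or parsing the results is left out because the response layout is not shown in this thread.

```python
from urllib.parse import urlencode

BASE = "https://macrostrat.org/api/v2"


def strat_names_url(name, fuzzy=False):
    """Step 1: look up a name, exact by default or fuzzy with strat_name_like."""
    param = "strat_name_like" if fuzzy else "strat_name"
    return f"{BASE}/defs/strat_names?" + urlencode({param: name})


def units_for_name_url(strat_name_id):
    """Step 2: ask for the units tied to one strat_name_id from step 1."""
    return f"{BASE}/units?" + urlencode({"strat_name_id": strat_name_id})


print(strat_names_url("Weno"))
print(strat_names_url("Weno", fuzzy=True))
print(units_for_name_url(3216))
```

The id 3216 is the one used in the response; in practice it would be read from whatever step 1 returns.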

dustymc commented 6 years ago

Two major things, I think.

1) I'm still not sure what WE are doing here. If we're just linking, I need something that looks like eg https://macrostrat.org/sift/#/interval/94. My preferred method of getting there would be to just link https://macrostrat.org/magic/Devonian.

2) I have "thing someone typed." I have absolutely no idea which pigeonhole it came out of.

[Screenshot, 2018-08-13 6:52:42 am]

The UI search seems to figure it out. Is that just a bunch of API calls, or is it accessing something like https://macrostrat.org/magic/Devonian? If the latter, is that available in the API?

And a suggestion: These data are hierarchical. Our data look like http://arctos.database.museum/info/ctDocumentation.cfm?table=CTGEOLOGY_ATTRIBUTE. A user who searches "Calabrian" will get specimens with a determination of "Phanerozoic" because the Calabrian IS Phanerozoic, just with more precision. It would be useful to see that somewhere on https://macrostrat.org/sift/#/interval/275 (which doesn't fully load for me).

dperriguey commented 6 years ago

@dustymc what is https://macrostrat.org/magic/Devonian? This link doesn't work.

dustymc commented 6 years ago

It's something I wish existed - I think we have to know chrono vs. strato to access the API. I'd like a way to say "here's a string (eg 'Devonian'), we have no idea how you've categorized it, just tell us everything you know about anything with that name."

dperriguey commented 6 years ago

Ok, here is Shanan's explanation:

Everything in the Sift (https://macrostrat.org/sift) web application is powered by the Macrostrat API (and Mapbox API). Similarly, everything in the Rockd mobile App is powered by the Macrostrat API (and Mapbox API).

This means that, in principle, you and your team could have written Sift (or a better application) and Rockd (or a better application).

To use the “Devonian” example:

Macrostrat knows about this as an interval in geologic time:
https://macrostrat.org/api/v2/defs/intervals?name=Devonian
https://macrostrat.org/api/v2/defs/intervals?int_id=94 (same as above, but referring to the internal “id” of the interval name “Devonian”)

Macrostrat knows Devonian in the context of geological rock units that intersect it in time:
https://macrostrat.org/api/v2/units?interval_name=Devonian&response=long
https://macrostrat.org/api/v2/units?int_id=94 (same as above, but a condensed “short form” response and reference id of Devonian)

Macrostrat knows the spatial extents of those columns with rock units that intersect it in time: https://macrostrat.org/api/v2/columns?interval_name=Devonian&format=geojson_bare

“Devonian” is an interval that has many different projections. Time and timescales, rock and geography (and even paleogeography!). Hopefully this helps too? https://doi.org/10.1029/2018GC007467

I'm not sure if that helps the conversation. Please let me know. And I'm just getting a little lost in the API talk. I get what it is, but I am not really understanding our dilemma technically here.
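One way to picture the dilemma: Arctos holds a bare string and does not know whether Macrostrat files it as an interval, a strat name, or a concept, so the caller has to try each route in turn. The sketch below only builds the candidate URLs. The three route and parameter pairs are the ones quoted in this thread; `candidate_urls` is a hypothetical helper, not part of either system.

```python
from urllib.parse import urlencode

BASE = "https://macrostrat.org/api/v2"

# (route, parameter) pairs quoted in this thread.
ROUTES = [
    ("defs/intervals", "name"),
    ("defs/strat_names", "strat_name"),
    ("defs/strat_name_concepts", "concept_name"),
]


def candidate_urls(term):
    """Map each route to the URL that asks it about one uncategorized string."""
    return {
        route: f"{BASE}/{route}?" + urlencode({param: term})
        for route, param in ROUTES
    }


for route, url in candidate_urls("Devonian").items():
    print(route, url)
```

A single route that accepted any string would replace this loop, which is what the made-up /magic/ URL was standing in for.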

We need control of our hierarchy/structure with the added Macrostrat component. We need to be able to supplement what we take from Macrostrat because it is not by any means complete. Macrostrat is a powerful tool though and if we can easily pull from it without having to communicate within our Arctos group every time we need a new lithologic unit that would be great. The only difference would be when the lithologic unit we need does not exist in Macrostrat. We would still have to communicate and make it part of our structure in those situations. It would benefit Shanan as well if he had access to our changes. He should be able to make the necessary changes to Macrostrat without the ability to make changes to our structure. Does that make sense?
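A rough sketch of that arrangement, assuming two term lists kept side by side: one pulled from Macrostrat and one holding our own additions. `resolve` and both sample sets are hypothetical, and the sample terms do not reflect what either system actually contains.

```python
def resolve(term, macrostrat_terms, local_terms):
    """Say which authority a term comes from; Macrostrat wins when both have it."""
    if term in macrostrat_terms:
        return "macrostrat"
    if term in local_terms:
        return "local"
    return None  # unknown: needs discussion before it is added locally


# Hypothetical sample data, not real coverage of either system.
macrostrat_terms = {"Weno Formation", "Devonian"}
local_terms = {"Cape Deceit Formation"}

print(resolve("Devonian", macrostrat_terms, local_terms))
print(resolve("Cape Deceit Formation", macrostrat_terms, local_terms))
```

Keeping the local list separate is what would let it be shared with Macrostrat without giving anyone outside write access to our structure.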

dustymc commented 3 years ago

Merged into https://github.com/ArctosDB/arctos/issues/2244