ArctosDB / arctos

Arctos is a museum collections management system
https://arctos.database.museum

Geology Attributes #1387

Closed dperriguey closed 6 years ago

dperriguey commented 6 years ago

Is there a way to upload more geologic attribute values? We have a few groups, formations, and members that are not available through the data entry form. We are going to start bulkloading some of our data soon, but before we do, is it necessary to create these values, or will Arctos create them automatically?

dustymc commented 6 years ago

Geology attributes are "authority data," so there is no bulkloader. They must exist in the authority table before they can be used.

I could load them from CSV for you, enter them individually (if there aren't too many), or help get you started with the code table editor.

To load from CSV, I'd need

dperriguey commented 6 years ago

I have a CSV of a list of the values I'd like available, but there will be more; I do not have all our data compiled yet. Once they exist in the authority table, and when I start to bulkload our data, will those values be able to be associated with what we are uploading or do they have to be entered manually?

dperriguey commented 6 years ago

Looking at the code table and some of the specimens that are in Arctos leads me to think that some of the geologic attributes might be tied together in such a way that is not accurate. You cannot put all lithostratigraphic units (group, formation, member, etc.) into simple chronostratigraphic classifications (Period, Epoch, Age, etc.). Many fit, but in other cases multiple lithostratigraphic units fall within one chronostratigraphic classification, or vice versa. For example, a specimen record shows that it was found in the Mancos Shale Formation. It was never determined to be from the Turonian Age specifically, but the person who found it knows that the particular outcrop spans only the Cenomanian-Turonian Ages. Would it be acceptable to put 2 Ages in the Geologic Attributes with their respective values, or do you have another method for this?

dustymc commented 6 years ago

CSV

Send them along whenever you're ready.

The specimen bulkloader will handle a few (6?) geology determinations, and as many as you need can be entered directly to localities. The bulkloader will reject anything not in the authorities or marked as not usable.
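The rejection rule described above can be sketched roughly in Python. This is only an illustration: the `AUTHORITY` dictionary, its usable/not-usable flag, the sample terms, and the function name are all assumptions made for the example, not the actual Arctos schema or bulkloader code.

```python
# Hypothetical authority data: term -> True if usable, False if marked not usable.
# The entries are made up for illustration only.
AUTHORITY = {
    "Mancos Shale Formation": True,
    "Turonian": True,
    "Old Unit Name": False,  # exists in the table but is flagged as not usable
}


def check_geology_values(values, authority=AUTHORITY):
    """Split values into (accepted, rejected).

    A value is rejected when it is missing from the authority data
    or present but marked as not usable.
    """
    accepted, rejected = [], []
    for value in values:
        if authority.get(value, False):
            accepted.append(value)
        else:
            rejected.append(value)
    return accepted, rejected
```

Running a check like this over a CSV column before bulkloading would show which terms still need to be added to the authority table first.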

You only need to enter the most precise determination(s) - search considers the hierarchy.

not accurate

You'll need to work out any data problems with other users. I'm happy to facilitate that however I can, and I can address any technical issues.

The structure is purely hierarchical, so a term may have one (or zero) parents. A-->(B and/or C)-->D data should probably be entered in the authority table as A-->D, skipping the ambiguous stuff (or maybe just A-->{no parent} would be better). A locality (=specimens) could have three determinations: A, B, and C.
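A minimal Python sketch of the single-parent rule just described, assuming the arrow notation reads child-->parent (as in "Ripley-->Campanian" later in the thread). The class and method names are hypothetical, and entering B and C as parentless terms is an assumption made for the example.

```python
class AuthorityTable:
    """Terms are unique strings; each has exactly one parent or none."""

    def __init__(self):
        self.parent = {}  # term -> parent term, or None for no parent

    def add(self, term, parent=None):
        if term in self.parent:
            raise ValueError(f"{term!r} already exists; a term can exist only once")
        if parent is not None and parent not in self.parent:
            raise ValueError(f"parent {parent!r} must be added first")
        self.parent[term] = parent


# "A-->(B and/or C)-->D" entered as "A-->D", skipping the ambiguous middle.
table = AuthorityTable()
table.add("D")
table.add("A", parent="D")
table.add("B")  # assumed: kept as stand-alone terms with no parent
table.add("C")

# A locality can still carry all three determinations independently.
locality_determinations = ["A", "B", "C"]
```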

Let's skype if I've completely misunderstood your concerns - I'm not a geologist and I'm never sure if I'm actually understanding these data.

And @KatherineLAnderson @Jegelewicz do ya'll have any documentation or how-tos which could be added to the Handbook? http://handbook.arctosdb.org/documentation/geology.html is a little sparse....

dperriguey commented 6 years ago

CSV

I've attached my file

Chrono-Litho Upload 1.csv.xlsx

I'm hoping that the file will help make my issues with accuracy a little more clear. Any of the headers that say "...Combination" mean that the group or formation in the following column spans that amount of time. Now when compiling data from old specimens (where the person that found the specimen is no longer living), the geology is not always spanning the chronostratigraphy like I have in the file. Sometimes it is more specific and other times it is less. When it is less specific, I need to use a chronostratigraphic determination that spans several Ages, for example. When you look at the file you'll see the Kiowa Formation under the formation column. The Stage/Age for that formation is Valanginian - Cenomanian. The data that I will be uploading will be more specific, to the Albian Age that is within that span of time (Valanginian - Cenomanian). In the future there may be a specimen that is less specific, or specific to the Cenomanian Age. So, I would need to be able to work with the data entry in that way. Does that make sense? I can modify the file so that instead of the columns showing a span, like Valanginian - Cenomanian, it will have a column for each of the Ages that the formation spans.

I think I understand your point about the data being hierarchical. Some formations will definitely not have a group associated with them, so you would be skipping some parent. It is important for users and our data records to have specimens as specifically determined as possible. I'm not sure how much this complicates things.

dustymc commented 6 years ago

Yes, thank you, data always helps. FYI you can attach CSV by ZIPping/compressing it. Excel is fine for now.

I picked a random one and https://en.wikipedia.org/wiki/Ripley_Formation says "deposited during the Maastrichtian stage." Your data put the Ripley Formation in Campanian-Maastrichtian. Why? (E.g., is there uncertainty about the definition of the Ripley, or does the determination mean "definitely one of those things, probably that formation," or something else?)

dperriguey commented 6 years ago

Because Wikipedia is wrong. Check out this website by a geologist: https://macrostrat.org/sift/#/strat_name_concept/3573

Just based on my experience with this formation, it is probably because most accessible outcrops are the uppermost sections of the Ripley Formation, which are Maastrichtian in age, or because of most people's interest in the K/Pg mass extinction, where outcrops may only have the uppermost Ripley exposed in association with the last parts of the Selma Group (i.e., Prairie Bluff Chalk Formation). I'm sure there is uncertainty, but fossils are usually going to be the basis for age determinations in sedimentary rocks. When a radiometric age date can be obtained from a formation (e.g., an ash bed), associations with rare fossils have higher resolution. The uncertainties are going to be group, formation, and member dependent. For example, certain sections of the Ripley may only span the lowermost Maastrichtian, but others may contain more time.

dustymc commented 6 years ago

That's what I needed. I haven't been clear if this is a taxonomy or identification problem - looks like it's taxonomy.

Here are the current rules:

So from that we could:

1) Skip the not-so-nice bits altogether.

If you want users searching "Campanian" to find specimens under ^^ that determination, you'd have to add another "Campanian" determination to those specimens. (And you could do so for all 300 things Ripley spans, but that's obviously clunky.) This seems to be the path https://macrostrat.org/sift/#/strat_name/1742 takes.

2) Terms are just strings, I don't much care what's in them, so...

That would lead to three Stage/Age values

and no ability to say "Ripley-->Campanian" (Ripley can exist only once and have only one parent)

3) Create a term containing all the important strings

That would string-match "Ripley" and "Campanian" but would have nothing to do with the data objects "Ripley" or "Campanian" or "Maastrichtian."

4) Get a data model which better reflects the data. Linnean taxonomy is not hierarchical (although one can pretend at limited scopes), so we don't store it in a hierarchical model. These data may be similar. This would possibly require dedicated funding, and managing sometimes-sorta-hierarchical data is more complex.

Our approach to Linnean taxonomy is described in http://handbook.arctosdb.org/documentation/taxonomy.html

dperriguey commented 6 years ago

I like number 1. Could we maybe add a "To" option so someone entering data can select a span of several time periods? (e.g., Ripley Formation (formation)-->Stage/Age>Campanian To Stage/Age>Maastrichtian) Looking at the data entry form, it seems that if one wanted to put a formation in the wrong time, they could. Is this correct? (e.g., Eon: Archaean, Stage/Age: Turonian, formation: Fairbanks Basalt) It doesn't look like there are any controls on this. Does the structure of the code table not prevent a person from entering wrong information? If so, can you just place the formations and members I sent you higher up, at the Series/Epoch level of the code table, or does each one have to have a parent? (e.g., Ripley Formation (formation)->Upper Cretaceous (Series/Epoch))

dustymc commented 6 years ago

Ripley Formation (formation)-->Stage/Age>Campanian To Stage/Age>Maastrichtian

The data structure cannot currently support that. There are exactly two possibilities going "up" from a term:

1) Ripley Formation (or any other term) does not have a parent

2) Ripley Formation has exactly one parent

I didn't side-scroll far enough: avoiding the ambiguity for Ripley Formation seems to involve placing it under Upper Cretaceous.

From the "taxonomy" side of things - the data in the code table - an explicit assertion of "Fairbanks Basalt" is also an implicit assertion of Eocene, Paleogene, etc. The only way to avoid those implicit assertions would be to remove the parentage altogether.
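The idea of implicit assertions can be illustrated by walking a term's parent chain. This is a hedged sketch, not Arctos code: the `PARENT` mapping and the function name are hypothetical, and making Eocene the direct parent of Fairbanks Basalt is an assumption (the thread only says that Eocene and Paleogene are implied).

```python
# Hypothetical slice of the authority data: term -> parent (None = no parent).
PARENT = {
    "Fairbanks Basalt": "Eocene",  # assumed direct parent, for illustration
    "Eocene": "Paleogene",
    "Paleogene": "Cenozoic",
    "Cenozoic": "Phanerozoic",
    "Phanerozoic": None,
}


def implied_terms(term, parent=PARENT):
    """Return every ancestor that is implicitly asserted along with `term`."""
    implied = []
    current = parent.get(term)
    while current is not None:
        implied.append(current)
        current = parent.get(current)
    return implied


# Removing the parentage removes the implicit assertions.
orphaned = dict(PARENT)
orphaned["Fairbanks Basalt"] = None
```

With the parent link in place, `implied_terms("Fairbanks Basalt")` lists the whole chain; with the `orphaned` mapping it returns an empty list.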

Any number of "identifications" may be applied to a locality (and thereby the specimens it holds), and there is by design no relationship between them.

is acceptable. That could be "wrong" (for lots of reasons, and that doesn't necessarily make it unimportant - mis-transcribed identifiers are far from uncommon) or a test of OTHER-METHOD (at least it's consistent!) or a valid second opinion or WHATEVER. And hopefully the source of those apparently-conflicting data are documented in their method or remarks.

If that still doesn't do what you need, there is a possible alternative: use specimen attributes and avoid geology (as metadata of localities) altogether. It is fairly trivial to create new attributes, and it/they could be free-text, controlled/categorical, or one of each. (See for example "numeric age" [number + controlled units] vs. "age" [free text] vs. "age class" [categorical]). Specimen Attributes are independent assertions and do not come with built-in implicit assertions of the hierarchy. You can still have any number of them, so the example above (minus the implicit "Tonnie Siltstone Member is a child of the Chinitna Formation" assertion) is possible. The functional difference is in how the data are organized: Given a locality holding 20,000 specimens you can add or correct geology data to all specimens by altering the locality, where with specimen attributes you would have to edit each of the 20K specimens (which doesn't mean 20K clicks - there are bulk tools). (The code table for geology attributes is a further normalization - changing stuff there can change all 20K specimens for each of the 20K localities using the geology data.)

dperriguey commented 6 years ago

Ripley Formation (formation)-->Stage/Age>Campanian To Stage/Age>Maastrichtian

This makes sense from the current Arctos taxonomy perspective. The issue however is that ambiguity of the lithostratigraphy (group, formation, member) that you pointed out. I have a formation that spans both the Lower and Upper Cretaceous. I understand that it would take funding etc., but the grouping of chronostratigraphy and lithostratigraphy into one hierarchy is just going to complicate things in the future. I like both the keeping of different identifications (I find incorrect identifications all the time, and yes methods change) and the alternative of using specimen attributes (as long as it is still able to be accessed through the search function).

dperriguey commented 6 years ago

If using specimen attributes is a good alternative that can be easily modified or moved into the normalization process in the future, I'm all for it.

Jegelewicz commented 6 years ago

And @KatherineLAnderson @Jegelewicz do ya'll have any documentation or how-tos which could be added to the Handbook? http://handbook.arctosdb.org/documentation/geology.html is a little sparse....

Sorry to just be getting here. I don't really have much input and unfortunately the person at UTEP who might is currently recovering from a stroke. This seems like a complicated issue that might require a committee in the AWG to work through and perhaps some ideas for where we could get funding to develop something really good. @dustymc do you want me to add this to the AWG meeting agenda?

campmlc commented 6 years ago

I agree we should add to the agenda for Thursday's meeting. Dustin, I believe Cori is attending?



dperriguey commented 6 years ago

@campmlc I'm not sure, but I will ask her today.

dperriguey commented 6 years ago

@campmlc Did she say she would?

campmlc commented 6 years ago

Yes, she agreed to join the group but I haven't heard confirmation from her about tomorrow.



dperriguey commented 6 years ago

We're going to try to make it. I have another meeting I'm at until 12, but I should be available shortly after that.

dustymc commented 6 years ago

chronostratigraphy and lithostratigraphy

That might just be a re-arranging of the authority data and fairly straightforward. I don't really understand the data in enough depth to know.

It would be very useful for me to get all the folks who use these data in one place at one time - Skype or similar might be sufficient. The input of a special-interest AWG committee would be most appreciated as well/in addition to that. Yes, let's get this started at the next AWG meeting if possible.

Certainly nothing will change our ability to maintain history/assert alternative interpretations.

specimen attributes

To set that up I need:

I think it's safe (assuming controlled vocabulary) to do this as one attribute ("geology term" or whatever you want to call it) with code table values like "Phanerozoic (Eon/Eonothem)" - that should be a bit simpler to use than an attribute for each "term type," and as long as the format is predictable (rank at the end in parens) it should be easy to extract the data later. I'm completely open to other ideas.
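As a rough check that the "rank at the end in parens" format is easy to take apart later, here is a small Python sketch. The regular expression and the function name are illustrative assumptions, not part of Arctos.

```python
import re

# Term text, optional whitespace, then a final parenthesised rank.
TERM_PATTERN = re.compile(r"^(?P<term>.+?)\s*\((?P<rank>[^()]+)\)$")


def split_term(value):
    """Split 'Phanerozoic (Eon/Eonothem)' into ('Phanerozoic', 'Eon/Eonothem')."""
    match = TERM_PATTERN.match(value.strip())
    if match is None:
        raise ValueError(f"no trailing '(rank)' found in {value!r}")
    return match.group("term"), match.group("rank")
```

Only the last parenthesised group is treated as the rank, so a term whose own name contains parentheses would still split predictably.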

I think the idea would be to stash these data somewhere safe until there's a more appropriate place for them, and the intent would be to move them back to localities when we can. In the meantime, specimen attributes are indeed searchable.

KatherineLAnderson commented 6 years ago

I'm sorry for jumping on here late. This was a subject that Dusty and I had briefly discussed at some point in time, with the result being that I entered lithostratigraphic units with no parents for all the reasons discussed above. UAMES is essentially treating the chrono and litho data as separate data when in reality they are not separate, but also not neatly hierarchical. I am interested in how they might fit into a taxonomy, but I will need to look into the Arctos documentation.

When I "create"/enter a new lithostratigraphic unit for data entry I verify its validity, spelling, etc with GeoLex.

If there are more than one specific chronostratigraphic units (ie: Campanian to Maastrichtian) then we add them as 2 "Stage/Age" entries, so if a user searches either Campanian or Maastrichtian then Arctos will return the specimen. This gets messier if the span of time is longer, but it works with the existing infrastructure.
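The "one entry per Stage/Age" workaround can be sketched in Python. The stage list below covers only the Late Cretaceous, ordered oldest to youngest, and the function name is hypothetical; it illustrates the data-entry approach rather than anything Arctos does automatically.

```python
# Late Cretaceous stages, oldest to youngest.
LATE_CRETACEOUS_STAGES = [
    "Cenomanian",
    "Turonian",
    "Coniacian",
    "Santonian",
    "Campanian",
    "Maastrichtian",
]


def expand_stage_range(first, last, ordered=LATE_CRETACEOUS_STAGES):
    """Return every stage from `first` to `last`, inclusive.

    Each returned stage would be entered as its own "Stage/Age"
    determination so that a search on any one of them finds the specimen.
    """
    i, j = ordered.index(first), ordered.index(last)
    if i > j:  # accept the endpoints in either order
        i, j = j, i
    return ordered[i : j + 1]
```

Campanian to Maastrichtian expands to two entries, while Cenomanian to Maastrichtian expands to six, which is where the approach starts to get messy.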

dperriguey commented 6 years ago

I might just be doing something wrong, but whenever I search for anything that should be linked to geologic attributes in the Stage/Age level, I get no results. I have to search at the Period level. And when I get results at the Period level they are still not the data linked to geologic attributes, they are linked to Specific Locality. I put a specimen in that is from and linked to the Maastrichtian; doesn't show up. I can find it under our institution or the genus though.

Also, if I try to download data with geologic attributes checked, the result has the geologic attributes header but an empty cell.

If searching for geologic attributes does work and I'm just doing it wrong, when a user searches for Turonian, @KatherineLAnderson does a specimen that is linked from Cenomanian to Maastrichtian show up? Looks like your point about "messier" means that this might not work.

I think of chronostratigraphy as an abstract thing; time. Lithostratigraphy is tangible and can be linked to chronostratigraphic units by the geologist, but through this there is much ambiguity. There is ambiguity in both, and subjectivity to a large extent. I think to avoid this, it is a good idea to keep them separated. They become linked by the specimen.

dustymc commented 6 years ago

Did you search within a couple minutes of updating? It might have been cached.

[Screenshots: 2018-01-10 at 7:52:35 pm, 7:53:52 pm, and 7:54:07 pm]

all find http://arctos.database.museum/guid/UNM:ES:14793

I gave the geology concatenator a kick; results and downloads should be working properly now.

The only thing I really care about in the authority data is that there are unique terms, optionally with parents. If ya'll can agree on something and want to partition them in some other way, that's no problem and I'm glad to facilitate that however I can.

data linked to geologic attributes, they are linked to Specific Locality

I'm not sure what this means. The data structure is specimen-->specimen_event-->collecting_event-->locality/geology. An example might be useful if that doesn't clarify anything.
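The chain specimen-->specimen_event-->collecting_event-->locality/geology can be pictured with a few Python dataclasses. The class and field names below simply mirror the arrows and are assumptions for illustration, not the real table definitions; the GUIDs are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Locality:
    name: str
    geology: list = field(default_factory=list)  # geology determinations


@dataclass
class CollectingEvent:
    locality: Locality


@dataclass
class SpecimenEvent:
    collecting_event: CollectingEvent


@dataclass
class Specimen:
    guid: str
    events: list = field(default_factory=list)  # SpecimenEvent objects

    def geology(self):
        """Collect geology determinations by following the chain to each locality."""
        return [
            term
            for event in self.events
            for term in event.collecting_event.locality.geology
        ]


# Two hypothetical specimens sharing one locality.
shared = Locality("example locality", ["Maastrichtian"])
event = SpecimenEvent(CollectingEvent(shared))
first = Specimen("EX:ES:1", [event])
second = Specimen("EX:ES:2", [event])

# Editing the locality once changes what both specimens report.
shared.geology.append("Prairie Bluff Chalk Formation")
```

Because geology hangs off the locality, one edit there reaches every specimen linked to it, which is the contrast with per-specimen attributes drawn earlier in the thread.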

"Turonian" currently isn't used. "Cenomanian" and "Maastrichtian" are siblings of Turonian in the hierarchy; there's no cross-referencing there.

http://arctos.database.museum/SpecimenResults.cfm?geology_attribute_value=Prince%20Creek%20Formation&guid=UAM%3AES%3A12421 works (if you're logged out) because

[Screenshot: 2018-01-10 at 8:17:46 pm]

http://arctos.database.museum/SpecimenResults.cfm?geology_attribute_value=Mesozoic&guid=UAM%3AES%3A12421&geology_hierarchies=1 works because

[Screenshot: 2018-01-10 at 8:20:03 pm]

despite no explicit "Mesozoic" determination.
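For anyone who wants to build such search links programmatically, here is a small Python sketch that reproduces the two URLs above. The parameter names are copied from those URLs; reading `geology_hierarchies=1` as "also match child terms" is an inference from this comment, and the helper function itself is hypothetical.

```python
from urllib.parse import quote, urlencode

BASE = "http://arctos.database.museum/SpecimenResults.cfm"


def geology_search_url(value, guid=None, hierarchical=False):
    """Build a SpecimenResults URL filtered on a geology attribute value."""
    params = {"geology_attribute_value": value}
    if guid is not None:
        params["guid"] = guid
    if hierarchical:
        # Assumed meaning: include specimens whose terms sit below `value`.
        params["geology_hierarchies"] = "1"
    # quote_via=quote encodes spaces as %20 rather than '+'.
    return BASE + "?" + urlencode(params, quote_via=quote)
```

Calling `geology_search_url("Mesozoic", guid="UAM:ES:12421", hierarchical=True)` yields the second link quoted above.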

dperriguey commented 6 years ago

@dustymc I'm definitely doing something wrong then. I still cannot find it. That is our specimen. I don't have the same type of page you have in your screenshots.

dperriguey commented 6 years ago

Wait, I just found it. I have to go in and specifically put my search into the geology attribute value

dperriguey commented 6 years ago

data linked to geologic attributes, they are linked to Specific Locality

The issue I had with this was that I searched under Any Geographic Element and turned up with no results.

campmlc commented 6 years ago

Dusty, What fields are searchable through Any Geographic Element? If we don't include all fields, the term is misleading. We had the same issue with Locality Remarks, I believe.



dustymc commented 6 years ago

https://github.com/ArctosDB/arctos/issues/1397

The 'any...' fields are aimed more at public users/general exploration rather than curatorial users asking specific questions. I'm up for better labels - none of them will ever include everyone's idea of "all."

Locality remarks should be metadata - if it includes relevant place-name data, you should consider reexamining how you're organizing your data.

campmlc commented 6 years ago

Maybe "Quick Search" rather than "Any Geographic Element"? With disclaimer to "Use expanded search options for more detail"?

Remarks are used as last resort when we don't have relevant place-name data, like "drainage" :) - but I think that's fixed now.



dustymc commented 6 years ago

https://github.com/ArctosDB/arctos/issues/1398

Verbatim locality is a useful catch-all for locality terms which don't fit elsewhere, even if they're not very verbatim.

KatherineLAnderson commented 6 years ago

@dperriguey If the specimen is Cenomanian to Maastrichtian we would need to enter every stage/age separately. If we do that, then yes, it would be returned by a search for Turonian. So yes, very messy. We would only do this if the data explicitly state that range. Typically our legacy data have a formation and a system/period or series, so that is what we input, and only stage/age if it is explicitly stated (i.e., not assumed from formation).

dperriguey commented 6 years ago

@campmlc I like the idea of a Quick Search, and my ideas about this will relate to my next comment to Katherine

@KatherineLAnderson @dustymc I definitely understand the practicality from our perspective about not making assumptions, but most formations have been delimited to a range. I am trying to think about things from the perspective of a novice data user, which I actually am. If we are not utilizing the aspects of why a formation was determined, then we would be limiting the reach of the data itself. If the formation is not linked to some stage/age or range, a user may not be able to access data that could be useful to their research. Maybe there could be a verified/not verified check for geologic attributes. But my point is related to a bigger one. If we could have separate code tables, with picklist values (controlled vocab etc.), and this verified/not verified check box (plus your picture of the legacy specimen card) of both chrono and litho data, then you make the specimen accessible through time and geographic space. We should, in the future, be able to simply enter that a specimen came from the X Formation, and a user be able to simply search for X Geologic Period/Age and find your specimen. Likewise, if the user searches for a specific Stage/Age, they should be able to find your specimen if it possibly comes from that time due to the search falling within that range (being limited by say the formation you assigned it to). Certainly, the user should have a choice of this level of specificity. I can't express how often I find a specimen that says that it came from a time period that is so unlikely that I cannot bear to leave it incorrect. I'm sure that our collections could be improved by the larger scientific community, but without access, that lowly incorrectly identified specimen will remain as such until you make some grad student open drawers.
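The range-based search wished for here could look something like the following Python sketch. It describes a possible behavior, not an existing Arctos feature: the `FORMATION_RANGE` table, its labels, and the function name are hypothetical, and the two ranges are taken from examples given in this thread.

```python
# Late Cretaceous stages, oldest to youngest.
STAGES = [
    "Cenomanian",
    "Turonian",
    "Coniacian",
    "Santonian",
    "Campanian",
    "Maastrichtian",
]

# Hypothetical lookup: formation -> (oldest stage, youngest stage).
FORMATION_RANGE = {
    "Ripley Formation": ("Campanian", "Maastrichtian"),
    "Mancos Shale (one outcrop)": ("Cenomanian", "Turonian"),
}


def formations_possibly_from(stage):
    """Return formations whose age range includes `stage`."""
    k = STAGES.index(stage)
    return sorted(
        name
        for name, (oldest, youngest) in FORMATION_RANGE.items()
        if STAGES.index(oldest) <= k <= STAGES.index(youngest)
    )
```

A search for Turonian would then surface specimens assigned to the Mancos Shale outcrop even though no Turonian determination was ever entered, and the user could still choose a stricter, explicit-only search.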

dustymc commented 6 years ago

I'm going to focus on authority data for now. We can re-visit determinations later if necessary.

separate code tables

Do you mean for chrono and litho? I think those data can be un-combined and still fit in the current structure, even if it means we have both "Albian Age" and "Albian Stage." If that's not what you're talking about, or if I've misunderstood, an example would be useful.

Given...

[Screenshot: 2018-01-11 at 7:26:49 pm]

SOMEWHERE out there should be a publication (or likely series of them) which says roughly, "Cape Deceit Formation is a subset of Calabrian (Stage/Age) is a subset of Pleistocene (Series/Epoch) is a subset of Quaternary (System/Period) is a subset of Cenozoic (Erathem/Era) is a subset of Phanerozoic (Eon/Eonothem)." If that's true our authority data are correct, and if it's not our authority data are incorrect.

Does that sound reasonable, and are our authority data defensible under those criteria?

dperriguey commented 6 years ago

Separate code tables

Yes, I do mean separate chrono and litho. Stage and Age are the same thing. It is just a difference of whether you live in Europe or USA, or if you're part of younger generations then they get used interchangeably.

So chronostratigraphy is time (see http://www.stratigraphy.org/ICSchart/ChronostratChart2017-02.jpg). This changes so rarely, and ever so slightly, now that it's hard to mess this one up.

Lithostratigraphy is rock, which was deposited during a time in the past. This is determined in a number of ways. There is a hierarchy with these data too, but it's more complicated. In the simplest form it goes from most inclusive to least: group->formation->member->unit. All this reflects is how different geologists split the rocks up. These get revised a little more often than chronostratigraphy. The other complicating aspect of this kind of data is that, in physical space, one group will be connected to another group. Sometimes the only difference between these touching groups is the geologists who determined them; they may be the exact same rock, deposited during the same time (this is an extreme case). More frequently, different formations that span the "same" amount of time sit adjacent to one another and are still part of the same group, with different rock types (and depending on who you talk to, they might be called different formations or the same formation).

This is why it makes it so problematic to have chronostratigraphy and lithostratigraphy part of the same hierarchy. Just like splitters and lumpers in biologic taxonomy, so are there in lithostratigraphy. They can be linked, eg. my specimen came from the Prairie Bluff Chalk Formation and is from the Maastrichtian Stage/Age, but together it gets over generalized.

cemyers42 commented 6 years ago

I'm super new here, so i hope this helps...! Dustin is correct. Lithostratigraphy and chronostratigraphy are totally different things; you would never list a geologic formation as the basal unit of a chronostratigraphic hierarchy. The issue is that altho the chronostratigraphy is unlikely to change much, the age of a given Formation may change substantially as folks continue to study its fauna and geochemistry. For example, the Mancos Shale may historically have been considered to span the entire Late Cretaceous (Cenomanian-Maastrichtian) in time, but improved radiometric dating may support that it's actually only Cenomanian-Campanian in time. As Dustin mentioned, the name may not change, but the age range is MUCH more volatile.

There is also an issue of a given formation spanning different parts of its age range in different places - e.g., NM may only have Mancos Shale from the Cenomanian-Turonian stage/age even tho the formation ranges into the Campanian in its full distribution. I'm not sure if this is an issue really or not as long as specimens listed as "Mancos Shale" from NM aren't automatically listed for the full range of ages of the formation....(i want to say that other databases deal with this by not linking formations to chronostrat ages unless they are specifically known...?).

In thinking about fully separating time from rocks, Dustin's comment about the naming of rocks is also very important. What we call the Mancos Shale formation in NM is age-equivalent to a much larger set of formations (Dakota Sandstone, Graneros Shale, Carlile Shale, Niobrara Chalk, and Pierre Shale) in KS. Many of these rocks may be of the same rock composition as the NM Mancos Shale (e.g., Graneros, Carlile, Pierre), but have different names because they were named by state geologists and are specific to that region. This makes lithostratigraphy complicated, but should not make chronostratigraphy complicated because you figure out the chronostrat of a formation, not vice versa.

Two databases that might be helpful as examples of how some folks handle this stuff are the Paleobiology Database (https://paleobiodb.org) and Macrostrat (https://macrostrat.org/). The former is specimen-based and the latter is lithostratigraphy-based. However, both acquire data from the published literature (not museum collections).

dustymc commented 6 years ago

So...

Cape Deceit Formation is a subset of Calabrian (Stage/Age) is a subset of Pleistocene (Series/Epoch) is a subset of Quaternary (System/Period) is a subset of Cenozoic (Erathem/Era) is a subset of Phanerozoic (Eon/Eonothem)

Becomes

Calabrian (Stage/Age) is a subset of Pleistocene (Series/Epoch) is a subset of Quaternary (System/Period) is a subset of Cenozoic (Erathem/Era) is a subset of Phanerozoic (Eon/Eonothem)

and

Cape Deceit Formation

and yay everybody - right?!?

KatherineLAnderson commented 6 years ago

"Yay" from UAMES. I agree with what others have written regarding the volatility of lithostratigraphy. This is very problematic in Alaska, and would require quite a bit of management if we wanted to input a taxonomy of some sort into Arctos.

(As a side, I think the original hierarchy that associated the lithostratigraphic units and chronostratigraphic units was entered 2-3 collection managers/curatorial assistants ago when UAMES first started using Arctos. Apologies on our department's end for potentially seeding this issue in the first place.)

dustymc commented 6 years ago

Apologies

I don't think that's necessary - from what I see design seldom/never gets it quite right, and it takes a bit of selective pressure to polish pretty much everything.

@cemyers42 @dperriguey @Jegelewicz does this work for you as well?

If so, can I just set parent NULL for all "formation" data?

If that all works, should I try to bring any of the existing chrono data into those specimens? Assuming we have a solution and going forward, asserting "Cape Deceit Formation" (which might eventually have some lithostratigraphic parentage) and "Calabrian" will require two geology attributes - I could magic the second in while updating the authorities if that's desirable.
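The proposed change (parent set to NULL for every "formation" term, with the old chronostratigraphic parent optionally carried over as a second determination) can be sketched in Python. The dictionaries stand in for the authority table, and the function name is hypothetical; the Cape Deceit chain is the one quoted in this thread.

```python
# term -> parent, using the chain quoted earlier in the thread.
PARENT = {
    "Cape Deceit Formation": "Calabrian",
    "Calabrian": "Pleistocene",
    "Pleistocene": "Quaternary",
    "Quaternary": "Cenozoic",
    "Cenozoic": "Phanerozoic",
    "Phanerozoic": None,
}

# term -> rank.
RANK = {
    "Cape Deceit Formation": "formation",
    "Calabrian": "Stage/Age",
    "Pleistocene": "Series/Epoch",
    "Quaternary": "System/Period",
    "Cenozoic": "Erathem/Era",
    "Phanerozoic": "Eon/Eonothem",
}


def detach_formations(parent, rank):
    """Return (new_parent, former_parent) without modifying the inputs.

    new_parent has every 'formation' term set to no parent; former_parent
    records what each detached formation used to point at, i.e. the second
    geology attribute that could be added to existing records.
    """
    new_parent = dict(parent)
    former_parent = {}
    for term, term_rank in rank.items():
        if term_rank == "formation" and parent.get(term) is not None:
            former_parent[term] = parent[term]
            new_parent[term] = None
    return new_parent, former_parent
```

Keeping `former_parent` separate leaves room for a manual review step: only formations whose old relationship is judged accurate would need to have the second determination added.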

Jegelewicz commented 6 years ago

Yes to the above and thanks so much to @cemyers42 for the awesome summary!!!

KatherineLAnderson commented 6 years ago

@dustymc I am dubious about adding the chronostratigraphic units to all specimens assigned to formations that are currently in the hierarchy. It could create issues we are trying to fix by removing this.

I think most (all?) of these formations are in Alaska. Let me check on the accuracy of the existing relationships in the hierarchy and I can provide a list of which formations would be okay to magically add the chronostrat determination to existing records associated with lithostrat units in the hierarchy.

The formations that don't make the list should not get the chronostrat determinations auto-added to records. I will need to manually check data associated with the specimens assigned to the formations. This would apply to (for example) the Cape Deceit formation, which has a broader chronostratigraphic range than just the Calabrian (although all specimens in our collection that are assigned to the Cape Deceit formation might be of Calabrian age... but I don't know that yet). I can bulkload chronostrat determinations at a later date to specimens belonging to X locality from Y formation if needed, right?
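
One way to picture the list-based approach, as a sketch only: records whose formation is on a checked list get the old chronostrat parent added automatically, and everything else is held back for manual review. The names `approved`, `old_parent`, `plan_backfill`, and the placeholder "Formation A" are hypothetical:

```python
# Hypothetical data: formation -> chronostrat parent recorded in the old hierarchy.
old_parent = {
    "Cape Deceit Formation": "Calabrian (Stage/Age)",
    "Formation A": "Pleistocene (Series/Epoch)",  # placeholder name
}

# Formations whose old parent has been checked and confirmed (placeholder content).
approved = {"Formation A"}


def plan_backfill(records, old_parent, approved):
    """Split (record_id, formation) pairs into auto-add rows and manual-review ids."""
    auto_add, review = [], []
    for record_id, formation in records:
        if formation in approved and formation in old_parent:
            auto_add.append((record_id, old_parent[formation]))
        else:
            review.append(record_id)
    return auto_add, review


records = [(1, "Cape Deceit Formation"), (2, "Formation A")]
auto_add, review = plan_backfill(records, old_parent, approved)
print(auto_add)  # rows that would get a chronostrat determination added
print(review)    # record ids left for manual checking
```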

dustymc commented 6 years ago

@KatherineLAnderson I'm just looking for a path that won't muck up your data. Doing nothing is certainly easiest for me! I don't know if there's currently a bulkloader or not, but there could be, or I can help you SQL stuff back in as needed.

cemyers42 commented 6 years ago

@dustymc this plan sounds good to me. To make sure I'm clear - what you are saying is to have a chronostratigraphic hierarchy set of Geology Attributes and a non-hierarchical lithostrat set of Geology Attributes that are NOT linked to chronostrat - correct? If so - perfect!

Would it be possible to make the lithostrat Geology Attributes menu writable? Because of all the naming funny business, I'm not sure there is a good reference for names of all series/groups/formations/members/etc. that are currently in existence and accepted by the community...if we could write in the Formations and Members associated with our data, then we could build such a list over time that is linked to specimens (I can see advantages in doing it this way :) ).

Also, I'm attaching the most up-to-date International Chronostratigraphic Chart. You can use these chronostrat designations for the drop-down menu. There may be some small challenge because historically vert paleo used their own North American Land Mammal Ages chronostrat chart...but perhaps we can figure out a way to either include those chronostrat names independently (at which point there would be overlap with the ICC names) or each collection could try to correlate their records to the ICC before uploading to Arctos...I don't have strong feelings about which would be better in this case.

Another thing to be thinking about is numerical age. Chronostratigraphy is in relative time (this thing is older or younger than this other thing), but is not directly linked to a numerical age (e.g., 65.7 million years old +/- 0.6 million years). The ICC does have dates associated with it; these are typically based on radiometric ages and get modified annually, although at this point most of the modifications are fairly minor (e.g., reduced error estimates or numerical age changes in the decimal places). My initial thought would be to have Numerical Age as a separate Geology Attributes bin that could be linked to chronostrat and lithostrat if that information is known...would that provide the best route to easy modification as numerical ages are updated?

Aside on terminology: Ga = billion years ago, Ma = million years ago, Ka = thousand years ago VERSUS Gyrs = billions of years, Myrs = millions of years, Kyrs = thousands of years (a duration, not necessarily in the past, as when you might talk about how long something takes).
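
To make the units concrete, here is a small sketch of a numerical age with an uncertainty, converted to plain years. The `NumericalAge` class and `YEARS_PER_UNIT` table are hypothetical names for illustration, and the unit spellings follow the abbreviations above:

```python
from dataclasses import dataclass

# Multipliers for the "years ago" abbreviations listed above.
YEARS_PER_UNIT = {"Ga": 1e9, "Ma": 1e6, "Ka": 1e3}


@dataclass(frozen=True)
class NumericalAge:
    """An age before present with a symmetric uncertainty, in one unit."""

    value: float
    uncertainty: float
    unit: str  # one of the keys of YEARS_PER_UNIT

    def bounds_in_years(self):
        """Return (youngest, oldest) age in years."""
        scale = YEARS_PER_UNIT[self.unit]
        return (
            (self.value - self.uncertainty) * scale,
            (self.value + self.uncertainty) * scale,
        )


# The example value from the comment above: 65.7 +/- 0.6 million years.
age = NumericalAge(65.7, 0.6, "Ma")
youngest, oldest = age.bounds_in_years()
print(f"{youngest:.0f} to {oldest:.0f} years ago")
```

Keeping the value, uncertainty, and unit together in one record is only one option; it would make it easier to update a date when the chart is revised without touching the chronostrat or lithostrat terms.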

ChronostratChart2017-02.pdf

dustymc commented 6 years ago

have a chronostratigraphic hierarchy set of Geology Attributes and a non-hierarchical lithostrat set of Geology Attributes that are NOT linked to chronostrat

Yes, I think?? I'm saying "build hierarchies that are useful to you." Hierarchies may have a depth of 1, which I think correlates to "non-hierarchical lithostrat set of Geology Attributes that are NOT linked to chronostrat." (Nothing stops you from adding "superformation" or "subformation" or etc. hierarchical data to those bare "formation" terms at a later time if something like that becomes useful.)

I think for right now that just means remove the parentage from all "formation" terms (https://arctos.database.museum/info/ctDocumentation.cfm?table=CTGEOLOGY_ATTRIBUTE), and I can do that if you'll confirm that's where we want to go first.

make the lithostrat Geology Attributes menu writable

Not while also making it usefully searchable. There is no chance that the data entry folks will consistently type "Pleistocene (Series/Epoch)" and inconsistent data cannot be usefully searched. We could add a free-text specimen attribute into which you could type whatever you want, and perhaps periodically clean those data up and re-load them as formal geology attributes or similar, but I think the formal attribute determinations must be controlled-vocabulary. (And it's almost certainly less work in the long run to establish the formal authority data beforehand than to try to make sense of what's been typed afterwards.)
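
A short sketch of why free text defeats searching, assuming a tiny illustrative vocabulary. `CONTROLLED_TERMS` and `check_term` are hypothetical names, and the typed variants are invented examples:

```python
# A few terms in the "Name (Rank)" style used in this thread; illustrative only.
CONTROLLED_TERMS = {
    "Pleistocene (Series/Epoch)",
    "Quaternary (System/Period)",
}


def check_term(value, vocabulary=CONTROLLED_TERMS):
    """Return the value if it is an exact vocabulary match, else raise ValueError."""
    if value not in vocabulary:
        raise ValueError(f"not in controlled vocabulary: {value!r}")
    return value


# Free-text variants a person might plausibly type for the same concept.
typed = ["Pleistocene (Series/Epoch)", "pleistocene", "Pleistocene Epoch", "Pleist."]
matches = [v for v in typed if v == "Pleistocene (Series/Epoch)"]
print(len(matches), "of", len(typed), "entries found by an exact search")
```

Rejecting the value at entry time, as `check_term` does, is the behavior a code table gives you; cleaning up the variants afterwards means guessing what each one meant.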

radiometric ages

If that's a determination, it's probably best entered as an attribute of a specimen (https://arctos.database.museum/info/ctDocumentation.cfm?table=CTATTRIBUTE_TYPE&coln=ES&field=radiometric%20date).

If that's an attribute of a term or similar (eg, Holocene=<0.0117 from your chart), especially one which changes frequently, perhaps we could plug into some webservice and just pull those data on demand.

Jegelewicz commented 6 years ago

make the lithostrat Geology Attributes menu writable

Not while also making it usefully searchable. There is no chance that the data entry folks will consistently type "Pleistocene (Series/Epoch)" and inconsistent data cannot be usefully searched. We could add a free-text specimen attribute into which you could type whatever you want, and perhaps periodically clean those data up and re-load them as formal geology attributes or similar, but I think the formal attribute determinations must be controlled-vocabulary. (And it's almost certainly less work in the long run to establish the formal authority data beforehand than to try to make sense of what's been typed afterwards.)

Yes! This needs to be a code table.

dperriguey commented 6 years ago

@dustymc in the meantime, can we work with the specimen attributes like we talked about? "I think it's safe (assuming controlled vocabulary) to do this as one attribute ("geology term" or whatever you want to call it) with code table values like "Phanerozoic (Eon/Eonothem)" - that should be a bit simpler to use than an attribute for each "term type," and as long as the format is predictable (rank at the end in parens) it should be easy to extract the data later. I'm completely open to other ideas.

I think the idea would be to stash these data somewhere safe until there's a more appropriate place for them, and the intent would be to move them back to localities when we can. In the meantime, specimen attributes are indeed searchable."
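
The "rank at the end in parens" format can be pulled apart later with a small parser. This is a sketch under that assumption; `TERM_PATTERN` and `split_term` are hypothetical names, and "Mancos Shale (formation)" is just an example value in that format:

```python
import re

# Matches "Name (Rank)" where the rank is the final parenthesised group.
TERM_PATTERN = re.compile(r"^(?P<name>.+?)\s*\((?P<rank>[^()]+)\)\s*$")


def split_term(value):
    """Return (name, rank); rank is None when no trailing "(rank)" is present."""
    match = TERM_PATTERN.match(value)
    if match is None:
        return value.strip(), None
    return match.group("name"), match.group("rank")


print(split_term("Phanerozoic (Eon/Eonothem)"))
print(split_term("Mancos Shale (formation)"))
print(split_term("Cape Deceit Formation"))
```

Values without a trailing rank come back with `None` rather than failing, so they can be listed and fixed before anything is moved back to localities.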

I would not need it for the chronostrat terms (Eon>Era>Period>Epoch>Age). Those values in the current geology attributes code table seem to be fine and I feel OK about storing our data there. But I would enter the lithostrat terms (group>formation>member>unit>bed) as specimen attributes for storage until we resolve this issue, adding the rank at the end in parens. I would need the value to be writable, like that of unformatted measurements. Is this something we can do?

dustymc commented 6 years ago

until we resolve this issue

What isn't resolved?

Jegelewicz commented 6 years ago

@dustymc @dperriguey I think that using an attribute is a good solution to allow data entry to proceed while the solution is in process. I suggest the attribute:

Lithostratigraphic unit - description of the body of rocks in which the object was found that is defined and recognized on the basis of its lithologic properties or combination of lithologic properties and stratigraphic relations.

Taken from https://engineering.purdue.edu/Stratigraphy/strat_guide/litho.html

Looking at this page, it seems like this might be a complicated fix, so using an attribute with a code table might be a way to start the process....

dustymc commented 6 years ago

I'm so lost...

The solution as I understand it - https://github.com/ArctosDB/arctos/issues/1387#issuecomment-357287007 - will take minutes to implement, I just need a go-ahead.

??????????

Jegelewicz commented 6 years ago

@dustymc We are adding this attribute to parts, correct? Or have I missed the part where this will become part of locality? It seems like this should be a locality attribute, but it also seems like it is quite complicated for the creation of a code table. (I could be completely wrong about that as I am not a geologist....) Perhaps I am just muddying the waters.

dperriguey commented 6 years ago

@dustymc does comment #1387 just mean that you would remove the lithostrat from the current geologic attribute hierarchy? @Jegelewicz Yes, this is a good page. And that's a fine name for the attribute.

dperriguey commented 6 years ago

@Jegelewicz we can simplify the information from the webpage you cited, and modify it in the future if necessary. For us, we just need simple values like group>formation>member>unit. This is as complex as the hierarchy is for our collections so far. It would be nice to get input from some other paleo people to help @dustymc start. Things like "tongue" are just in the name of a member, for example. And definitely, it should be a part of locality data.

Sorry @dustymc I was just getting confused by the conversation. There is a lot of new language I'm getting used to. Maybe we can talk on the phone so I can become more clear on what you are thinking. Would that be ok?