ArctosDB / arctos

Arctos is a museum collections management system
https://arctos.database.museum

A question about chronostratigraphy #1935

Closed Jegelewicz closed 3 years ago

Jegelewicz commented 5 years ago

@mbprondzinski @dperriguey you guys might be able to contribute to this conversation: https://github.com/tdwg/dwc-qa/issues/130

But it also made me wonder if we need a way to offer a range of chronostrata in our geology attributes. It is true that often there is a range given and apparently there are expectations in the paleo community for began and end.

mbprondzinski commented 5 years ago

Our geology collection data is rudimentary at best. However, we are getting a new volunteer who is a recent retiree from the Geological Survey of Alabama. I am going to try my darndest to enlist his expertise into preparing our collection for Arctos. He might be a valuable resource.

BTW, I will be out of town tomorrow, weather permitting, so I won’t be attending the Taxonomy meeting.


dustymc commented 5 years ago

Arctos can deal with ranges (or most anything else) now.

Sorta unrelated, but I think the Chronostratigraphy ( ) hierarchical term is just wrong - we've got time data under the rock-header, and rock-data under time, and probably some other stuff. If we're going to offer series and epochs separately then that division works, but it looks like nonsense as we're offering them together. Where'd that come from??

Jegelewicz commented 5 years ago

Sorta unrelated, but I think the Chronostratigraphy ( ) hierarchical term is just wrong - we've got time data under the rock-header, and rock-data under time, and probably some other stuff. If we're going to offer series and epochs separately then that division works, but it looks like nonsense as we're offering them together. Where'd that come from??

I don't know where the original terms came from, but we definitely don't seem to have them correct.

Differences from chronostratigraphy: It is important not to confuse geochronologic and chronostratigraphic units.[12] Geochronological units are periods of time, thus it is correct to say that Tyrannosaurus rex lived during the Late Cretaceous Epoch.[13] Chronostratigraphic units are geological material, so it is also correct to say that fossils of the genus Tyrannosaurus have been found in the Upper Cretaceous Series.[14] In the same way, it is entirely possible to go and visit an Upper Cretaceous Series deposit – such as the Hell Creek deposit where the Tyrannosaurus fossils were found – but it is naturally impossible to visit the Late Cretaceous Epoch as that is a period of time.

From - https://en.wikipedia.org/wiki/Trace_fossil_classification

Perhaps we need to change the terms to:

Segments of rock (strata) in chronostratigraphy: Eonothem, Erathem, System, Series, Stage

And then do we need the Geochronology terms?

Time spans in geochronology | Notes to geochronological units
Eon | 4 total, half a billion years or more
Era | 10 defined, several hundred million years
Period | 22 defined, tens to ~one hundred million years
Epoch | 34 defined, tens of millions of years
Age | 99 defined, millions of years

Also see https://en.wikipedia.org/wiki/Chronostratigraphy

Chronostratigraphic units, with examples:[1]

eonothem – Phanerozoic
erathem – Paleozoic
system – Ordovician
series – Upper Ordovician
stage – Ashgill

dustymc commented 5 years ago

I think my problem is just the header. My possibly-incorrect understanding is that these things are used as "Pennsylvanian (Series/Epoch)" and forcing everybody to enter "Pennsylvanian (Series)" and "Pennsylvanian (Epoch)" is just more work for unexpected data.

Jegelewicz commented 5 years ago

OK, I'd be happy to make the heading be Chronostratigraphy/Geochronology, but we don't have the Attributes ordered the same for each case:

Eon/Eonothem is Geo/Chrono

and

Erathem/Era is Chrono/Geo

If we can change the attribute terms to be in a consistent order, we can make the header reflect that order.

Is that what you are after?

dperriguey commented 5 years ago

This is a problem I had when we were originally getting geology attributes to not be forced into each other. There is chronostratigraphy and there is lithostratigraphy. These are the types of stratigraphy relevant to our work. Chrono is time. Litho is rock.

The chronostratigraphic chart is determined by the International Commission on Stratigraphy (find their info at stratigraphy.org). This is an international group. The Erathem/Era, System/Period etc. is only there to make USA and Europe happy; they are just different words with the same meaning. We should stick to what the ICS concludes in their chronostratigraphic charts!
SEE: http://stratigraphy.org/ICSchart/ChronostratChart2018-08.jpg

The problem is that time is embedded in rock. We get the chronostratigraphic chart from the rock. Not all rock fits perfectly into the chronostratigraphic boxes. A lithostratigraphic Group will span a lot of time; multiple stages/ages, multiple series/epochs.

The only ranges needed are for the lithostratigraphic values (group, formation, member). Personally, I would like it if entering a formation such as Mancos Shale showed its range, Cenomanian - Campanian (stage/age). We've been putting these in manually.

Sometimes I have a record that says that it came from the Late Cretaceous Paguate Sandstone. Great! They put both chronostratigraphy and lithostratigraphy. But what does the Paguate Sandstone cover? Oh, the Cenomanian. Cool. But our lithostratigraphy should say that when I put it in.

Sometimes I'm just given Early Cretaceous. Ok, cool I know that that is Lower Cretaceous and I'll enter it in as such.

Sometimes I have a record that says Albian Tucumcari Shale Formation. This is important because the Tucumcari Shale spans the Albian-Cenomanian. But the collector put Albian on the card. Great! Now time and rock are in happy harmony. Sometimes it might just say Tucumcari Shale. This spans both the Lower Cretaceous and the Upper Cretaceous! The person searching Arctos should be given this information at the resolution of time that the collector gave us.

I suggested being able to put a "-" between two chronostratigraphic geology attributes a while back. I'd be happy with this too. This way we can put the range in ourselves. It would be even cooler if the ranges were embedded in lithostratigraphy and searchable by people. I should be able to search Santonian and records associated with Mancos Shale should pop up (because Santonian is in between the Cenomanian and Campanian)!

Jegelewicz commented 5 years ago

The only ranges needed are for the lithostratigraphic values (group, formation, member). Personally, I would like it if entering a formation such as Mancos Shale showed its range, Cenomanian - Campanian (stage/age). We've been putting these in manually.

Manually how?

For this I would add 3 geology attributes:

Geology Attribute = Stage/Age | Value = Cenomanian | Determiner = | Date = | Method = | Remark = beginning of range

Geology Attribute = Stage/Age | Value = Campanian | Determiner = | Date = | Method = | Remark = end of range

Geology Attribute = formation | Value = Mancos Shale Formation | Determiner = | Date = | Method = | Remark =

This is the reason I moved all of the litho attributes out of any parent/child relationships with values in the chrono table. Adding a litho and a chrono attribute should suffice to say it is from both.

What is lacking in the model is a way to provide a chrono range that offers something better than the "this or that" in the example above.

@dustymc any ideas on how we might get that in place?

Jegelewicz commented 5 years ago

I should be able to search Santonian and records associated with Mancos Shale should pop up (because Santonian is in between the Cenomanian and Campanian)!

But my understanding is that the lithostratigraphy isn't necessarily in any given chronostratigraphy depending upon where you happen to be located.

Lithostratigraphic units are only defined by lithic characteristics, and not by age.

With the example above, you would find anything that had been assigned both the litho and chrono when you searched either one, as it would show up in a search of Mancos Shale OR any other chrono term you add. Maybe the answer is that you would need to add all possible chrono terms in the "range"...seems like a pain.

Jegelewicz commented 5 years ago

Sorta unrelated, but I think the Chronostratigraphy ( ) hierarchical term is just wrong - we've got time data under the rock-header, and rock-data under time, and probably some other stuff. If we're going to offer series and epochs separately then that division works, but it looks like nonsense as we're offering them together. Where'd that come from??

I think I have cleaned this up - which term(s) are the problem?

dustymc commented 5 years ago

[screenshot: 2019-02-21 at 7:29:50 am]

Series are subdivisions of rock layers....

My understanding is that those terms are "conveniently" used for both rocks and time. "Pleistocene Series" is rock, "Pleistocene Epoch" is time, we have both mashed into one term per tradition, but we now have it under the parent "time" for some reason.

Now there's a new top-level parent introducing a completely new THING. (From https://github.com/ArctosDB/arctos/issues/1900#issuecomment-465294862??) That looks more like a kingdom-level taxonomic term than an attribute of the place where a specimen was found at the moment.

Jegelewicz commented 5 years ago

My understanding is that those terms are "conveniently" used for both rocks and time. "Pleistocene Series" is rock, "Pleistocene Epoch" is time, we have both mashed into one term per tradition, but we now have it under the parent "time" for some reason.

Well, according to @dperriguey the terms are that way because US and European terms differ for the "rock". And as I said above, if they were in a consistent order, I would be happy to make the term Chronostratigraphy/Geochronology

Now there's a new top-level parent introducing a completely new THING. (From #1900 (comment)??) That looks more like a kingdom-level taxonomic term than an attribute of the place where a specimen was found at the moment.

The instructions state that we can add terms to organize the table - I merely did this so that I could easily see where the chrono stuff was as opposed to the litho stuff.

The instructions say:

"Lithostratigraphy" might be a useful term as the start of a set of hierarchies, but it's not something that can have a meaning in Geology Attributes so should not be "valid."

and it isn't - I don't know what the problem is here. @dperriguey @mbprondzinski is this a problem for you? Should this be done differently?

dustymc commented 5 years ago

ordered the same for each case

UAM@ARCTOS> select distinct ATTRIBUTE from geology_attribute_hierarchy order by ATTRIBUTE;

ATTRIBUTE
------------------------------------------------------------------------------------------------------------------------

Eon/Eonothem
Erathem/Era
Series/Epoch
Stage/Age
System/Period
bed
block
field
formation
group
member
suite
unspecified

I can update whatever needs updated.

And why do we need "unspecified"??

I believe DWC introduced the idea of ranges (EarliestGeologyThing, LatestGeologyThing, ...) to the world, although it was almost certainly introduced to DWC by someone with that structure in their local DB. DWC is an exchange standard - it needs to be approachable by various sources. One way it accomplishes that is by having a simplified structure. Arctos has no such limitations.

I dislike ranges because they require a user to make assumptions. Given "A-D" I have to guess what's in the middle, guess if it was the same when those data were recorded as it is now, and I still can't search for "B," which just isn't recorded anywhere. Given "A" + "B" + "D" I can find anything and I don't have to make any guesses.

We could add some complexity in order to get less-useful data, but that doesn't seem like a useful direction to me.

If "A" is always contained within "B" then the authority data may be arranged hierarchically, and a search for "B" will find specimens for which only "A" was asserted. If "A" and "B" have some less-structured relationship then the authority can be entered independently and localities can be attached to neither, one, the other, or both, and searches will only match localities attached to the search term. That is, if hierarchical authorities are useful you can use them, and if they're not you don't have to; the code table can serve as a simple list of terms as well as it can serve as something more complex.
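The implicit assertion described above (a search for "B" also finds localities where only the contained term "A" was asserted) can be sketched roughly as follows. This is a minimal illustration, not Arctos code: the `PARENT` and `ASSERTED` dictionaries, the locality IDs, and the function names are all hypothetical stand-ins for the hierarchical code table and the locality data.

```python
from collections import defaultdict

# Hypothetical slice of a hierarchical code table: child term -> parent term.
PARENT = {
    "Cretaceous": "Mesozoic",
    "Upper Cretaceous": "Cretaceous",
    "Lower Cretaceous": "Cretaceous",
    "Cenomanian": "Upper Cretaceous",
    "Santonian": "Upper Cretaceous",
    "Albian": "Lower Cretaceous",
}

# Hypothetical hard assertions: locality id -> terms explicitly attached to it.
ASSERTED = {
    1: {"Cenomanian"},
    2: {"Albian"},
    3: {"Upper Cretaceous"},
}


def descendants(term):
    """Return the term itself plus every term below it in the hierarchy."""
    children = defaultdict(set)
    for child, parent in PARENT.items():
        children[parent].add(child)
    found, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(children[current])
    return found


def search(term):
    """Return localities attached to the term or to anything contained in it."""
    wanted = descendants(term)
    return sorted(loc for loc, terms in ASSERTED.items() if terms & wanted)
```

With this toy data, `search("Upper Cretaceous")` returns localities 1 and 3, even though locality 1 only asserts "Cenomanian". Terms entered without a parent behave as a simple flat list, which matches the "use the hierarchy only if it helps" point.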

dustymc commented 5 years ago

Sorry, I wasn't clear. The top-level parent I was referring to is "Petrology ( )." It's not clear to me if that's the same sort of thing as chrono/litho or something entirely different.

I'm still fine with 'Lithostratigraphy' as a top-level term, just not if we have time-data buried in it. My knowledge isn't deep enough to be sure if that's what we have or not; I think https://en.wikipedia.org/wiki/Series_(stratigraphy) suggests rocks but I could be misreading that or it could be wrong.

Jegelewicz commented 5 years ago

I can update whatever needs updated.

Terms should be this:

Chronostratigraphy/Geochronology ( )

Eonothem/Eon, Erathem/Era, System/Period, Series/Epoch, Stage/Age

And why do we need "unspecified"??

No idea

I dislike ranges because they require a user to make assumptions. Given "A-D" I have to guess what's in the middle, guess if it was the same when those data were recorded as it is now, and I still can't search for "B," which just isn't recorded anywhere. Given "A" + "B" + "D" I can find anything and I don't have to make any guesses.

Thus my solution to add all possible strata in the "range". Which works within Arctos, but not when data is pushed to the aggregators. Perhaps we need a way to mark 1st and last for that purpose?
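One possible way to "mark 1st and last" for the aggregators without storing a range is to derive the endpoints at export time from the explicit attributes. The sketch below assumes an ordered list of stages is available (here a hard-coded, hypothetical `STAGE_ORDER` covering only the Upper Cretaceous) and maps onto the Darwin Core terms `earliestAgeOrLowestStage` and `latestAgeOrHighestStage`; the function name is invented for illustration.

```python
# Upper Cretaceous stages, oldest first. A real list would come from the
# code table; this hard-coded slice is an assumption for the example.
STAGE_ORDER = [
    "Cenomanian",
    "Turonian",
    "Coniacian",
    "Santonian",
    "Campanian",
    "Maastrichtian",
]


def dwc_stage_range(stages):
    """Derive Darwin Core earliest/latest stage fields from explicit stages.

    Stages not present in STAGE_ORDER are ignored; an empty dict is
    returned when nothing usable was supplied.
    """
    known = [s for s in stages if s in STAGE_ORDER]
    if not known:
        return {}
    ordered = sorted(known, key=STAGE_ORDER.index)
    return {
        "earliestAgeOrLowestStage": ordered[0],
        "latestAgeOrHighestStage": ordered[-1],
    }
```

Under this approach the database keeps every explicit stage (so "Santonian" stays searchable), and the lossy range only exists in the exported record.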

if hierarchical authorities are useful you can use them, and if they're not you don't have to; the code table can serve as a simple list of terms as well as it can serve as something more complex.

So I have attempted, to the best of my limited geology knowledge, to use a hierarchy and now EVERYONE is stuck with it. To me it seems useful, but perhaps not everyone does and it can easily be undone (OK, not really easily - it takes a long time to change relationships one by one!), but it is possible and somewhere down the line, what I thought I was adding to my locality may no longer be there....probably not the best thing?

dperriguey commented 5 years ago

@Jegelewicz Yes, lithostratigraphic units are defined by their lithology, but they also were deposited during some interval of time. You made the same point with your example. I enter these exactly like you did above ("manually") but it keeps Santonian out of the search unless I enter all of them. We shouldn't need to do that. The reasoning for including the interval of time a lithostratigraphic unit spans is of the same logic as noting the time, date, and location of collecting some modern biological data. The difference is that I collected a fossil on some date, at some time, in some place but that fossil was deposited in a totally different context (think 4D). I find a specimen that only says Mancos Shale associated with it, as a geologist I know that it came from the Late Cretaceous. If a search for Santonian will find that record, then I'm fine, but we need to be able to increase the time resolution when necessary (like we currently have in place, but does not satisfy the point below).

@dustymc @Jegelewicz As far as I know, when I search Santonian I find no data associated with Mancos Shale (bums me out).

Our data will occasionally be at the resolution of series, stage, formation, and/or member. Lots more times it is not, but could be if someone wants to work to increase the resolution on that record in the future. This happens when someone wants to use a collection that was published on and finds out new information about the locality or is doing an overhaul of the analysis.

dustymc commented 5 years ago

pushed to the aggregators

We can always talk about that, and we don't have to limit our model in doing so.

what I thought I was adding to my locality may no longer be there....probably not the best thing

Yes agreed. Very few of us should have access to the code tables. Let's agree to ask users before changing hierarchies and document that in http://handbook.arctosdb.org/how_to/How-to-Use-Code-Tables.html

I find a specimen that only says Mancos Shale associated with it, as a geologist I know that it came from the Late Cretaceous.

That's what the implicit assertion in a hierarchy is intended to address. I shouldn't have to search everything that anyone's ever thought was "Late Cretaceous" to find Late Cretaceous stuff. It's an implied assertion which exists for searching, only "Mancos Shale" is a hard assertion - I can filter that after I've found everything that MIGHT BE Late Cretaceous much, much easier than I can guess at all the stuff I think you might have thought was Late Cretaceous. I think we're making access overly difficult for users by breaking hierarchies apart, but my limited knowledge won't let me defend that position very vigorously.

search for Santonian will find that [Mancos Shale] record,

That's what the hierarchies were intended to facilitate, but we broke them apart.

Mancos Shale will find...

[screenshot: 2019-02-21 at 10:22:07 am]

Jegelewicz commented 5 years ago

I enter these exactly like you did above ("manually") but it keeps Santonian out of the search unless I enter all of them. We shouldn't need to do that. The reasoning for including the interval of time a lithostratigraphic unit spans is of the same logic as noting the time, date, and location of collecting some modern biological data.

We could solve this by allowing multiple, duplicate lithostrata terms (one under each period of time it spans) or by creating something new.

When I consider this, it seems to me that we should have chrono and litho in two separate places. Chrono is a defined hierarchy where we could be explicit about the order of things and putting a from XXX to YYY would encompass everything in between. We can even add a trigger that would not allow XXX to be AFTER YYY. Assigning that range (sorry Dusty) along with the lithostrata would be everything you needed. If we think of the Chronostratigraphy as time (which it is) then this is no different than assigning a began and ended date as long as we have the Chrono set up in an order that makes sense.
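As a rough sketch of the idea above, an explicitly ordered chrono list lets a "from XXX to YYY" pair stand for everything in between, and the same order supports the check that XXX is not after YYY. The `STAGE_ORDER` list (Upper Cretaceous only) and both function names are assumptions made for illustration; in Arctos the check would presumably live in a database trigger rather than in application code.

```python
# Upper Cretaceous stages, oldest first. Hard-coded here as an assumption;
# the real order would come from the chrono code table.
STAGE_ORDER = [
    "Cenomanian",
    "Turonian",
    "Coniacian",
    "Santonian",
    "Campanian",
    "Maastrichtian",
]


def expand_range(began, ended):
    """Return every stage from began to ended, inclusive, oldest first.

    Raises ValueError if either term is unknown or if began is younger
    than ended (the equivalent of the proposed trigger).
    """
    start = STAGE_ORDER.index(began)  # ValueError if the term is unknown
    end = STAGE_ORDER.index(ended)
    if start > end:
        raise ValueError(f"{began} is after {ended}")
    return STAGE_ORDER[start:end + 1]


def matches(search_term, began, ended):
    """True if a search term falls inside the asserted range."""
    return search_term in expand_range(began, ended)
```

For example, `matches("Santonian", "Cenomanian", "Campanian")` is true, which is the Mancos Shale search case raised earlier in the thread. Note that the result depends on the version of the ordered list in use.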

Correct?

Jegelewicz commented 5 years ago

That's what the hierarchies were intended to facilitate, but we broke them apart.

Because I am not allowed to put "Mancos Shale Group" under the five different Chrono levels it belongs under.

dustymc commented 5 years ago

five different Chrono levels

Ah... Despite my visceral reaction to duplicates anywhere, we could drop the unique constraint

Mancos Shale Group-->thing1
Mancos Shale Group-->thing2
Mancos Shale Grupp-->here's the problem, this would require a high level of paranoia...

"Something new" is probably better, but maybe we can work our way into that slowly, unless someone has better ideas.

chrono and litho in two separate places

That requires some work and expertise. The label says "Mancos Shale Group." That's all you know, so you don't enter "thing1." Users search thing1 and expect "Mancos Shale Group," which requires something that we seem reluctant to even implicitly assert. I have no idea where the useful balance is, or if I'm completely understanding the problem.

no different than assigning a began and ended date

It is only in that we all(ish) agree on dates, and they don't(ish) change so we can predict intermediate values with a great deal of certainty. The Chart is re-issued all the time; A-->C may be A-->B-->C in the next version as new "species" of stratigraphy are named. I'm not sure that's fatal, but it's something to keep in mind if we end up model-building.

Jegelewicz commented 5 years ago

"Something new" is probably better, but maybe we can work our way into that slowly, unless someone has better ideas.

Yes, because I am not a fan of creating all of those relationships!

What if we kept the table with began and ended "dates" instead of a hierarchy:

Erathem/Era = Mesozoic | Began = 251.902 Ma | Ended = 66 Ma
System/Period = Cretaceous | Began = 145 Ma | Ended = 66 Ma
Series/Epoch = Upper Cretaceous | Began = 100.5 Ma | Ended = 66 Ma
Stage/Age = Cenomanian | Began = 100.5 Ma | Ended = 93.9 Ma

formation = Mancos Shale Formation | Began = 110 Ma | Ended = 80 Ma

Using the began and end dates would resolve to what you want.
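A minimal sketch of how began/ended dates could resolve a search, assuming each term simply carries two numbers in Ma. The values for the first five terms are copied from the table above (including the illustrative Mancos Shale dates); the Santonian boundaries are an added assumption taken approximately from the ICS chart, and the names `Span`, `TERMS`, `overlaps`, and `terms_overlapping` are hypothetical.

```python
from typing import NamedTuple


class Span(NamedTuple):
    began_ma: float  # older boundary, millions of years ago
    ended_ma: float  # younger boundary, millions of years ago


TERMS = {
    "Mesozoic": Span(251.902, 66.0),
    "Cretaceous": Span(145.0, 66.0),
    "Upper Cretaceous": Span(100.5, 66.0),
    "Cenomanian": Span(100.5, 93.9),
    "Mancos Shale Formation": Span(110.0, 80.0),  # illustrative dates from the thread
    "Santonian": Span(86.3, 83.6),  # assumption: approximate ICS boundaries
}


def overlaps(a, b):
    """True if two spans share any time; touching boundaries do not count."""
    return a.began_ma > b.ended_ma and b.began_ma > a.ended_ma


def terms_overlapping(name):
    """Names of all other terms whose span overlaps the named term."""
    target = TERMS[name]
    return sorted(
        other
        for other, span in TERMS.items()
        if other != name and overlaps(target, span)
    )
```

With these numbers, a search on "Santonian" would pick up "Mancos Shale Formation" purely by date overlap, with no parent/child relationship required. Uncertainties (+/-) could be handled by widening each span before the comparison.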

Jegelewicz commented 5 years ago

The Chart is re-issued all the time; A-->C may be A-->B-->C in the next version as new "species" of stratigraphy are named. I'm not sure that's fatal, but it's something to keep in mind if we end up model-building.

And perhaps we could deal with this by having the terms include a source authority?

dperriguey commented 5 years ago

@Jegelewicz I like the assigning dates (which have +/- uncertainties that can be added as well), as long as when I search for Santonian I can find stuff under Mancos Shale Group that have not been explicitly given a different Stage/Age.

@dustymc Yes, the chrono and litho stuff should be separate, but linked in some way. I'm sorry that this is a bit like solving species taxonomy that Arctos has undertaken. I believe that the Macrostrat model has been set up in this way but it's more fuzzy than what we're talking about. I like them separate. If @Jegelewicz's idea for adding actual age ranges creates that link, that would be so useful.

Jegelewicz commented 5 years ago

when I search for Santonian I can find stuff under Mancos Shale Group that have not been explicitly given a different Stage/Age.

If someone explicitly said "this is from the Mancos Shale Group during the Cenomanian", then you SHOULDN'T be finding it in the Santonian, should you?

dperriguey commented 5 years ago

Right. That's what I said.

Nicole-Ridgwell-NMMNHS commented 5 years ago

Still looking through this thread and still thinking, but I have a few thoughts to start off with:

Jegelewicz commented 5 years ago

I like adding the age ranges to the geochronology, those are typically very well defined.

I agree! This seems like something easy we could do, but I also think that we need to have a geochronology range with a begin and end (see https://github.com/tdwg/dwc-qa/issues/130#issuecomment-465318108), rather than just recording as many attributes as we like (the current method).

I'm more iffy on using age ranges with lithostratigraphy. It can be hit or miss - some groups/formations/members may have really well defined age constraints, some may not. Even with using +/- uncertainties this could get messy.

I am going with you on this - I also feel like there will be lots of lithostrata for which we can't find dates...

To make this even more difficult, I'm going to throw biostratigraphic units into the mix. A lot of our localities have an associated biostratigraphic units and we will need a way to put this information in the database.

Bring it on!

SO

Does it make sense to have three different tables? Geostratigraphy, Lithostratigraphy, Biostratigraphy

Shall I propose that we separate the first two, which are currently combined in Arctos?

Nicole-Ridgwell-NMMNHS commented 5 years ago

Yes - They should definitely be three separate tables. I think it would be great to figure out a way to link them, but I'm not sure what the best way to do that would be. The last collection I was at (they were using Specify) structured it by creating a geologic context table - each geologic context was a unique combination of geochronology, lithostratigraphy, and biostratigraphy.

Jegelewicz commented 5 years ago

each geologic context was a unique combination of geochronology, lithostratigraphy, and biostratigraphy

Yuck! I think we should stick with three tables and three possible attributes:

Geochronology: Start and End (each layer has associated began and end dates in the code table)
Lithostratigraphy: Name of the lithostratum
Biostratigraphy: Name of the biostratum

For the last two, do we need a range for the times when stuff is in between layers? Is there another way to handle that?

This way, if you don't have one of the attributes, it's no problem and we don't have to keep a table somewhere that is a million combinations of the three, or two, or one strata.

dustymc commented 5 years ago

What would splitting the data up among multiple tables accomplish?

Here's the structure, if that's helpful.

UAM@ARCTOS> desc geology_attributes;
 Name                        Null?    Type
 --------------------------- -------- --------------
 GEOLOGY_ATTRIBUTE_ID        NOT NULL NUMBER
 LOCALITY_ID                 NOT NULL NUMBER
 GEOLOGY_ATTRIBUTE           NOT NULL VARCHAR2(255)
 GEO_ATT_VALUE               NOT NULL VARCHAR2(255)
 GEO_ATT_DETERMINER_ID                NUMBER
 GEO_ATT_DETERMINED_DATE              DATE
 GEO_ATT_DETERMINED_METHOD            VARCHAR2(255)
 GEO_ATT_REMARK                       VARCHAR2(4000)

Jegelewicz commented 5 years ago

We want to be able to find things by geochronology, lithostratigraphy, and/or biostratigraphy. If these are all mashed together in a field called "geology" it isn't all that useful.

These aren't hierarchical as a group, so we can't arrange them in a tree as was originally attempted.

Perusing a code table of these terms all mashed together is a pain.

I think we are going to recommend changing the structure, especially now that I know geology won't split localities, which means that you couldn't know if a specimen came from layer A or layer B and that eventually a locality could end up having hundreds of geology attributes, rendering them effectively meaningless.

Also, we definitely need to assign start and end dates to the geochronology terms, whereas the other two do not have well-defined time periods.

dustymc commented 5 years ago

What question do you think can't be answered with the existing data?

Perhaps the code table needs re-thought, but that does not affect the data structure.

GIGO applies to localities (with or without geology), just like everything else.

I don't understand your comment regarding geochronology. The existing data are associated with specimens and therefore stored as attributes - do you have something else?

Jegelewicz commented 5 years ago

What question do you think can't be answered with the existing data?

Show me all the specimens from XYZ specific location that are from the Jurassic.

Just because a specimen is using a locality with the Geology attribute "Jurassic" it doesn't mean it came from the Jurassic, if the locality also includes the Geology attribute "Miocene".

However, some localities may have multiple geologic layers. Even if you can associate layers A and B with a particular locality, how do you show that a particular fossil from that locality was found in layer A? from https://github.com/ArctosDB/arctos/issues/1975#issuecomment-473460979

Jegelewicz commented 5 years ago

Perhaps the code table needs re-thought, but that does not affect the data structure.

GEOLOGY_ATTRIBUTE_ID NOT NULL NUMBER
LOCALITY_ID NOT NULL NUMBER
GEOLOGY_ATTRIBUTE NOT NULL VARCHAR2(255)
GEO_ATT_VALUE NOT NULL VARCHAR2(255)
GEO_ATT_DETERMINER_ID NUMBER
GEO_ATT_DETERMINED_DATE DATE
GEO_ATT_DETERMINED_METHOD VARCHAR2(255)
GEO_ATT_REMARK VARCHAR2(4000)

We will probably ask to replace GEO_ATT_VALUE with

BEGIN_GEO_ATT_VALUE END_GEO_ATT_VALUE

or figure out some way of recording that for the geochronology attributes.

I like adding the age ranges to the geochronology, those are typically very well defined.

I agree! This seems like something easy we could do, but I also think that we need to have a geochronology range with a begin and end (see tdwg/dwc-qa#130 (comment)), rather than just recording as many attributes as we like (the current method). from https://github.com/ArctosDB/arctos/issues/1935#issuecomment-473451851

Jegelewicz commented 5 years ago

GIGO applies to localities (with or without geology), just like everything else.

Who says the multiple geology attributes are garbage? A given set of coordinates on the map can have hundreds of combinations of geo/litho/bio strata. We are now working in 3D and I think those geology attributes actually create new localities AND there should only be one of each kind (geo/litho/bio) for each locality.

Jegelewicz commented 5 years ago

I don't understand your comment regarding geochronology. The existing data are associated with specimens and therefore stored as attributes - do you have something else?

We need a range (I know you hate that), but in paleo, we have the situation of "this thing came from somewhere between the Permian (Began) and the Triassic (End)."

dustymc commented 5 years ago

Just because a specimen is using a locality with the Geology attribute "Jurassic" it doesn't mean it came from the Jurassic, if the locality also includes the Geology attribute "Miocene".

That is 100% a problem with the data. I would not hesitate to create new localities, regardless of what's in the original data, if dealt that situation and I somehow knew more about a specimen from a messy locality. A more likely (I hope!) situation would be two experts looking at a sample and returning those two values, in which case the "false positives" are equally relevant.

In any case, if those are the data Arctos can deal with them, and if there are better data Arctos can deal with that, and if you want to keep both around that isn't a problem either.

I believe that Arctos is modeled correctly, will handle any geology data that we are presented with, and can answer any questions those data are capable of answering. If we're going to talk about new models, starting with something that violates those assumptions would be very useful.

I don't dislike ranges in general, but I do dislike lossy data. Unless you're recording, e.g., the source of your range, I have to guess what you thought was in the middle. If you did record that and I know what you were thinking, I still can't search for "ranged" terms unless Arctos somehow knows the contents of whatever publication(s) you referenced. Of the current 14479 geology determinations, three have enough information in method for me to probably eventually figure out enough specifics to make sense of a range.

Nicole-Ridgwell-NMMNHS commented 5 years ago

Since I am new to Arctos and still learning, I'm having a little trouble following this thread, however I think I can make a stab at clarifying this question.

What question do you think can't be answered with the existing data?

I'm looking at the search form right now - with all the geology data under the single geology attribute, that means I can only search for one geology attribute. Take the following chart:

[image]

Say I want to search for fossils from the Lower Coniacian, Scaphites preventricosus zone, Smoky Hill Shale Member. If I'm understanding the search form correctly, it seems it would be difficult or impossible to run this search with the current structure of the geology attributes.

dustymc commented 5 years ago

The data are stored in a 1-locality to zero-or-more geology attributes relationship. Each geology determination is a single "row" in a table, but any number of those may be linked to any locality.

We can allow search by any number of geology attributes, and changing the interface is generally fairly simple.

If I'm understanding the data, those are three THINGS - rock, time, bio - and we've separated them in the code table, so there are no implicit linkages. For a specimen you might enter all three, and if you do so you could filter for specimens that have all three attributes applied to a linked locality. If you've only entered one of them, then a search for all three won't find that specimen. A search for wing length won't find specimens that don't have wing length data; the same principle applies here.
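The one-locality-to-many-attributes structure described above can already answer an "all three" search; the sketch below illustrates that with an in-memory SQLite table that borrows four column names from the `desc geology_attributes` output. It is not the Arctos search code: the sample rows, the locality IDs, the `biozone` attribute name, and the `localities_with_all` function are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE geology_attributes (
        geology_attribute_id INTEGER PRIMARY KEY,
        locality_id          INTEGER NOT NULL,
        geology_attribute    TEXT NOT NULL,
        geo_att_value        TEXT NOT NULL
    )
""")
# Hypothetical data: locality 100 has time, bio, and rock; 200 has rock only.
conn.executemany(
    "INSERT INTO geology_attributes VALUES (?, ?, ?, ?)",
    [
        (1, 100, "Stage/Age", "Lower Coniacian"),
        (2, 100, "biozone", "Scaphites preventricosus"),
        (3, 100, "member", "Smoky Hill Shale Member"),
        (4, 200, "member", "Smoky Hill Shale Member"),
    ],
)


def localities_with_all(conn, wanted):
    """Return locality IDs carrying every (attribute, value) pair in wanted."""
    pairs = sorted(set(wanted))
    if not pairs:
        return []
    clause = " OR ".join(
        ["(geology_attribute = ? AND geo_att_value = ?)"] * len(pairs)
    )
    params = [item for pair in pairs for item in pair]
    sql = f"""
        SELECT locality_id
        FROM geology_attributes
        WHERE {clause}
        GROUP BY locality_id
        HAVING COUNT(DISTINCT geology_attribute || '=' || geo_att_value) = ?
        ORDER BY locality_id
    """
    return [row[0] for row in conn.execute(sql, params + [len(pairs)])]
```

Searching for all three pairs returns only locality 100, while searching for the member alone returns both localities, which mirrors the wing-length analogy: a locality missing one of the requested attributes is not found.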

There are many ways of searching geology:

Does that make sense?

Nicole-Ridgwell-NMMNHS commented 5 years ago

Thanks for clarifying, allowing searches for multiple geology attributes would be great.

Jegelewicz commented 3 years ago

I think we have covered all of this - closing