UW-Macrostrat / macrostrat-api

The API for SCIENCE

Add non-zero concept_id for all units #204

Closed. aazaff closed this issue 5 years ago

aazaff commented 6 years ago

Many units have concept_id = 0 in situations where they do not have any known synonyms.

However, it would make sense for mononym units to also have a unique concept_id. The current system erroneously implies that a huge number of uniques (concept_id = 0) are synonyms. A human reader won't be fooled, but this causes problems when trying to identify units by concept_id algorithmically.
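A minimal sketch of the problem, using made-up names and ids rather than actual Macrostrat records. Grouping by concept_id to find synonyms lumps every unit with concept_id = 0 into one large "synonym" group:

```python
from collections import defaultdict

# Hypothetical (strat_name, concept_id) pairs; not real Macrostrat records.
rows = [
    ("Alpha Formation", 101),
    ("Alpha Fm", 101),
    ("Beta Shale", 0),
    ("Gamma Sandstone", 0),
    ("Delta Limestone", 0),
]


def group_by_concept(rows):
    """Collect strat names under their concept_id."""
    groups = defaultdict(list)
    for name, concept_id in rows:
        groups[concept_id].append(name)
    return dict(groups)


groups = group_by_concept(rows)

# Any concept_id shared by more than one name looks like a set of synonyms.
synonym_groups = {cid: names for cid, names in groups.items() if len(names) > 1}
print(synonym_groups)
```

Here the three unrelated units with concept_id = 0 come back as one group alongside the genuine synonyms under 101, unless the caller remembers to special-case zero.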

jczaplew commented 6 years ago

Zero is generally a poor null value, and I see no reason why every strat name shouldn't have an associated concept_id.

aazaff commented 6 years ago

Compare to the possible diagnosis for issue #200

aazaff commented 6 years ago

Speaking of zero as a poor null value... we use it as a NULL in a LOT of situations where we should not. Consider the lookup_strat_names table.

cambro commented 6 years ago

NULL is best left to mean "we do not know" the value. Lookup_strat_names has lots of zeros. In most of these we KNOW that there is no Group for that Formation. NULL would be a highly inappropriate value to assign to a formation that we KNOW has no group name.

What makes things worse is that these are all supposed to be KEYS used in JOINS. NULL values in a JOIN are asking for trouble.
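To illustrate the JOIN concern, here is a small sketch using an in-memory SQLite database. The table and column names (`strat_groups`, `formations`, `group_id`) are hypothetical stand-ins, not the real lookup_strat_names schema. It relies on standard SQL behavior: a comparison with NULL is never true, so NULL keys drop out of joins and filters silently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical tables for illustration only.
    CREATE TABLE strat_groups (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE formations (id INTEGER PRIMARY KEY, name TEXT, group_id INTEGER);
    INSERT INTO strat_groups VALUES (1, 'Group A');
    INSERT INTO formations VALUES
        (10, 'Formation X', 1),     -- belongs to Group A
        (11, 'Formation Y', 0),     -- zero used as "known to have no group"
        (12, 'Formation Z', NULL);  -- NULL used as "unknown"
    """
)

# Inner join: rows keyed 0 or NULL both fail to match any group.
joined = conn.execute(
    "SELECT f.name FROM formations f "
    "JOIN strat_groups g ON f.group_id = g.id ORDER BY f.id"
).fetchall()

# Filter meant as "everything that is not the zero marker":
# the NULL row is excluded too, because NULL <> 0 is not true.
not_zero = conn.execute(
    "SELECT name FROM formations WHERE group_id <> 0 ORDER BY id"
).fetchall()

print(joined)
print(not_zero)
```

Both queries return only Formation X. The zero row can at least be selected with `group_id = 0`, whereas the NULL row needs an explicit `IS NULL` test everywhere it matters.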

aazaff commented 6 years ago

Fair enough. In contrast to that though, every unit has a "concept" - so those should all have a unique concept_id.

cambro commented 6 years ago

By unit you mean strat_name? It is not the case that every name has a concept. When it is zero it really means NO concept, not "we don't know".

aazaff commented 6 years ago

I'm not sure what you mean that not every name has a concept. Somebody had a concept of a unit and gave it a name. It may be that no other name is attached to that concept - it is a mononym concept - but the concept exists.

cambro commented 6 years ago

I have a concept of Macrostrat in Australia, but it doesn't exist in the database.... LITERALLY, in the sense that there is no data record, some names do not have a concept.

jczaplew commented 6 years ago

I think Andrew's point is that for strat names that currently have a concept_id of zero it isn't exactly correct to say that we don't know their concept. We recognize them as a concept, but have chosen not to assign them a concept_id because of the arbitrary rule that more than two strat names must be conceptually similar in order to create a "concept".

While I agree that it can be useful to distinguish between an empty value being known or unknown, I feel that in this case any benefit derived from it is outweighed by the confusion and inconsistency in the data. No value is no value, and I don't see any benefit from distinguishing between the different varieties thereof.

Also, zero is an inappropriate way to make this distinction, as it is a valid value that shares a datatype with other valid values in that field.

cambro commented 6 years ago

@jczaplew this is a complete mischaracterization of concepts.

A strat_name can and does exist WITHOUT a concept in Macrostrat. We don't "recognize" them as concepts in this case at all. They are simply strat_names, usually attached to a ref, a unit, or both, but they are just names, and that's it.

We clearly all need to talk about this, as this GitHub exchange is not getting it done, and there are multiple levels of problems being expressed here.

cambro commented 6 years ago

And just to explain my down vote.. @jczaplew There are literally hundreds, probably thousands, of concepts (as in real concepts with real concept_ids) that have a single, as in 1, strat_name assigned to them. So:

"arbitrary rule that more than two strat names must be conceptually similar in order to create a 'concept'."

Is, like, real big wrong.

Edit: there are 32,691 single-strat_name concepts! So, yeah, 83% of the concepts we have are single-strat_name concepts.
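For reference, a count like this can be sketched as follows. The rows are made up for illustration and the function name is hypothetical; it is not the query that produced the numbers above.

```python
from collections import Counter

# Hypothetical (strat_name, concept_id) pairs; 0 marks "no concept record".
rows = [
    ("Alpha Formation", 101),
    ("Alpha Fm", 101),
    ("Beta Shale", 102),
    ("Gamma Sandstone", 103),
    ("Delta Limestone", 0),
]


def single_name_concepts(rows):
    """Return concept_ids that have exactly one strat_name attached."""
    counts = Counter(cid for _, cid in rows if cid != 0)
    return sorted(cid for cid, n in counts.items() if n == 1)


singles = single_name_concepts(rows)
total_concepts = len({cid for _, cid in rows if cid != 0})
share = len(singles) / total_concepts
print(singles, f"{share:.0%}")
```

In this toy data, two of the three real concepts carry a single name, while the zero row is excluded because it has no concept record at all.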

cambro commented 6 years ago

@aazaff IS, however, correct that it would be great to have legit concepts for all strat_names. They should all have them.

Some names lack concepts because a concept is only acquired when the metadata for a name (or group of names that share that concept) is in hand. We have such data for most lexicon records, but we don't have it for some data that are in Macrostrat (e.g., the New Zealand lexicon hasn't been ingested, so names used in that small data set don't have concepts; some older names from older columns are concept-orphans; we probably haven't matched some names to concepts; etc.).

aazaff commented 6 years ago

"We clearly all need to talk about this, as this GitHub exchange is not getting it done, and there are multiple levels of problems being expressed here."

I support this.