OpenTreeOfLife / treemachine

Source tree graph database

returning unique_name or not? #210

Closed kcranston closed 8 years ago

kcranston commented 8 years ago

The description of the taxon-blob states that unique_name should be an empty string if the same as name. The description of the blob also states that fields should be absent if no value, not returned as empty string. Here are three options for the case where the unique name = the name:

  1. do not return the unique_name field
  2. return unique_name field = ""
  3. return unique_name field = unique name

Thoughts? I like the third option (always return unique name). I think that the other two options imply that the property is absent for the taxon, i.e. that some taxa do not have unique names. @josephwb @jar398

[description edited by JAR to clarify distinction between unique_name field and unique name property]
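The three options in the description differ only when the unique name equals the name. A minimal sketch of that difference follows; `taxon_blob` and the taxon names are hypothetical illustrations, not treemachine code or real OTT data.

```python
def taxon_blob(name, unique_name, option):
    """Build a minimal taxon blob under option 1, 2, or 3 (hypothetical helper)."""
    blob = {"name": name}
    if unique_name != name:
        # All three options agree here: send the distinct unique name.
        blob["unique_name"] = unique_name
    elif option == 2:
        # Option 2: field present but empty.
        blob["unique_name"] = ""
    elif option == 3:
        # Option 3: field always carries the unique name.
        blob["unique_name"] = unique_name
    # Option 1: the field is simply left out.
    return blob


# Made-up taxon whose unique name is the same as its name.
for option in (1, 2, 3):
    print(option, taxon_blob("Examplea", "Examplea", option))
```

Only option 3 lets a client read `unique_name` without knowing any convention about missing or empty values.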

jar398 commented 8 years ago

A taxon's 'unique name' is a string. All taxa have a 'unique name'. No two taxa have the same 'unique name'. The 'unique name' always has the name as a prefix. For about 99.2% of taxa, the 'unique name' is identical to the name.

In the OTT taxonomy file, there is a column 'uniqname' which is empty if the unique name is the name, and otherwise is the unique name. This design was inherited from pre-OTT days. Suppressing the repeated name does help make the taxonomy file a bit smaller and more readable.
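As a rough sketch of the rule just described, the unique name can be recovered from a taxonomy row by falling back to the name when the 'uniqname' column is empty. The row is modeled here as a plain dict with `name` and `uniqname` keys, which is an assumption for illustration; the function name and sample values are hypothetical, and the parsing of the actual file is left out.

```python
def resolve_unique_name(row):
    """Return the taxon's unique name from a taxonomy row (hypothetical helper).

    `row` is assumed to be a dict with a 'name' key and an optional
    'uniqname' key that is empty when the unique name equals the name.
    """
    name = row["name"]
    uniqname = row.get("uniqname", "")
    unique_name = uniqname if uniqname else name
    # Every unique name is described above as having the name as a prefix.
    if not unique_name.startswith(name):
        raise ValueError(f"unique name {unique_name!r} does not start with {name!r}")
    return unique_name


# Made-up rows: one with an empty uniqname column, one with a distinct value.
rows = [
    {"name": "Examplea", "uniqname": ""},
    {"name": "Examplea", "uniqname": "Examplea (genus in family Hypotheticaceae)"},
]
for row in rows:
    print(resolve_unique_name(row))
```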

In the v2 API, the 'uniqname' or sometimes 'unique_name' field is whatever was in the OTT 'uniqname' column. In the v3 API draft, the 'unique_name' field is the same as it was in v2. This was for two reasons. One was a desire to avoid gratuitous differences between v2 and v3. The other was an interest in getting the v3 API draft out without having to review every last detail inherited from v2.

The value of the 'unique_name' field was not previously documented. That little bit of description is absolutely not new; it simply describes what we've been doing for years. This is a case of something that wasn't a problem before suddenly becoming a problem when brought into the light of day.

(KC, you used 'unique_name' to sometimes mean the unique name and sometimes the value of the 'unique_name' field, which is confusing when we're talking about how the two relate to one another. I've taken the liberty of fixing that (in a minimally invasive way) in your issue description.)

Having {name:x, unique_name:x} seems redundant, cluttered, and wasteful of bandwidth. If the unique_name field is missing, that does not mean that the taxon does not have a unique name (that is a use/mention fallacy), it just means that the unique name is not transmitted in the JSON blob under the unique_name field (just as many other properties of the taxon are not communicated in the JSON blob).

But this is not something I can get worked up about. If others want the unique_name field to be always present and to always have the unique name as its value (as opposed to sometimes-"" which I admit is screwy) that's fine with me.

By the way I prefer API issues to go in the germinator repo. An API spec does not belong to any particular implementation of it, and clients are affected by API changes just as much as servers.

snacktavish commented 8 years ago

I like option 3.

josephwb commented 8 years ago

> If the unique_name field is missing, that does not mean that the taxon does not have a unique name (that is a use/mention fallacy), it just means that the unique name is not transmitted in the JSON blob under the unique_name field (just as many other properties of the taxon are not communicated in the JSON blob).

But that is a rule that you made up, and no one would ever know it without reading the documentation. Why force someone to read documentation when it is unnecessary? I don't buy that this is true redundancy, but even if it were, as @bredelings has so eloquently stated before: "are we trying to save characters on the internet?"

Option 3 seems like the only reasonable choice.

jar398 commented 8 years ago

If I tell you today that I like cheese, and then tomorrow I don't tell you that I like cheese, do you then conclude that I no longer like cheese?

Options 1, 2, and 3 are all reasonable. The purpose of my comment was to give the reasons, which I had no reason to think were known, not to advocate. I have already said 3 is fine with me.

josephwb commented 8 years ago

What I am saying is that the situation is more like one where you never told me you like cheese. Should I have to read the documentation to figure out what your default cheese preference is?

I'm not sure options can be considered reasonable if they are confusing to people on the project.

kcranston commented 8 years ago

Since @jar398 does not feel strongly about any of the options, and the rest of us like option 3, let's go with that. The taxon blob will always return a unique_name field, and the contents of that field will always be the taxon's unique name: identical to name when the two coincide, and the distinct unique name otherwise.
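For clients, the decision means `blob["unique_name"]` can be read directly. A defensive accessor that also tolerates the older forms (field absent, or present as an empty string) might look like the sketch below; `unique_name_of` and the sample blobs are hypothetical, not part of any Open Tree client library.

```python
def unique_name_of(blob):
    """Return the unique name from a taxon blob (hypothetical client helper).

    Falls back to 'name' when 'unique_name' is missing or empty, so the
    same code works against responses in any of the three styles.
    """
    return blob.get("unique_name") or blob["name"]


# Made-up blobs in each of the three styles discussed in this issue.
blobs = [
    {"name": "Examplea"},                             # option 1: field absent
    {"name": "Examplea", "unique_name": ""},          # option 2: empty string
    {"name": "Examplea", "unique_name": "Examplea"},  # option 3: always present
]
for blob in blobs:
    print(unique_name_of(blob))
```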