Open jiho opened 4 years ago
Need to think more about this, I guess. Would you really want a category "copepod w eggsack full lipid sack full gut" and how do you enforce the proper order of this?
It is certainly possible to do it this way. However, this will flood the tree with categories that are actually attributes because you will need these attribute categories in many levels of the tree, and it will be hard to achieve consistency - even if you provide predefined attributes. Finally, it gets even worse with multiple attributes (like @rkiko said).
In the long term (maybe EcoTaxa 3.0;)), it might be better to transition to an annotation scheme with a primary, phylogenetic, identification and secondary attributes. In the UI, these attributes could still be displayed as (virtual) children of a taxonomic node or maybe there is still a better way.
Attributes separated from identifications would also solve the problem of detritus classification. There would be no need to decide whether it should be "light/compact" or "compact/light", there would be just fiber, feces, etc with possible attributes.
Lastly, attributes solve the problem of validated-ness: If an object is in "copepoda" but not in "copepoda/ovigerous", does this mean that it is without eggs or is it just that no one bothered to put it into the more specific category? With attributes, you could make this explicit: an object would be related to an attribute in one of three ways: not annotated, positive (with eggs), negative (explicitly without eggs).
Attributes are clearly the way to go. From a semantic point of view, "Oithonidae > Oithona" relation is not the same as "Oithona > female+eggs". It's a hack of the concept. As the proverb says "When all you have is a hammer, everything looks like a nail" :)
I understand and agree that attributes are a more elegant design. That said...
We would need a UI to assign attributes in the classif page. Having a separate process to assign attributes and taxa will be a pain for users. And we have a somewhat optimised way of assigning taxa labels (drag and drop, text autocompletion, keyboard shortcuts, etc.); so, on this page, attributes need to appear as taxa, and therefore as children of the taxon they are attached to.
For automatic classification, we definitely want objects with attributes to be predicted separately from the objects without attributes (i.e. predict `part<Crustacea` separately from `Crustacea`) and the attributes themselves cannot be a separate model from the taxonomic model, because `part`, `ovigerous`, etc. do not look the same across taxa so the model should treat the morphological attributes as a sub-part (i.e. a child) of the taxon.
At export, the attributes cannot be coded as a separate column for each attribute, with yes/no/unassigned in each, because there will be many such attributes and most are relevant for only a few taxa (e.g. all the 20 larval stages of copepods that are valid for them only). So it will need to be a single `attributes` column, with all the attributes lumped together as text, with separators, and some convention to denote the yes/no case for those for which it makes sense (`with eggs/without eggs`, `male/female/hermaphrodite`, etc.).
For the summarised export, and for any exploitation of the data, one would need to compute the concentrations per taxon+attribute+sample (and not just taxon+sample). If we treat attributes for what they are (i.e. independent information about a taxon, which can be combined however one wants), in a project with `Copepoda` (taxon), `with eggs` (attribute), and `with lipid sac` (attribute), the logical thing to do would be:

- `Copepoda` = total number of all copepods (with any combination of attributes)
- `Copepoda+with eggs` = all copepods tagged with at least the `with eggs` attribute
- `Copepoda+with lipid sac` = all copepods tagged with at least the `with lipid sac` attribute
- `Copepoda+with eggs+with lipid sac` = copepods with both attributes
But then, if one just sums per sample, the result is wrong because some objects are counted twice or thrice, which is very bad(™). For numbers to be summable however we cut the data (which we absolutely need), we need to make mutually exclusive categories like this:

- `Copepoda` = number of copepods with no attributes
- `Copepoda+with eggs` = all copepods with only the `with eggs` attribute
- `Copepoda+with lipid sac` = all copepods with only the `with lipid sac` attribute
- `Copepoda+with eggs+with lipid sac` = copepods with both attributes
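The difference between the two counting schemes can be checked on a toy sample. This is a minimal Python sketch with made-up objects; the names `objects`, `label`, `count_overlapping` and `count_exclusive` are hypothetical and not part of EcoTaxa.

```python
from collections import Counter

# Toy sample: each object is (taxon, set of attributes). Data are made up.
objects = [
    ("Copepoda", frozenset()),
    ("Copepoda", frozenset({"with eggs"})),
    ("Copepoda", frozenset({"with lipid sac"})),
    ("Copepoda", frozenset({"with eggs", "with lipid sac"})),
    ("Copepoda", frozenset({"with eggs", "with lipid sac"})),
]


def label(taxon, attrs):
    """Build a 'taxon+attr1+attr2' label with attributes in sorted order."""
    return "+".join([taxon, *sorted(attrs)])


def count_overlapping(objs):
    """'At least' scheme: an object counts in every group whose attributes it has."""
    combos = {attrs for _, attrs in objs}
    counts = Counter()
    for taxon, attrs in objs:
        for combo in combos:
            if combo <= attrs:  # the object carries at least these attributes
                counts[label(taxon, combo)] += 1
    return counts


def count_exclusive(objs):
    """'Only' scheme: an object counts in exactly one group."""
    return Counter(label(taxon, attrs) for taxon, attrs in objs)


overlapping = count_overlapping(objects)
exclusive = count_exclusive(objects)

# Only the exclusive counts add up to the number of objects in the sample.
print(sum(overlapping.values()), sum(exclusive.values()), len(objects))
```

With the "at least" scheme the per-group counts sum to more than the number of objects, which is the double counting described above; with the "only" scheme they sum to exactly the number of objects.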
And, this comes back to treating attributes as separate "taxa", children of the Copepoda taxon. Then we need a post-processing function that can aggregate counts/concentrations at higher levels (to get all Copepoda for example) but such an aggregation function is needed even for the purely taxonomic aspect anyway (e.g. I want the biomass of all Crustacea per sample). And we kind of have such functions (in R and MATLAB); we just need to package them better and make them public.
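As an illustration of such an aggregation function, here is a minimal Python sketch that rolls mutually exclusive counts up a parent/child tree. The tree, the counts and the function name `roll_up` are invented for the example; this is not the existing R or MATLAB code.

```python
from collections import defaultdict

# Hypothetical mini-tree: category -> parent (None for the root).
parent = {
    "Crustacea": None,
    "Copepoda": "Crustacea",
    "Copepoda>with eggs": "Copepoda",
    "Copepoda>with lipid sac": "Copepoda",
    "Amphipoda": "Crustacea",
}

# Mutually exclusive counts per category for one sample (made-up numbers).
counts = {
    "Copepoda": 10,
    "Copepoda>with eggs": 3,
    "Copepoda>with lipid sac": 2,
    "Amphipoda": 4,
}


def roll_up(counts, parent):
    """Add each category's count to itself and to all of its ancestors."""
    totals = defaultdict(int)
    for category, n in counts.items():
        node = category
        while node is not None:
            totals[node] += n
            node = parent[node]
    return dict(totals)


totals = roll_up(counts, parent)
print(totals["Copepoda"], totals["Crustacea"])
```

Because attribute-like children sit in the same tree as taxa, the same function gives "all Copepoda" and "all Crustacea" without any special handling.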
So, for all intents and purposes, attributes would need to be nested within taxa and work like a separate taxon, from the point of view of the user.
But we could argue that we could still code them differently. Why would we want to do so?
Attributes do not need to be organised in a hierarchy; so, for the case above, we do not have to enforce that `with eggs` is nested below `with lipid sac` or the opposite, and this is good because the order of the nesting could be arbitrary.
That said, for attributes of biological taxa, some are hierarchical (e.g. `female>with eggs`) and many are mutually exclusive (e.g. `juvenile` & `with eggs`), so the case above is actually extremely rare.
Then comes the case of detritus, for which we have plenty of attributes: `dark/light`, `compact/fluffy`, `fiber/globulous`, etc. which we often want to combine (`dark large fiber`). But, for those, I actually think we need a hierarchy and we should enforce it at the level of EcoTaxa as a whole. This hierarchy can be built objectively: (1) take attribute groups already labelled in EcoTaxa or lump the detritus together and make HDBSCAN or k-means clusters; (2) perform hierarchical clustering of those groups based on the morphological attributes; (3) organise labels depending on the hierarchy hence constructed. Based on our morphological-space work done recently, I can guess that the attributes will be organised according to size > darkness > complexity of shape. This would help people focus their sorting effort on these things. In addition, the I3S team is working on hierarchical automatic classification, the first results are promising, and for it to work, everything needs to fit in a hierarchy.
Attributes can be made to encode explicitly `positive`, `negative`, `not looked at` or `cannot tell`, for each. However:

- not all attributes are binary (`male/female/hermaphrodite`, `copepodites C1/C2/C3/etc.`), so we can't have a universal code for such statuses.
- […] (`with eggs` and `without eggs`)
- and mostly, the problem of explicit positive/negative is much wider within EcoTaxa, and there is a convention to solve it: within a project, if the categories `Copepoda`, `Copepoda>Harpacticoida`, `Copepoda>Calanoida` exist, the category `Copepoda` contains all copepods that are not `Harpacticoida` or `Calanoida` (or that we cannot tell); so, in a project with `Copepoda` and `Copepoda>with eggs`, the same convention means that `Copepoda` contains the images with no eggs or on which we cannot tell if there are eggs or not.
This convention is not satisfactory, because (i) there is a difference between an explicit negative and "I cannot tell", (ii) the meaning of the `Copepoda` group is not homogeneous across projects; it depends on what is below (and we bang our heads on this when we want to create larger scale datasets). So we need a better solution, but this solution cannot be just for the morphological attributes; we need it to work the same way across the whole taxonomy. I have not found such a solution.
So, overall, coding these things as attributes would complexify the code quite a lot (in every place where something is done per taxon, it needs to be done per taxon and combination of attributes); from the point of view of the user, they will need to be made to appear as separate taxa (so there is actually more work there); in some places we may need to have them encoded in some sort of hierarchy (and we would need a separate system for that, while we already have a taxonomy).
On the other hand, the cost of implementing them as sub-taxa is more entries in the taxonomy; then everything else works. But this will constitute a few thousand, maybe tens of thousands of entries, which is negligible compared to the size of the tree of life.
I am usually the one arguing for making things right rather than easy. But here I would vote for practicality. In addition, it is not completely wrong either: the hierarchical taxonomy is a centerpiece of EcoTaxa (it is right in the name), so making things fit within that taxonomy is meaningful.
PS: I could even go on about the fact that the phylogenetic divisions into family, genus, species etc. are somewhat arbitrary and that morphological divisions sometimes make as much sense as the phylogenetic ones, that "species" is an overrated concept etc. but this reply is long enough as it is.
I think you are mixing two things: how to aggregate the data and how to do the assignment of the attributes. I think most of what you write clearly is in favour of attributes, although it will become more and more difficult to switch from the current way of how things are done to an attribute assignment approach, as datasets grow ...
Data aggregation should be no problem, one just has to code it. But if you are interested in the ratio of copepod females with eggs vs. without eggs, with attributes you will just ask for data that have these attributes assigned (female, not female; w. eggs, wo eggs, cannot tell; assuming that within the dataset you are querying these have been assigned consistently). No attribute means no data. Currently, you have to be lucky that the user has written "female" and "eggs" correctly...
Also, different users are interested in different attributes. This leads to problems when merging data... Even worse, how to merge datasets in which people have noted gut fullness and then egg carrying, vs other way round??? copepod w full gut and egg sack vs copepod w egg sack and full gut??? Total nightmare ahead now that we are starting to ask these questions ... Even with the UVP data we do not have consistent sorting across projects. It will just get worse without attributes ...
> We would need a UI to assign attributes in the classif page. Having a separate process to assign attributes and taxa will be a pain for users.
I don't think so. I think it will be easier to sort if you do not have to worry about the attributes in the first round of sorting. I think that a user will assign attributes if they are important for his/her task or question. So, it is a secondary step after identifying the general class of the organism.
E.g. egg-carrying copepods. You would then want to enter an 'attribute-assignment stage' where you are shown all copepods from all subclasses. You could then first search for all females, mark them and assign the corresponding label by hitting a button. The others get the 'no-female label', as you have checked them. The second step is to find the egg-carrying females. Hit another button to assign w. eggs, all others get wo eggs, done. This could be done on the same ecotaxa page, but I think designing a separate stage might be simpler...
Another comment: I think attributes are a quality of an object, above you write that "part" could be an attribute. I use 'part' to signify that this is not the complete organism, e.g. a bitten off tail or a lost antenna. This is not an attribute. It is not the complete organism and therefore it is a class.
Also too long for a Sunday ;)
Just read another thing that is, in my opinion, wrong in your concept:

> But then, if one just sums per sample, the result is wrong because some objects are counted twice or thrice, which is very bad(™). For numbers to be summable however we cut the data (which we absolutely need), we need to make mutually exclusive categories like this
>
> - Copepoda = number of copepods with no attributes
> - Copepoda+with eggs = all copepods with only the with eggs attribute
> - Copepoda+with lipid sac = all copepods with only the with lipid sac attribute
> - Copepoda+with eggs+with lipid sac = copepods with both attributes
You would define which attributes you want to use for your study before aggregation, not try to aggregate everything. And you should get an output that shows also the objects where this attribute was not assigned. In your case and if you are interested in eggs you have three types of data:
- no egg attribute assigned
- egg attribute positive
- egg attribute negative
That someone assigned the lipid sac attribute does not mean that he/she checked the egg attribute ... Clearly, to do quantitative work on these attributes, they have to be assigned consistently. That can be checked with the attribute assignment scheme/stage proposed above. Currently, you can only assume that someone has identified all the egg-bearing copepods if some classes with eggs are there. But you can not be sure and you have no record in the data (maybe in a protocol), as you do not have the class "copepod + without eggs". Nobody generates or sorts into this class. Mostly, the classes where we provide attributes are now generated out of curiosity. And would you want classes copepod + without eggs + wo lipid sack + wo gut visible???? You are saying that the "base class" is this, a copepod for which the mentioned attributes were checked but are negative. This is another very important reason to have attributes ...
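To make the three states concrete, here is a minimal Python sketch of a per-attribute summary that keeps "not annotated" separate from an explicit negative. The data layout and the names `status` and `summarise` are assumptions made for the example, not an existing EcoTaxa structure.

```python
from collections import Counter
from typing import Optional

# Each object stores, per attribute, True (positive), False (negative),
# or no entry at all (not annotated). Data and names are made up.
objects = [
    {"taxon": "Copepoda", "attrs": {"with-eggs": True}},
    {"taxon": "Copepoda", "attrs": {"with-eggs": False}},
    {"taxon": "Copepoda", "attrs": {"with-lipid-sac": True}},  # eggs never checked
    {"taxon": "Copepoda", "attrs": {}},
]


def status(obj: dict, attribute: str) -> str:
    """Return the annotation state of one attribute for one object."""
    value: Optional[bool] = obj["attrs"].get(attribute)
    if value is None:
        return "not annotated"
    return "positive" if value else "negative"


def summarise(objs, taxon, attribute):
    """Count objects of a taxon per annotation state of the chosen attribute."""
    return Counter(status(o, attribute) for o in objs if o["taxon"] == taxon)


egg_summary = summarise(objects, "Copepoda", "with-eggs")
print(egg_summary)
```

Note that the object tagged only with the lipid sac attribute ends up in "not annotated" for eggs, which is exactly the case where nothing can be said about eggs.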
Cheers, I'll go swimming ;)
Ah, I am also not sure if I would have larval stages as attributes or classes. I would say classes where they are clearly distinguishable (nauplii vs. copepodites), and attributes where not, but you could also make the case that sorting into different naupliar stages should in principle be possible, so they should be classes. I guess we need to define this clearly. Maybe, if something has a hierarchical, phylogenetic meaning down to the species level it could be a class; if not, it needs to be an attribute. Examples
Calanus hyperboreus N1 nauplius could be a class: crustacea/copepoda/.../C.hyperboreus/nauplius/n1
against
crustacea/copepoda/nauplius
If we now here have an n1, it should maybe rather be an attribute ...? Difficult...
So, what about males vs. females? In many cases we can not decide, so I would argue for an attribute.
I guess we need clear rules when a class generation is allowed and when an attribute needs to be used... But if something has a full gut or not is not a taxonomic characteristic, it is clearly an attribute. Although I just ate, so I feel more human ;)
Cheers, Rainer.
> I am usually the one arguing for making things right rather than easy. But here I would vote for practicality. In addition, it is not completely wrong either: the hierarchical taxonomy is a centerpiece of EcoTaxa (it is right in the name), so making things fit within that taxonomy is meaningful.
This is also my opinion but I would also add that as project managers, we have to make choices for the usage of the always limited resources we have. My suggestion for better homogenisation and limited development cost would be to facilitate the standardisation of "taxon like" attributes by suggesting pre-defined names in the EcoTaxo interface.
"I am usually the one arguing for making things right rather than easy. But here I would vote for practicality. In addition, it is not completely wrong either: the hierarchical taxonomy is a centerpiece of EcoTaxa (it is right in the name), so making things fit within that taxonomy is meaningful."
A full gut is not a taxonomic feature, same for carrying eggs. So we can agree that categories with such names should not be allowed, if the "taxa" in the name is the decisive thing. What you suggest here is like trying to fit circles into squares, because they are both blue. Other way to go is to think what is needed and build a tool for it.
And we do not have limited resources, we have a lot of time to do this. We just have to say now that this is the way we want to go and then get the money to get it done.
Naming rules in ecotaxa are currently not helpful to set a framework for attributes. Creating rules to do something that could be done better with attributes is not forward-looking. We should make things right, although it might take one or two years ...
And there are already names that we can not bring together anymore (from the ecotaxa taxonomy):
Paraeuchaeta>female+eggs+ectoparasites
living>other>egg
living>other>egg>like
living>Eukaryota>Opisthokonta>Holozoa>Metazoa>Arthropoda>Crustacea>Maxillopoda>Copepoda>Calanoida>Acartiidae>Acartia>Acartia sinjiensis>egg
living>Eukaryota>Opisthokonta>Holozoa>Metazoa>Chordata>Craniata>Vertebrata>Gnathostomata>Actinopterygii>egg
living>Eukaryota>Opisthokonta>Holozoa>Metazoa>Mollusca>Gastropoda>Heterobranchia>Euthyneura>Euopisthobranchia>Thecosomata>Cavoliniidae>Cavolinia>Cavolinia inflexa>egg
living>Eukaryota>Opisthokonta>Holozoa>Metazoa>Mollusca>egg
living>Eukaryota>Opisthokonta>Holozoa>Metazoa>Chordata>Craniata>Vertebrata>Gnathostomata>Actinopterygii>Clupeiformes temp>Engraulidae temp>egg 1 temp
living>Eukaryota>Opisthokonta>Holozoa>Metazoa>Chordata>Craniata>Vertebrata>Gnathostomata>Actinopterygii>Clupeiformes temp>Engraulidae temp>egg 2 3 temp
living>Eukaryota>Opisthokonta>Holozoa>Metazoa>Chordata>Craniata>Vertebrata>Gnathostomata>Actinopterygii>Clupeiformes temp>Engraulidae temp>egg 4 6 temp
living>Eukaryota>Opisthokonta>Holozoa>Metazoa>Chordata>Craniata>Vertebrata>Gnathostomata>Actinopterygii>Clupeiformes temp>Engraulidae temp>egg 7 8 temp
living>Eukaryota>Opisthokonta>Holozoa>Metazoa>Chordata>Craniata>Vertebrata>Gnathostomata>Actinopterygii>Clupeiformes temp>Engraulidae temp>egg 9 11 temp
living>Eukaryota>Opisthokonta>Holozoa>Metazoa>Chordata>Craniata>Vertebrata>Gnathostomata>Actinopterygii>Clupeiformes temp>Engraulidae temp>egg unkn temp
living>Eukaryota>Opisthokonta>Holozoa>Metazoa>Chordata>Craniata>Vertebrata>Gnathostomata>Actinopterygii>Clupeiformes temp>Clupeidae temp>Sardina temp>egg 1 temp
living>Eukaryota>Opisthokonta>Holozoa>Metazoa>Chordata>Craniata>Vertebrata>Gnathostomata>Actinopterygii>Clupeiformes temp>Clupeidae temp>Sardina temp>egg 2 3 temp
living>Eukaryota>Opisthokonta>Holozoa>Metazoa>Chordata>Craniata>Vertebrata>Gnathostomata>Actinopterygii>Clupeiformes temp>Clupeidae temp>Sardina temp>egg 4 6 temp
living>Eukaryota>Opisthokonta>Holozoa>Metazoa>Chordata>Craniata>Vertebrata>Gnathostomata>Actinopterygii>Clupeiformes temp>Clupeidae temp>Sardina temp>egg 7 8 temp
living>Eukaryota>Opisthokonta>Holozoa>Metazoa>Chordata>Craniata>Vertebrata>Gnathostomata>Actinopterygii>Clupeiformes temp>Clupeidae temp>Sardina temp>egg 9 11 temp
living>Eukaryota>Opisthokonta>Holozoa>Metazoa>Chordata>Craniata>Vertebrata>Gnathostomata>Actinopterygii>Clupeiformes temp>Clupeidae temp>Sardina temp>egg unkn temp
living>Eukaryota>Opisthokonta>Holozoa>Metazoa>Arthropoda>Crustacea>Maxillopoda>Copepoda>Poecilostomatoida>Oncaeidae>Oncaea>female/eggs
living>Eukaryota>Opisthokonta>Holozoa>Metazoa>Arthropoda>Crustacea>Maxillopoda>Copepoda>Calanoida>Clausocalanidae>Pseudocalanus>female/eggs
living>Eukaryota>Opisthokonta>Holozoa>Metazoa>Chordata>Craniata>Vertebrata>Gnathostomata>Actinopterygii>egg>empty
living>Eukaryota>Opisthokonta>Holozoa>Metazoa>Chordata>Craniata>Vertebrata>Gnathostomata>Actinopterygii>egg>small
living>Eukaryota>Opisthokonta>Holozoa>Metazoa>Chordata>Craniata>Vertebrata>Gnathostomata>Actinopterygii>egg>medium
living>Eukaryota>Opisthokonta>Holozoa>Metazoa>Arthropoda>Crustacea>Maxillopoda>Copepoda>Cyclopoida>Oithonidae>Oithona>Oithona similis>female+eggs
living>Eukaryota>Opisthokonta>Holozoa>Metazoa>Arthropoda>Crustacea>Maxillopoda>Copepoda>Cyclopoida>Oithonidae>Oithona>Oithona atlantica>female+eggs
living>Eukaryota>Opisthokonta>Holozoa>Metazoa>Arthropoda>Crustacea>Maxillopoda>Copepoda>Cyclopoida>Oithonidae>Oithona>female+eggs
living>other>egg>egg sac
living>Eukaryota>Opisthokonta>Holozoa>Metazoa>Arthropoda>Crustacea>Maxillopoda>Copepoda>Calanoida>Euchaetidae>Paraeuchaeta>female+eggs
living>Eukaryota>Opisthokonta>Holozoa>Metazoa>Arthropoda>Crustacea>Malacostraca>Eumalacostraca>Amphipoda>Senticaudata>Calliopiidae>Apherusa>Apherusa glacialis>female+eggs
living>Eukaryota>Opisthokonta>Holozoa>Metazoa>Arthropoda>Crustacea>Maxillopoda>Copepoda>Calanoida>Euchaetidae>Paraeuchaeta>female+eggs+ectoparasites
living>Eukaryota>Opisthokonta>Holozoa>Metazoa>Arthropoda>Crustacea>Maxillopoda>Copepoda>Calanoida>Euchaetidae>Paraeuchaeta>Paraeuchaeta glacialis>female+eggs+ectoparasites
living>Eukaryota>Opisthokonta>Holozoa>Metazoa>Arthropoda>Crustacea>Maxillopoda>Copepoda>Calanoida>with-eggs
living>Eukaryota>Opisthokonta>Holozoa>Metazoa>Arthropoda>Crustacea>Maxillopoda>Copepoda>Cyclopoida>Oithonidae>Oithona>with-eggs
living>Eukaryota>Opisthokonta>Holozoa>Metazoa>Arthropoda>Crustacea>Maxillopoda>Copepoda>Cyclopoida>Oithonidae>Oithona>with-eggs-lateral
living>Eukaryota>Opisthokonta>Holozoa>Metazoa>Arthropoda>Crustacea>Maxillopoda>Copepoda>Poecilostomatoida>with-eggs
living>other>egg>circular egg
living>Eukaryota>Opisthokonta>Holozoa>Metazoa>Chordata>Craniata>Vertebrata>Gnathostomata>Actinopterygii>Teleostei>egg
I think 'we' should start with a listing of all these "things we can see on the pictures which complement the species information", with their possible combinations. I mean, for those of us who do not have a bio background :)
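```r
# Assumes `db` is an open connection to the EcoTaxa database, and that
# taxo_name() and lineage() are helper functions available in the session
# (neither is defined in this thread).
library(dplyr)  # tbl(), select(), filter(), mutate(), ... (tbl() on a database needs dbplyr installed)
library(readr)  # write_csv()

# extract taxo
taxo <- tbl(db, "taxonomy") %>%
  select(id, parent_id, name, taxotype, nbrobj) %>%
  collect()

# extract used, morphological taxa
taxo %>%
  filter(nbrobj > 0, taxotype == "M") %>%
  select(-taxotype) %>%
  # add the parent and lineage to give context
  mutate(
    unique_name = taxo_name(id, taxo, unique = TRUE),
    lineage = lineage(id, taxo)
  ) %>%
  relocate(nbrobj, .after = "parent_id") %>%
  arrange(name) %>%
  # write_csv() compresses the output because of the .gz extension
  write_csv("morpho_taxa.csv.gz")
```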
And a cleaned up version in which I try to categorise what the morphological taxa designate https://docs.google.com/spreadsheets/d/1knrjgyQyHeFnGt9B5gGcRe8cr8XrKkrQ71KoXR8D1jM/edit#gid=1782421282
Summary: over ~550 morpho categories:
NB: the total is more than 550 because some names designate two things at once.
Only the last 3 could be defined as attributes; and for life stages, many do not make sense for taxa outside of a certain phylum (e.g. veliger larvae are only in Molluscs).
So overall, beyond doing a bit of cleanup, which we should do, I really don't see the point in setting up an overly complicated system for such a small number of things, in particular when an alternative solution (help with the selection of standard names from a list) offers most of the benefits.
I have to say that I completely disagree with your take on this. For me this is actually a frustrating chaos that only will get worse. You will just create a huge overhead if you create all the standard names for this:
calanus / female+eggs+ectoparasites
which for a full proper sorting requires:
calanus / female + no eggs + no ectoparasites
calanus / female + egg + no ectoparasites
calanus / male + no eggs + no ectoparasites (although male != eggs anyhow)
and so on ...
Assuming that calanus / female is equal to calanus / female + no eggs + no ectoparasites is just bad practice.
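The overhead can be estimated by enumerating the names an exhaustive sort would need. This is a minimal Python sketch; the attribute states are taken from the example names above, and `all_category_names` is a hypothetical helper.

```python
from itertools import product

# Illustrative attribute states, taken from the example names above.
states = {
    "sex": ["female", "male"],
    "eggs": ["eggs", "no eggs"],
    "ectoparasites": ["ectoparasites", "no ectoparasites"],
}


def all_category_names(taxon, states):
    """Enumerate every 'taxon / a + b + c' name needed for an exhaustive sort."""
    return [
        f"{taxon} / " + " + ".join(combo)
        for combo in product(*states.values())
    ]


names = all_category_names("calanus", states)
print(len(names))
for name in names:
    print(name)
```

Three two-state attributes already give 2 x 2 x 2 = 8 names per taxon (including impossible ones such as male + eggs, which would need to be filtered out), and each additional attribute multiplies that number again.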
This one:

> ~150 describe the shape of the object; but it is often difficult to know if this is a mere precision on an existing category, to help train classif models (e.g. Appendicularia > s-shaped vs Appendicularia > straight) or if the shape defines the category itself (e.g. Rhizaria > dark with some spikes = has a well known shape, should have a phylogenetic name but I cannot find it for sure)

is another reason why attributes are needed!!! Attributes will help to clarify if it is a distinct category or an attribute!!!
It might be good to have standard names now. But in the long run, this is no solution, because you can not aggregate the data in a meaningful way across projects. Some will use standard names, others not. To get to standard names now, you will need to force all users of EcoTaxa to review their categories and agree on standard names. That will be a crazy endeavour...
It would be good to investigate how attribute tags could be introduced (complexity of the task, how much time/money it will take), instead of dumping the idea.
Reviving an old thread just to make a few notes:
One aspect not mentioned above is how DarwinCore treats these. Some are indeed attributes of a taxon (e.g. sex, developmental stage, etc.) which would push in favour of attributes (although very few exist in DwC). But they are counted as separate occurrences (e.g. C hyperboreus without attribute is a separate occurrence from C hyperboreus juvenile) which, in EcoTaxa parlance, means they are categories. So it pushes towards the implementation of some attributes as attributes in the database but for their presentation as separate categories in the interface and the export.
If we ever do it then I'd say we need namespaces as prefixes (shape:elongated, colour:dark, sex:male, etc.) and then attributes and combinations of attributes would show as subcategories, in alphabetical order, e.g.
Copepodus schroderus
Copepodus schroderus [repro:non-ovigerous] [sex:female] [stage:adult]
Copepodus schroderus [repro:ovigerous] [sex:female] [stage:adult] [view:lateral]
Copepodus schroderus [repro:ovigerous] [sex:female] [stage:adult] [view:frontal]
Copepodus schroderus [sex:male] [stage:adult]
Copepodus schroderus [sex:female] [stage:adult]
Copepodus schroderus [sex:female] [stage:adult]
Copepodus schroderus [stage:juvenile] [view:lateral]
Copepodus schroderus [view:frontal]
Each shows only the objects that have the combination of all elements. However, one can already see from that example above that it may cause issues in terms of UI; and it still does not guarantee that all attributes are filled for all objects (which I am not sure we can guarantee). Food for thought...
I 100% agree with the namespacing. (This is also what I have in my proof-of-concept.)
I also agree that for the classifier, an "object description" (taxon with tags) should be a unique category (for the time being). (Predicting all these attributes is a much bigger undertaking than merely supporting such annotations.)
But from the UI perspective, it should be possible to increase the taxonomic resolution (move into a sub-taxon) and add or remove tags without moving an object to a totally different place in the tree (as it is currently the case: "Copepoda/female" is not a parent of "Calanoida/female"). Yes, will be difficult to design. But we might find a satisfactory solution if we play with prototypes.
(Also, +1 for the great Copepod species name!)
Until tags become native to EcoTaxa (which may be never), we need to come back to the question how to represent tag-like data using categories (i.e. "Homogenise taxonomic sublabels such as male, female, part, etc."), although this will be "a frustrating chaos that only will get worse" (~RK).
I think the taxonomic hierarchy should be reserved to taxon (phylo), category (not phylo but still a distinct category) and morphology (a distinct shape within a category), and should not extend to other aspects. E.g. `Copepoda > Copepodus schroderus > repro:ovigerous+sex:female+stage:adult+view:lateral`, not `Copepoda > Copepodus schroderus > repro:ovigerous > sex:female > stage:adult > view:lateral` (`>` establishing a parent-child-relationship and `+` being just a character)
(This is in order to limit the depth of the hierarchy for UI reasons.)
Then, we still have the problem of order: `repro:ovigerous+sex:female` is conceptually the same as `sex:female+repro:ovigerous`, but a different string. I would just always sort them alphabetically.
(This approach would also leave the option to later implement tags in the database but keep the UI basically the same.)
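The alphabetical-sorting rule is easy to state in code. This is a minimal Python sketch; `canonical_name` and `split_name` are hypothetical helpers, not EcoTaxa functions.

```python
def canonical_name(tags):
    """Join namespaced tags with '+' in alphabetical order, dropping duplicates."""
    return "+".join(sorted(set(tags)))


def split_name(name):
    """Split a combined category name back into its individual tags."""
    return name.split("+")


a = canonical_name(["repro:ovigerous", "sex:female"])
b = canonical_name(["sex:female", "repro:ovigerous"])
print(a, a == b)
```

Both orderings produce the same string, so two users tagging the same properties in a different order would end up in the same category.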
Will there be a chapter about category naming in the taxonomy guide that @picheral was talking about? It would be useful to lay down some rules there.
One further comment on the `view` tag: Instead of specifying the depicted side of the organism (`lateral/dorsal/ventral/frontal`), we should maybe specify the orientation of the image plane relative to the organism (`median/frontal/transverse`). This would be more appropriate when `dorsal/ventral` or `anterior/posterior` is ambiguous because the organism is transparent and one can, in fact, only tell the plane but not the view.
(`frontal` does not even fit into the line of `lateral/dorsal/ventral`, it should be `anterior/posterior` instead.)
As an intermediate step, before tags may eventually be implemented (JO: "I have no idea when, except not soon."), I propose to go forward with the status quo (using categories like `Copepoda>female+with-eggs+lateral`), except that we start storing structured data in the category description. This would allow external tools to work with tag data and translate that back and forth between a tag-aware format and EcoTaxa's tag-agnostic categories. This is my proposal in detail:
Use morpho categories. This is what we have and this won't change soon. Aggregation and automatic prediction continue to work as expected.
No nesting of morpho categories. One option is having child categories of a phylogenetic one and as many of them as there are combinations of tags. This has the disadvantage of an exploding number of direct children below a phylogenetic one. The alternative is to use a hierarchy of morpho categories, but then one needs to choose the order of the hierarchy (e.g. "sex" first or "view" first). My suggestion is to not enforce any particular solution here and accept both flat and hierarchical morpho categories. (It seems that hierarchical morpho categories are currently not used very often.)
Short names of morpho categories. I think that names should be short and should not contain a prefix (e.g. "female" instead of "sex:female"). The name should be recognizable but nothing more.
Do not parse the category name. There is no standard for how category names are constructed. A combined name (e.g. `A+B`) sometimes means "objects with either property A or B lumped together" and sometimes "objects with both A and B at the same time". Therefore, there is currently no reliable way to translate category names to tags and vice versa.
Structured tag data in the category description. I propose to use the concise text-based data format that I already came up with some time ago. It can encode both presence and absence of a certain property. In this format:

- `sex:male` => this IS a male.
- `!sex:male` => this IS NOT a male.
- Tags can be hierarchical, e.g. `life-stage:copepodit:C-IV`. Objects positively tagged with a child tag automatically activate all parents, e.g. `life-stage:copepodit:C-IV` implies `life-stage:copepodit`. Objects negatively tagged with a parent tag automatically preclude all descendants, e.g. `!life-stage:copepodit` implies `!life-stage:copepodit:C-I,C-II,...`.
- Only positive tag names are allowed. If an object IS NOT something, the negation of the positive name is used, e.g. `!with-eggs` instead of `without-eggs`.
Use DarwinCore names where possible. DarwinCore has specific terms for some of those; we should use them when possible, e.g. sex, lifeStage, reproductiveCondition.
This system provides a structured way to enrich the current EcoTaxa categories with tag-based data. If EcoTaxa adopts tags in the future, there is a clear transition path from this form to a more native solution. In the meantime, external tools can work with tag data and still have the EcoTaxa database as the single source of truth while providing advanced features like querying by individual properties.
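To show what an external tool might do with the tag format described above, here is a minimal Python sketch of the parsing and implication rules. It is not the proof-of-concept mentioned earlier in the thread; the function names are hypothetical, and it assumes that the first segment (e.g. `sex`, `life-stage`) is a namespace that is never emitted as a tag on its own.

```python
def parse_tag(text):
    """Return (path, positive) for a tag such as 'sex:male' or '!sex:male'."""
    text = text.strip()
    positive = not text.startswith("!")
    if not positive:
        text = text[1:]  # drop the leading '!'
    return tuple(text.split(":")), positive


def implied_parents(path):
    """A positive child tag activates all of its parents (namespace excluded)."""
    return [path[:i] for i in range(2, len(path))]


def is_precluded(path, negated):
    """A negative tag precludes that tag itself and every descendant of it."""
    return path[:len(negated)] == negated


path, positive = parse_tag("life-stage:copepodit:C-IV")
parents = implied_parents(path)
negated, _ = parse_tag("!life-stage:copepodit")
print(path, positive, parents, is_precluded(path, negated))
```

Storing the path as a tuple makes both rules simple prefix operations: parents are shorter prefixes of a positive tag, and descendants are any paths that start with a negated tag.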
We have plenty of such cases: `male`, `female`, `juvenile`, `with eggs`, `part`, etc. We currently create a child of the parent taxon with a name that tries to follow conventions, for example to label Copepods with eggs or to label broken bits of Crustaceans.

This works well because `part` and `ovigerous` are true taxa (with an ID, a parent, etc.) so they can be assigned like any other taxon (no change in UI and backend functions), can be aggregated together with the parent for scientific analysis when it makes sense, and the display convention for taxonomic names (add the parent when the name is duplicated) means that they actually appear as `ovigerous<Copepoda` and `part<Crustacea`, which makes sense.

So I like this solution.
What I don't like is that the consistency of the naming is left to the users creating taxa. Therefore, instead of implementing a tag system separate from the taxonomy (in which tags are a pain to assign, combine, aggregate, etc.), I suggest making a function that makes it easy to create a child category from a pre-defined list of names (the ones above and a few others).
Now the discussion is where to implement this. My guess is that, in the taxo creation modal there should be an obvious button/list next to "Name" that says "Predefined" and shows the list of common sublabels and then a text area called "Custom" which allows to type a name. When a predefined one is selected, the type is made "Morpho".