fdschneider / bexis_traits

developing a trait data framework for use in the Biodiversity Exploratories

trait list: categorical traits #8

Closed fdschneider closed 6 years ago

fdschneider commented 7 years ago

How do we define categorical data? As one trait with a long list of possible levels, or as dummy variables, where users can assign more than one level to each trait (think of the definition of different types of omnivores)?

fdschneider commented 7 years ago

If we build on top of the existing trait thesaurus, we should follow its constraints and definitions of factor levels as well, but we may be more explicit where it does not constrain them. For some traits, especially those particular to a certain taxonomic group, a predefined vocabulary may be better. E.g. sociality, where just a few options exist, or feeding_specialisation, where a lack of predefined factor levels would allow too many degrees of information (thus defining ordinal levels such as one, few, or many resources, i.e. monophagous, oligophagous, polyphagous).
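
A minimal sketch of what such a predefined ordinal vocabulary could look like, assuming the three levels named above in the order one < few < many resources. The constant and function names are hypothetical and not part of any existing thesaurus or package.

```python
# Hypothetical controlled vocabulary for feeding_specialisation.
# The levels and their order (one < few < many resources) follow the comment above.
FEEDING_SPECIALISATION_LEVELS = ("monophagous", "oligophagous", "polyphagous")


def encode_ordinal(value, levels=FEEDING_SPECIALISATION_LEVELS):
    """Return the 0-based rank of a level; reject values outside the vocabulary."""
    normalised = value.strip().lower()
    if normalised not in levels:
        raise ValueError(f"{value!r} is not one of {levels}")
    return levels.index(normalised)
```

Free-text entries such as "omnivore" are rejected by the sketch, so the vocabulary stays closed and comparable across datasets.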

caterinap commented 7 years ago

I would go for dummy variables. For instance, a common trait in mammals is diel activity (nocturnal, diurnal, crepuscular, ...). Species can be active during both day and night, or at dusk and at night. This is typically difficult to handle as a single categorical variable. A similar example is food categories (e.g. leaves, seeds, invertebrates, vertebrates), where a single species can eat from several categories. This might be true for only some of the categorical traits, though.
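
To illustrate the dummy-variable idea, here is a small sketch that turns a set of observed activity levels into one 0/1 column per level. The level list is taken from the example above; the function name is hypothetical.

```python
# Levels from the diel activity example above; the list is illustrative, not exhaustive.
ACTIVITY_LEVELS = ("diurnal", "nocturnal", "crepuscular")


def to_dummies(observed, levels=ACTIVITY_LEVELS):
    """Map a set of observed levels to one 0/1 dummy variable per known level."""
    unknown = set(observed) - set(levels)
    if unknown:
        raise ValueError(f"unknown levels: {sorted(unknown)}")
    return {level: int(level in observed) for level in levels}
```

A species active at dusk and at night would be passed in as `{"crepuscular", "nocturnal"}` and get a 1 in both columns, which a single categorical column cannot express without inventing combined levels.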

nadjasimons commented 7 years ago

As I understood it, the T-SITA thesaurus uses boolean values for all traits. Those are structured within higher-level traits, e.g. Behaviour --> Nutrition_behaviour --> Diet --> Detritivore --> Coprophagous (0/1). Even though this means that we will end up with a huge list of traits, I think it will keep the trait database more flexible, because I would assume that adding a trait is easier than adding a trait level. On the other hand, adding a new trait would mean that all species that were characterised before would have to be given a value (either 0 or 1) for this new trait, while adding another factor level would not require that.
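
The backfilling cost can be sketched as follows. The table layout, the function names, and the use of `None` as a "not yet assessed" placeholder are my assumptions for illustration, not how T-SITA or this project stores data.

```python
def add_boolean_trait(table, trait, default=None):
    """Add a new 0/1 trait column to every species row.

    default=None marks 'not yet assessed' (an assumption of this sketch);
    existing values for the trait are left untouched.
    """
    for row in table.values():
        row.setdefault(trait, default)
    return table


def unassessed(table, trait):
    """List the species that still need a 0 or 1 for the given trait."""
    return sorted(species for species, row in table.items() if row.get(trait) is None)
```

After adding a column, `unassessed` returns every previously characterised species, which is exactly the revisiting effort described above. Extending the level list of a single factor would leave existing rows valid as they are.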

fdschneider commented 6 years ago

Closing, because this is of minor importance for our methods paper. The online documentation of the standard will get a page on how to build trait thesauri (see also #18). That is where we may discuss the problems around categorical (factor) traits.