handling of compound-measure traits

fdschneider commented 7 years ago

Certain trait measurements are composed from multiple values, e.g. morphometric landmarks (x,y,z) or biogeochemical compound profiles (spectra of organic compounds found in a sample).

Those might be bound together in the data table via the measurementID field. However, the traitName or traitID must then name the respective sub-value of the trait measurement.

For the R script this poses a challenge, since for now the trait measurements are grouped by traitName to enable factor checking and numerical unit transformations. If data are entered as compound measurements, the R script needs to pull all members of the compound trait into one table.
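A minimal sketch of that gathering step, written in Python rather than R and only meant as an illustration: rows that share a measurementID are collected into one record keyed by traitName. The function name `gather_compound`, the column name `traitValue`, and the sample values are assumptions, not part of the trait template.

```python
from collections import defaultdict


def gather_compound(rows):
    """Collect long-format rows into one dict per measurementID.

    Each row is a dict with the keys measurementID, traitName and
    traitValue (traitValue is an assumed column name).
    """
    grouped = defaultdict(dict)
    for row in rows:
        grouped[row["measurementID"]][row["traitName"]] = row["traitValue"]
    return dict(grouped)


# Hypothetical rows: one landmark entered as three linked entries.
rows = [
    {"measurementID": "m1", "traitName": "landmark12_x", "traitValue": 0.4},
    {"measurementID": "m1", "traitName": "landmark12_y", "traitValue": 1.1},
    {"measurementID": "m1", "traitName": "landmark12_z", "traitValue": 2.7},
    {"measurementID": "m2", "traitName": "body_length", "traitValue": 5.3},
]
compound = gather_compound(rows)
```

Unit transformations and factor checks could still run per traitName before this step, since the grouping only changes how the rows are viewed.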

An x,y,z value might also be handled as a single entry. It might be interesting to combine it with rotation functions for normalization, but this is probably for a later version.

This needs further specification for the trait template. E.g., another field measurementID_user might be offered. The automatically generated measurementID would then issue a unique identifier for the whole measurement.
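One way this could work, sketched in Python under the assumption that rows sharing a measurementID_user belong to the same measurement and that rows without one are measurements of their own. The function name `assign_measurement_ids` and the integer ID scheme are hypothetical.

```python
from itertools import count


def assign_measurement_ids(rows):
    """Return copies of the rows with a generated measurementID.

    Rows with the same measurementID_user share one generated ID;
    rows without a user ID each get a fresh one.
    """
    counter = count(1)
    by_user_id = {}
    result = []
    for row in rows:
        user_id = row.get("measurementID_user")
        if user_id is None:
            generated = next(counter)
        else:
            if user_id not in by_user_id:
                by_user_id[user_id] = next(counter)
            generated = by_user_id[user_id]
        result.append({**row, "measurementID": generated})
    return result
```

The user-supplied key stays in the table untouched, so the original grouping remains traceable after the IDs are generated.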

nadjasimons commented 7 years ago

Is this related to the issue of having multiple measurements on one individual (as discussed in issue #6), or is this a different type of problem?

fdschneider commented 7 years ago

I think it is somewhat related, but different. This came up in the discussion with Micha Heethoff in Darmstadt, who measures detailed morphometric traits in 3D space. They measure landmarks, which can then be linked to different body lengths, areas or volumes. For this kind of data, leg numbering and left-right side are imperative in the trait definitions, i.e. the definitions of landmarks. That makes it incompatible with the traitlist that we had in mind. I just want to create the possibility to enter such multivariate trait data by allowing for linking via the measurementID. In our scheme, the multiple legs would be single measurements (each with one measurementID) but would be linked via the specimenID. But this all depends on the definitions in the traitlist that is linked to the dataset, not on the trait template itself.

nadjasimons commented 7 years ago

Ok, I see. Would it help to define such landmark traits as strings with a certain type of structure? Kind of similar to the YYYY-MM-DD convention. Then those three values would be one trait value. Would this then mean that a list of different landmarks (say landmark_left_tibia_tip, landmark_left_tibia_end, etc.) needs to be added to the list of traits? If yes, would Micha Heethoff be willing to share his list of landmarks?
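To make the string idea concrete, here is a small Python sketch of how such a value could be parsed. The comma-separated "x,y,z" format and the function name `parse_landmark` are assumptions for illustration; no such convention has been agreed on.

```python
def parse_landmark(value, sep=","):
    """Parse a string like "1.2,0.35,2.1" into an (x, y, z) tuple of floats."""
    parts = value.split(sep)
    if len(parts) != 3:
        raise ValueError(f"expected 3 coordinates, got {len(parts)}: {value!r}")
    # float() tolerates surrounding whitespace and raises ValueError on non-numbers
    x, y, z = (float(part) for part in parts)
    return (x, y, z)


coords = parse_landmark("1.2, 0.35,2.1")
```

A fixed structure like this keeps one landmark in one table row, at the cost of the value no longer being a plain number for unit checks.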

fdschneider commented 7 years ago

I consider this more or less solved.

By providing a measurementID, the user can link multiple entries of the table into a single measurement. The trait definitions (traitName and traitID) should then specify the interpretation of the multivariate data. E.g. the three trait names landmark12_x, landmark12_y, landmark12_z would all be defined as components of the same landmark (a landmark in turn is defined by morphological descriptions of its position, e.g. a particular spine at the upper end of the tibia, in a particular projection). Alternatively, a trait value could use a special character string scheme, just as Nadja proposed. Another suggested use case was biochemical traits, which are composed of quantities of chemical weight along the spectrum, i.e. hundreds of single values.
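As a rough illustration of the first option, the Python sketch below reads the axis from the trait name and assembles coordinate triplets per measurementID. The `_x`/`_y`/`_z` suffix rule follows the example names above; the function name `assemble_landmarks`, the `traitValue` column and the sample rows are assumptions.

```python
import re

# Trait names such as "landmark12_x": a landmark name plus an axis suffix.
AXIS_SUFFIX = re.compile(r"^(?P<landmark>.+)_(?P<axis>[xyz])$")


def assemble_landmarks(rows):
    """Map (measurementID, landmark) to an (x, y, z) tuple.

    Rows whose traitName has no axis suffix are skipped, and incomplete
    triplets are dropped.
    """
    partial = {}
    for row in rows:
        match = AXIS_SUFFIX.match(row["traitName"])
        if match is None:
            continue  # not a landmark component
        key = (row["measurementID"], match["landmark"])
        partial.setdefault(key, {})[match["axis"]] = row["traitValue"]
    return {
        key: (axes["x"], axes["y"], axes["z"])
        for key, axes in partial.items()
        if set(axes) == {"x", "y", "z"}
    }


rows = [
    {"measurementID": "m1", "traitName": "landmark12_x", "traitValue": 0.4},
    {"measurementID": "m1", "traitName": "landmark12_y", "traitValue": 1.1},
    {"measurementID": "m1", "traitName": "landmark12_z", "traitValue": 2.7},
    {"measurementID": "m2", "traitName": "body_length", "traitValue": 5.3},
]
landmarks = assemble_landmarks(rows)
```

In practice the mapping from trait name to landmark and axis would come from the trait definitions in the linked traitlist rather than from a naming pattern.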

I would suggest that we stay out of this. Morphometrics and biochemistry researchers will develop their own terms and ontologies and figure out what is best for their data. I think I can encourage Micha to develop something along these lines and help introduce our trait standard in this field.

fdschneider / bexis_traits

handling of compound-measure traits #11