ISA-tools / stato

This is the development repository for the STATistics Ontology (STATO). For more information and demonstration on the ontology content, please visit its website:
http://stato-ontology.org/

Best linear unbiased predictor definition in Stato #63

Closed (bauchetg closed this issue 6 years ago)

bauchetg commented 6 years ago

Hi Stato team,

I wondered if there is a plan to include best linear unbiased predictions in Stato. This type of predictor is widely used in genomic selection in different ways (BLUP, gBLUP, etc.) to estimate breeding values. Thanks for your input! Cheers

Guillaume

proccaserra commented 6 years ago

Hi Guillaume. This is definitely in the scope of STATO, so we'll work on adding support for those methods. If you have specific query cases you'd like to discuss or support, drop us a line. Many thanks for the request. Best, Philippe

bauchetg commented 6 years ago

Hi Philippe,

Thank you for taking my request into account. In a general mixed linear model definition like y = XB + Zu + e, "B" and "u" are the vectors of fixed and random effects respectively, "X" and "Z" are the associated design matrices, and "e" is a random residual vector. Random effects are assumed to be normally distributed, with a variance-covariance matrix V. Fixed effects are estimated by BLUE and random effects are predicted by BLUP, usually through restricted maximum likelihood (REML).
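To make the BLUE/BLUP distinction concrete, here is a minimal sketch (not STATO's or any package's implementation) that solves Henderson's mixed model equations for y = XB + Zu + e. It assumes Var(u) = A * sigma_u^2 and Var(e) = I * sigma_e^2, and that the variance ratio `lam = sigma_e^2 / sigma_u^2` is already known; in practice it would come from a REML fit. The function name `solve_mme` and the toy data are illustrative assumptions.

```python
import numpy as np


def solve_mme(y, X, Z, lam, A_inv=None):
    """Solve Henderson's mixed model equations for y = XB + Zu + e.

    Assumes Var(u) = A * sigma_u^2, Var(e) = I * sigma_e^2 and a known
    ratio lam = sigma_e^2 / sigma_u^2. Returns (BLUE of B, BLUP of u).
    """
    p = X.shape[1]  # number of fixed effects
    q = Z.shape[1]  # number of random effects
    if A_inv is None:
        A_inv = np.eye(q)  # unrelated random effects by default
    # Left-hand side: [[X'X, X'Z], [Z'X, Z'Z + lam * A^-1]]
    lhs = np.block([
        [X.T @ X, X.T @ Z],
        [Z.T @ X, Z.T @ Z + lam * A_inv],
    ])
    # Right-hand side: [X'y, Z'y]
    rhs = np.concatenate([X.T @ y, Z.T @ y])
    solution = np.linalg.solve(lhs, rhs)
    return solution[:p], solution[p:]


# Toy trial: 3 genotypes, 2 plots each, intercept as the only fixed effect.
y = np.array([10.0, 12.0, 14.0, 16.0, 18.0, 20.0])
X = np.ones((6, 1))                      # design matrix for the intercept
Z = np.kron(np.eye(3), np.ones((2, 1)))  # plot-to-genotype incidence matrix
b_hat, u_hat = solve_mme(y, X, Z, lam=1.0)
print("BLUE of the intercept:", b_hat)
print("BLUPs of the genotype effects:", u_hat)
```

With `lam = 1.0` and two plots per genotype, the raw genotype deviations from the overall mean (-4, 0 and 4) are shrunk by a factor of 2/3, to about -2.67, 0 and 2.67, which is the behaviour that distinguishes a BLUP from a fixed-effect estimate.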

In plant breeding, for a standard analysis of trial phenotypic data, the genotype effect is treated as fixed (BLUE) or random (BLUP) in combination with other experimental factors (i.e. design). Genotypic values can be decomposed into additive or additive + non-additive (dominance + epistasis) effects. -> These are the classic BLUP and BLUE ontological definitions I would like to see in Stato.

The other BLUP definitions of interest for Stato relate to genomic selection models. Genomic selection aims to predict the genomic estimated breeding value (GEBV) of a genotype using genomic best linear unbiased prediction (gBLUP). In gBLUP, a genomic relationship matrix estimated from genomic data is used. It defines the covariance between individuals based on observed similarity in genomic data (instead of pedigree).
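As an illustration of how such a genomic relationship matrix can be built, the sketch below uses one common choice, VanRaden's first method, which is an assumption here and not something the thread prescribes. Markers are assumed to be coded as 0/1/2 copies of one allele, with allele frequencies estimated from the sample itself; the function name `genomic_relationship_matrix` and the toy genotypes are hypothetical.

```python
import numpy as np


def genomic_relationship_matrix(M):
    """Genomic relationship matrix from a 0/1/2 marker matrix.

    M has one row per individual and one column per marker.
    Uses G = W W' / (2 * sum_j p_j * (1 - p_j)) with W = M - 2p,
    where p_j is the sample allele frequency of marker j.
    """
    M = np.asarray(M, dtype=float)
    p = M.mean(axis=0) / 2.0            # allele frequency per marker
    W = M - 2.0 * p                     # center each marker column
    denom = 2.0 * np.sum(p * (1.0 - p))
    if denom == 0.0:
        raise ValueError("all markers are monomorphic; G is undefined")
    return W @ W.T / denom


# Toy data: 3 individuals genotyped at 3 markers.
M = np.array([
    [0, 1, 2],
    [1, 1, 0],
    [2, 1, 0],
])
G = genomic_relationship_matrix(M)
print(G)
```

In a gBLUP model this matrix plays the role that the pedigree-based relationship matrix plays in classic BLUP. Because the markers are centered with sample frequencies, G computed this way is singular, so implementations typically regularize it before inverting.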

There are multiple ways of calculating these predictors through different relationship matrices/variance components (additive, dominance, additive + dominance, etc.).

Ontology-wise, I don't know how many of these methodological aspects can be included (so many exist and new ones are constantly being released), but I could suggest a few of interest for our research group.

Having such ontological terms would be of great help for many breeding systems and databases. Thanks for your help,

Best regards,

Guillaume

proccaserra commented 6 years ago

Hi Guillaume, very useful input, thanks. You may have noticed that we are currently working on this very issue, but many placeholders are already there (STATO_0000427: Restricted maximum-likelihood estimator, STATO_0000428: maximum-likelihood estimator, STATO_0000107: statistical model, with the children classes which include 'linear mixed model', and STATO_0000119: model parameter estimation). We need to add classes to cover BLUP, BLUE, gBLUP. I will push those. The key thing is to understand how you'd like or intend to query your database (query case). This would help us think about what needs to be present in STATO to best help you. All the best, Philippe

bauchetg commented 5 years ago

Hello @proccaserra! Is https://github.com/ISA-tools/stato/blob/dev/dev/ontology/stato.owl the go-to file to see the changes requested above and implemented here: https://github.com/ISA-tools/stato/commit/695ed55bb4a001925df5ca31976e78ebaed4d758 ? Somehow I cannot find them while searching in Protégé. Can you help me locate them? Sorry about that.

bauchetg commented 5 years ago

My bad, I was looking in 1.3 and it is in 1.4. Solved.