famuvie / breedR

Statistical methods for forest genetic resources analysts
http://famuvie.github.io/breedR/
GNU General Public License v3.0

GBLUP with BreedR example from workshop in Poland #105

Closed GregorDall closed 4 years ago

GregorDall commented 4 years ago

Hi, I am attempting to perform GBLUP with BreedR, therefore I have, as suggested, looked at the example from the workshop in Poland. Are there any new developments towards GBLUP in the package, or any newer examples?

I have one question on the Poland example: in the prediction accuracy calculation, the predicted breeding values are correlated with the column BV in the data. I cannot see where BV comes from, since true BVs are usually unknown and predictions are instead correlated with the phenotypes of individuals not used for model training. Please clarify.

Another question: on the wiki page it is advised to use the "GS3" package for genomic evaluation. Is there any roadmap for its integration into the BreedR package?

Best Regards Gregor

famuvie commented 4 years ago

Dear @GregorDall, thank you for your message.

Regarding your question about correlating the predictions with BV in the Poland example, I can't find exactly what you are looking at. It may be that we illustrated the method with simulated data. Could you be more specific?

Development of breedR has been halted until further notice. So no, there have been no new developments on GBLUP. GS3 was indeed on the roadmap, but it will not be implemented in the foreseeable future. Hopefully someone will take over the development, or new funding will be allocated to this project. I will be glad to help a new developer get started.

In the meantime, I keep the package up to date, fix critical bugs if any arise, and support users via the mailing list.

GregorDall commented 4 years ago

Dear @famuvie,

It is unfortunate to hear that there is no further development of BreedR, since it seems to be a very useful package and I was just starting to use it.

My question is about the GBLUP example metagene_gs.R. The associated data file pheno_ped.txt contains pedigree information (self, dad, mum) and phenotypic information (gen, BV_X, phe_X).

In the demo script, two models using the remlf90 function are demonstrated, one using pedigree information only and one using only marker data, but both are using phe_X as a response variable (fixed = phe_X ~ 1). From these models, the estimated breeding values (EBVs) are extracted. To assess model performance, EBVs are correlated with the column BV in the data file.

What I do not understand is where these BV values come from. Usually in GS, the true BV is unknown and model performance has to be assessed based on the phenotypic records. If I do this, and change the code so that EBVs are correlated with phe_X, model performance is of course worse, and the marker-based model surprisingly performs worse than the pedigree-based model. Attached you will find my modified R script: metagene_gs.R.gz (https://github.com/famuvie/breedR/files/4763276/metagene_gs.R.gz)

Thanks and Best Regards, Gregor

leosanrod commented 4 years ago

Dear all,

Well, yes, it's a pity there are no further developments. On the other hand, we have a tool that works well in many cases.

To answer Gregor's question: indeed, in real cases the best information we have for validation is the phenotype. In this example we used simulated data, so we knew the true breeding value used in the simulation.
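To illustrate why correlating EBVs with phenotypes gives lower values than correlating them with true breeding values, here is a minimal, self-contained sketch in plain Python. It does not use breedR or the workshop data. The heritability `H2`, the noise level `EBV_NOISE_SD`, and the way the "EBV" is generated (true BV plus independent error) are all assumptions made purely for illustration. Under those assumptions, the correlation with the phenotype is roughly the correlation with the true BV multiplied by the square root of the heritability.

```python
import math
import random


def pearson(x, y):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Illustrative assumptions, not values taken from the workshop data.
H2 = 0.3             # heritability; phenotypic variance is scaled to 1
EBV_NOISE_SD = 0.5   # prediction error of the hypothetical EBVs
N = 20000            # number of simulated individuals

rng = random.Random(1)

# True breeding values, as they would be known in a simulation.
bv = [rng.gauss(0.0, math.sqrt(H2)) for _ in range(N)]

# Phenotype = true BV + environmental residual.
phe = [a + rng.gauss(0.0, math.sqrt(1.0 - H2)) for a in bv]

# Stand-in for EBVs: true BV plus an error that is independent of the
# phenotype residual (as for individuals whose phenotype was not used
# to train the model).
ebv = [a + rng.gauss(0.0, EBV_NOISE_SD) for a in bv]

accuracy = pearson(ebv, bv)             # only computable with simulated data
predictive_ability = pearson(ebv, phe)  # what real data allow

print("cor(EBV, true BV):  ", round(accuracy, 3))
print("cor(EBV, phenotype):", round(predictive_ability, 3))
print("ratio vs sqrt(h2):  ",
      round(predictive_ability / accuracy, 3), round(math.sqrt(H2), 3))
```

With real data, the correlation with the phenotype is therefore sometimes divided by the square root of an estimated heritability to approximate the accuracy. That rescaling relies on the independence assumption above and on a reasonable heritability estimate.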

Hope this clarifies things,

Best

Leopoldo
