EuracBiomedicalResearch / FamAgg

This is the development version of the FamAgg Bioconductor package.
https://EuracBiomedicalResearch.github.io/FamAgg
MIT License

Issue with Complex pedigree in Poultry #31

Open JRecoq opened 10 months ago

JRecoq commented 10 months ago

Greetings, I have a peculiar situation with FamAgg.

The pedigree I use is quite convoluted, as it comes from breeding-company data.

If I use a "complete" pedigree, several birds are duplicated across families, which gives me the following message when using FAData:

Error in kinship2::pedigree(...) : Duplicate subject id: 125Duplicate subject id: 125Duplicate subject id: 125Duplicate subject id: 125Duplicate subject id: 125Duplicate subject id: 125

If I use a trimmed pedigree, I get the following error:

Generating the kinship matrix...
Error in kinship2::pedigree(famid = ped[, "family"], id = ped[, "id"], : Value of 'momid' not found in the id list 4/3927518502234 5/3927518502234 10/3927518502234 11/3961511284313 4/3971527126014
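The IDs in this second message look like "family/id" (e.g. 4/3927518502234), which suggests that kinship2 looks each mother up within the child's own family. If that reading is right, a mother whose row sits in a different family triggers the error just like a mother with no row at all. The minimal Python sketch below tells these two cases apart on a made-up toy pedigree. The data and the `classify_parents` name are hypothetical, and this is not part of the FamAgg or kinship2 API.

```python
# Hypothetical toy pedigree rows: (family, id, father, mother).
# None stands for an unknown parent.
PEDIGREE = [
    (4, "A", None, None),
    (4, "B", None, None),
    (4, "C", "A", "B"),  # mother B is listed in family 4: OK
    (5, "M", None, None),
    (4, "D", "A", "M"),  # mother M only has a row in family 5
    (4, "E", "A", "X"),  # mother X has no row anywhere
]


def classify_parents(pedigree):
    """List parents that have no row in their child's own family.

    Each problem is (family, child, role, parent, where), where `where` is
    "other family" if the parent has a row elsewhere and "absent" otherwise.
    """
    families_of = {}  # individual id -> set of families it has a row in
    for fam, ind, _, _ in pedigree:
        families_of.setdefault(ind, set()).add(fam)

    problems = []
    for fam, ind, father, mother in pedigree:
        for role, parent in (("father", father), ("mother", mother)):
            found_in = families_of.get(parent, set())
            if parent is None or fam in found_in:
                continue  # unknown parent, or parent present in the same family
            where = "other family" if found_in else "absent"
            problems.append((fam, ind, role, parent, where))
    return problems


for problem in classify_parents(PEDIGREE):
    print(problem)
```

A parent flagged "other family" points to a family grouping that splits related birds. A parent flagged "absent" has no row of its own anywhere in the data.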

I understand that the functions are doing their job and that poultry husbandry is extremely different from the human case. But even with human data, I suppose a pedigree can sometimes contain the same father or mother several times, so how can FamAgg handle this situation?

Below is an example of a poultry family.

Many thanks for your time.

[Attached image: plotFamilyExemple]

jorainer commented 10 months ago

The error actually comes from the kinship2 package, specifically the kinship2::pedigree function that FamAgg uses to calculate the kinship matrix from the pedigree data. I haven't seen your data, but could it be that your pedigree data.frame contains the same individual multiple times? Each individual of a pedigree should be present only once, i.e. in a single row of the data.frame. It is not a problem if several children have the same father: in that case the same ID appears multiple times in the "father" column, but the ID of each individual (child) has to be unique. Here is a simple example of what is allowed and how it should look:

id fatherid motherid
11 1 2
12 1 2
13 1 3
14 2 4
15 2 4

What is not allowed is duplicated rows/IDs:

id fatherid motherid
11 1 2
11 1 2
... ... ...

i.e., the id column must contain unique values, while fatherid and motherid can contain duplicates.
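The uniqueness rule can be checked before building the pedigree. Below is a minimal Python sketch that uses the two tables above as (id, fatherid, motherid) tuples. The `duplicated_ids` helper is hypothetical and is not a FamAgg or kinship2 function.

```python
from collections import Counter

# The two tables from the comment above as (id, fatherid, motherid) rows.
allowed = [(11, 1, 2), (12, 1, 2), (13, 1, 3), (14, 2, 4), (15, 2, 4)]
not_allowed = [(11, 1, 2), (11, 1, 2)]


def duplicated_ids(rows):
    """Return, sorted, every individual ID that occurs in more than one row.

    Only the id column is checked; repeated fatherid/motherid values are fine.
    """
    counts = Counter(row[0] for row in rows)
    return sorted(ind for ind, n in counts.items() if n > 1)


print(duplicated_ids(allowed))      # []
print(duplicated_ids(not_allowed))  # [11]
```

This checks only the uniqueness rule. Whether every father and mother ID also has its own row is a separate condition, and it is the one behind the "Value of 'momid' not found in the id list" error.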

JRecoq commented 10 months ago

Many thanks for your answer and sorry for the long time before mine.

You are completely right.

I am pretty sure that the issue comes from my dataset and that I am trying to use FamAgg for something it was not intended for.

When I tried to use it with a "classic" pedigree (one row per unique ID), I got error messages telling me that some mothers were not found in the ID list of some families.

So I duplicated all these mothers into the corresponding families, but then the error I got was that there were duplicated IDs.

So maybe the issue is that I want to use FamAgg on a pedigree that is too big, with too many interconnections between families, which breaks the notion of "family" as it was intended when the library was created?

jorainer commented 9 months ago

In fact, the "family" notion is not really used in FamAgg or the kinship2 package. To build a kinship matrix (or plot the data), three columns are key: the ID of the individual, the ID of the mother and the ID of the father. The latter two can also be NA (if I remember correctly) if they are not available or known. I'm actually using FamAgg for a large dataset with ~10,000 individuals, so size is not a problem, but the definition of a family is, especially if you have more than one generation.
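You may still need a family column. One way to avoid duplicating birds across families is to define each family as a connected component of the parent-child graph. In that case any two birds linked through a chain of parents and children share a family, and every bird belongs to exactly one family. This grouping rule is an assumption on our side, not FamAgg's documented behaviour. kinship2 appears to provide a `makefamid()` helper for this kind of grouping. The sketch below is an independent Python illustration of the idea using union-find. `assign_families` and the toy rows are hypothetical.

```python
def assign_families(rows):
    """Map each individual to a family number equal to its connected component.

    rows: list of (id, father, mother) with None for an unknown parent.
    Families are numbered 1, 2, ... in order of first appearance in rows.
    """
    parent = {}  # union-find forest: node -> parent node

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Link every individual to its known parents.
    for ind, father, mother in rows:
        find(ind)
        for p in (father, mother):
            if p is not None:
                union(ind, p)

    # Number the components in order of first appearance.
    family_of_root = {}
    families = {}
    for ind, _, _ in rows:
        root = find(ind)
        families[ind] = family_of_root.setdefault(root, len(family_of_root) + 1)
    return families


# Hypothetical toy data: E's mother B links D's line to the A/B/C line.
ROWS = [
    ("A", None, None),
    ("B", None, None),
    ("C", "A", "B"),
    ("D", None, None),
    ("E", "D", "B"),
    ("F", None, None),
]
print(assign_families(ROWS))
# {'A': 1, 'B': 1, 'C': 1, 'D': 1, 'E': 1, 'F': 2}
```

In a heavily interconnected breeding population this can merge most birds into one large family. That is consistent with the point above that the family definition, not the pedigree size, is the hard part.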

Note: I just learned that kinship2 has a new maintainer, so maybe also open an issue there, as the developer now seems very responsive: https://github.com/LouisLeNezet/kinship2. The problem you have is more with kinship2 than with FamAgg.

JRecoq commented 9 months ago

Thanks, I will look into that now.

Thanks again for your time and answers.