JiscaH / sequoia

R package for pedigree inference based on SNP data

Option suggestion "Overlapping=T or F" & Analysis questions #6

Closed jblamyatifremer closed 6 years ago

jblamyatifremer commented 6 years ago

Dear Jisca,

I know that you are working hard on a new version of sequoia. Maybe this is a good time to suggest an option:

1- Option suggestion: Sometimes I work with populations without overlapping generations (hatchery process). In that case, I am always confident that the parents all belong to generation N and that the individuals of generation N+1 are offspring of generation N individuals. One potential problem with such a design is that inbreeding increases as the generations go on, which decreases our power to reconstruct the pedigree.

2- My trick to analyse without overlapping generations: Some context to understand the design. I am analysing a population that has gone through 4 generations of mass selection. Individuals of generation N are challenged with a virus, and the survivors are mated together (the exact pedigree is unknown) to produce the N+1 individuals, and so on.

Currently, I analyse the dataset by taking the individuals of generation N (I set the "By" column in the Lfh table to 1 and sex to 4) together with the N+1 individuals (I set the "By" column in the Lfh table to 2 and sex to 4). By iterating, I get the whole pedigree with fewer holes than with the strategy where I analyse the dataset with everybody from all the generations at once.

Do you have a smarter/better idea?

I hope my Frenchy English is good enough to explain my problem and solution.

JiscaH commented 6 years ago

Thank you for your message, feedback and suggestions are always welcome!

Note that I've recently finished the new version of sequoia, which is now on CRAN (v 1.0.2). Thanks for the reminder to update the GitHub message.

Having non-overlapping generations indeed makes reconstructing the pedigree quite a bit simpler, and setting the "BY" (birth year / hatching year) column to 1 for generation N, 2 for generation N+1, etc. as you do is the right thing to do. The disadvantage of splitting up the data as you suggest is that mistakes in the lab do sometimes happen, and when analysing all data together, any pairs which genetically look like parent and offspring, but are not exactly 1 generation apart according to your data, will be returned in the 'MaybeParent' bit. Moreover, if you have some genotyping errors, 'knowing' the parents of a candidate parent helps to infer the true genotype of that candidate parent, and thereby improves assignment.
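To make the generation coding and the "not exactly 1 generation apart" check concrete, here is a minimal Python sketch. It is not sequoia code and does not reproduce how sequoia finds pairs. The "Sex" and "BY" columns and the sex code 4 follow the thread; the "ID" column, the function names, the sample IDs and the list of candidate pairs are all hypothetical.

```python
def life_history(generations):
    """Build rows (ID, Sex, BY) from a list of ID lists, one list per generation.

    BY is the 1-based generation index; Sex is 4, the code used in the thread.
    """
    rows = []
    for by, ids in enumerate(generations, start=1):
        for ind in ids:
            rows.append({"ID": ind, "Sex": 4, "BY": by})
    return rows


def off_generation_pairs(rows, pairs):
    """Return the (parent, offspring) pairs that are not exactly 1 generation apart."""
    by = {r["ID"]: r["BY"] for r in rows}
    return [(p, o) for p, o in pairs if by[o] - by[p] != 1]


# Hypothetical sample IDs for three generations, kept in one combined table.
rows = life_history([["a1", "a2"], ["b1", "b2"], ["c1"]])
# Hypothetical pairs that genetically look like parent and offspring.
pairs = [("a1", "b1"), ("a2", "c1"), ("b2", "c1")]
suspect = off_generation_pairs(rows, pairs)
print(suspect)
```

Only a combined table can reveal a pair such as ("a2", "c1"): if each pair of generations is analysed separately, the two individuals are never compared.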

Because of the various fast filtering steps that sequoia uses to find likely candidate-parents, computation time doesn't go up as quickly with increasing sample size as you'd expect (see https://onlinelibrary.wiley.com/action/downloadSupplement?doi=10.1111%2F1755-0998.12665&attachmentId=198710059 , page 15, Fig S10, bottom-right panel). If you have a really really large dataset and struggle to deal with it in R, that is another matter of course, and then splitting the data might be a reasonable tactic.

If you haven't successfully genotyped all parents and want to run full pedigree reconstruction, not just parent assignment, then please see page 10 of the vignette ( https://cran.r-project.org/web/packages/sequoia/vignettes/sequoia.pdf ). Based on the assigned parents sequoia will automatically detect that generations are not/hardly overlapping, and use this during the full pedigree reconstruction. By default it allows for some 'wiggle room', but the vignette describes how you can make this a hard rule by editing the 'AgePriors' bit.
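The difference between 'wiggle room' and a hard rule can be sketched generically in Python. This is an illustration only: the dictionary below is a hypothetical table of weights per parent-offspring age difference, not the actual layout of sequoia's 'AgePriors', for which the vignette is the reference.

```python
def harden(prior, allowed_diff=1):
    """Zero out every parent-offspring age difference except `allowed_diff`."""
    if prior.get(allowed_diff, 0.0) <= 0.0:
        raise ValueError("allowed age difference has no prior support")
    return {diff: (1.0 if diff == allowed_diff else 0.0) for diff in prior}


# Hypothetical soft prior: mostly 1 generation apart, with a little slack.
soft = {0: 0.0, 1: 0.95, 2: 0.05}
hard = harden(soft)
print(hard)
```

With the hardened table, a parent and offspring 2 generations apart get weight 0 and can no longer be assigned, whereas the soft table still allows it at low weight.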

I hope this answers your questions, and good luck with analysing your experiment,

Jisca
