bcm-uga / pcadapt

Performing highly efficient genome scans for local adaptation with R package pcadapt v4
https://bcm-uga.github.io/pcadapt

Incorporating biological replicates into analysis #20

Closed: MarkCBitter closed this issue 5 years ago

MarkCBitter commented 6 years ago

Hello,

I am using pcadapt to analyze pooled sequencing data from a selection experiment. I am currently running analyses with one biological replicate for each treatment, because I cannot find anything in the documentation that explains how to tell pcadapt that some of the columns in the data frame are biological replicates.

Is there any way to incorporate this information? Thank you very much.

Best, Mark

privefl commented 6 years ago

poke @mblumuga

mblumuga commented 5 years ago

Hi @MarkCBitter, that is a good point: we do not have an argument to indicate that there are biological replicates. However, when looking at the scree plot of the eigenvalues, you should find that the correct number of PCs, K, is smaller than (number of treatments * number of replicates). K should be equal to (number of treatments - 1), whatever the number of replicates per treatment. Is that what you have found?
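To illustrate the K = (number of treatments - 1) reasoning, here is a minimal sketch in plain Python. It is not pcadapt code. It assumes an idealized case in which replicates of a treatment have identical allele frequencies, and all treatment names and frequencies are made up. After centering each SNP, the pool-by-SNP matrix has rank 2 for three treatments, even though there are nine pools.

```python
# Idealized illustration: replicates of a treatment share identical
# allele frequencies. All names and numbers are hypothetical.

def center_columns(rows):
    """Subtract each column's mean (one column per SNP)."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]

def matrix_rank(rows, tol=1e-9):
    """Rank by Gaussian elimination with partial pivoting."""
    m = [list(r) for r in rows]
    n_rows, n_cols = len(m), len(m[0])
    rank = 0
    for col in range(n_cols):
        # Pick the row with the largest entry in this column as pivot.
        pivot = max(range(rank, n_rows), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < tol:
            continue  # no usable pivot in this column
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, n_rows):
            factor = m[r][col] / m[rank][col]
            for c in range(col, n_cols):
                m[r][c] -= factor * m[rank][c]
        rank += 1
        if rank == n_rows:
            break
    return rank

# Three hypothetical treatments, four SNPs each.
treatment_freqs = {
    "A": [0.10, 0.50, 0.30, 0.80],
    "B": [0.20, 0.40, 0.70, 0.60],
    "C": [0.90, 0.10, 0.50, 0.20],
}
n_replicates = 3

# One row per pool: each treatment repeated n_replicates times.
pools = [list(freqs) for freqs in treatment_freqs.values()
         for _ in range(n_replicates)]

n_pools = len(pools)
n_informative_pcs = matrix_rank(center_columns(pools))
print(n_pools, n_informative_pcs)
```

In real data the replicates are not identical, so the remaining components are not exactly zero. If replicates of the same treatment differ substantially, those extra components can look large on the scree plot.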

MarkCBitter commented 5 years ago

Hi @mblumuga,

Thank you very much for the quick reply on this. If I understand you correctly, this is not what I have found. When looking at the scree plot, K is simply equal to (number of treatments * number of replicates) - 1. For example, in a case with two treatments, each with 3 replicates, K is 5 (though from your response K should be 1 in this case).

Please let me know if attaching any PC or scree plots would be helpful.

privefl commented 5 years ago

Indeed, these plots would be really helpful.

MarkCBitter commented 5 years ago

Below is one example with three "treatments" (to my understanding K should be 2 in this case). The replicates are color-coded and the "Embryo" sample has no replicate. It is clear that the replicates (D6High_Small and D6Low_Small) do not cluster tightly, which I now suspect may be driving the incorrect K value.

PCA.pdf Scree.pdf

MarkCBitter commented 5 years ago

In the above example, PC1 explains 28% of the variation and PC2 explains 25%.

mblumuga commented 5 years ago

K = 2 is the right choice for you. PC1 corresponds to differentiation between blue and the rest. PC2 corresponds to differentiation between green and red.

I would also recommend using an environmental association analysis (e.g. the software LFMM) with an environmental variable that has 3 levels, corresponding to the 3 colors of your graph. You can then merge the 2 scans using a meta-analysis (Fisher's method) or compare them with a Venn diagram.
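As a rough sketch of the Fisher's method step, the snippet below combines two p-values per SNP in plain Python. It is not part of pcadapt or LFMM, and the names `pcadapt_p` and `lfmm_p` and their values are hypothetical placeholders for the per-SNP p-values from each scan. Fisher's method assumes the combined p-values are independent, which two scans of the same data may not fully satisfy, so treat the result as indicative.

```python
import math

def fisher_combine(p_values):
    """Combine p-values with Fisher's method.

    Returns (statistic, combined_p). The statistic -2 * sum(ln p) follows a
    chi-squared distribution with 2k degrees of freedom under the null, and
    for even degrees of freedom its upper tail has a closed form.
    Each p-value must be in (0, 1].
    """
    k = len(p_values)
    stat = -2.0 * sum(math.log(p) for p in p_values)
    half = stat / 2.0
    # Upper tail for 2k df: exp(-x/2) * sum_{i<k} (x/2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return stat, math.exp(-half) * total

# Hypothetical per-SNP p-values from the two scans (same SNP order).
pcadapt_p = [0.05, 0.50, 0.001]
lfmm_p = [0.05, 0.40, 0.020]

combined = [fisher_combine([p1, p2])[1] for p1, p2 in zip(pcadapt_p, lfmm_p)]
for snp_index, p in enumerate(combined):
    print(snp_index, p)
```

The combined p-values would still need a multiple-testing correction across SNPs, as with either scan on its own.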

MarkCBitter commented 5 years ago

I will try this. Thank you @mblumuga!