bcm-uga / pcadapt

Performing highly efficient genome scans for local adaptation with R package pcadapt v4
https://bcm-uga.github.io/pcadapt

Different sample size per population #69

Closed: naborlozada closed this issue 2 years ago

naborlozada commented 2 years ago

Hi all,

I'm analyzing 7 populations (3 from America and 4 from Africa) and trying to find outlier loci using pcadapt. Some populations are located in the same country, while others are very distant from each other. Each population has a different number of samples (individuals):

pop1=10    # (Africa, country A)
pop2=3     # (Africa, country A)
pop3=19    # (Africa, country B)
pop4=11    # (Africa, country C)
pop5=9     # (America, country D)
pop6=17    # (America, country D)
pop7=6     # (America, country D)

I am trying to find signals of local adaptation (if any) at the country and continent levels. However, I was wondering whether populations with few samples (pop2 and pop7) might bias the analysis. Also, unlike the populations in your pcadapt example, my populations do not have equal numbers of samples. So, when running an analysis at the country level (for example, with the populations from country D) or at the continent level (all populations from Africa, or all from America), should I remove those with few samples?
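The analyses described above pool populations at two levels. As a quick check of how many individuals each level would actually contain, here is a small Python sketch that sums the listed sample sizes per continent and per country; the seventh population is labeled pop7, as in the replies below. The `POPULATIONS` table and the `totals_by` helper are hypothetical, for illustration only, and are not part of pcadapt.

```python
from collections import defaultdict

# Sample sizes and locations as listed in the post: name -> (size, continent, country).
POPULATIONS = {
    "pop1": (10, "Africa", "A"),
    "pop2": (3, "Africa", "A"),
    "pop3": (19, "Africa", "B"),
    "pop4": (11, "Africa", "C"),
    "pop5": (9, "America", "D"),
    "pop6": (17, "America", "D"),
    "pop7": (6, "America", "D"),
}


def totals_by(level):
    """Sum sample sizes per 'continent' or per 'country' (hypothetical helper)."""
    if level not in ("continent", "country"):
        raise ValueError(f"unknown level: {level!r}")
    totals = defaultdict(int)
    for size, continent, country in POPULATIONS.values():
        key = continent if level == "continent" else country
        totals[key] += size
    return dict(totals)


if __name__ == "__main__":
    print(totals_by("continent"))  # {'Africa': 43, 'America': 32}
    print(totals_by("country"))    # {'A': 13, 'B': 19, 'C': 11, 'D': 32}
```

At the country level, country D (32 individuals) is the only one with more than one population, and within it the sizes range from 6 to 17.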

How should I deal with populations that have different numbers of samples? Is there a test to evaluate this, or a way to normalize for it? Any suggestions would be really appreciated.

Best regards. Nabor
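The thread does not settle the normalization question. One generic workaround, which is not proposed in this thread and is not a pcadapt feature, is to randomly cap every population at the same number of individuals before building the input file, then repeat with several seeds to see whether the same outliers come up. A minimal Python sketch of the capping step, assuming hypothetical sample IDs (`pop1_01`, `pop1_02`, ...) generated from the sizes in the post; the `subsample` helper, the cap of 10 and the seed are illustrative choices only:

```python
import random

# Sample sizes from the post; the sample IDs built from them are hypothetical.
SIZES = {"pop1": 10, "pop2": 3, "pop3": 19, "pop4": 11, "pop5": 9, "pop6": 17, "pop7": 6}
SAMPLES = {pop: [f"{pop}_{i:02d}" for i in range(1, n + 1)] for pop, n in SIZES.items()}


def subsample(samples, cap, seed=0):
    """Keep at most `cap` randomly chosen individuals per population, without replacement.

    Populations with `cap` or fewer individuals are kept whole.
    """
    rng = random.Random(seed)  # fixed seed so a given run is reproducible
    return {
        pop: sorted(rng.sample(ids, cap)) if len(ids) > cap else list(ids)
        for pop, ids in samples.items()
    }


if __name__ == "__main__":
    kept = subsample(SAMPLES, cap=10)
    for pop, ids in kept.items():
        print(pop, len(ids))
```

Capping only shrinks the large populations: pop2 (3) and pop7 (6) stay below the cap, so it reduces the imbalance but cannot add information to the smallest groups.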

privefl commented 2 years ago

It is generally better to have populations of similar sample sizes for PCA, but unless you have some really large differences, I don't think this is a problem for PCA or pcadapt.

naborlozada commented 2 years ago

Hi, so I understand that my populations (listed above) are okay to compare with each other as described.
Thank you for your reply!

privefl commented 2 years ago

I hadn't looked at the numbers; they are quite low. I have never used pcadapt with so few individuals, so I don't know. But I guess that at least pop2 and pop7 are not okay.

naborlozada commented 2 years ago

Yes, that was my concern. I was thinking of removing populations with 5 or fewer samples (or, more strictly, keeping only those with at least 10 samples). It is an arbitrary filter, though. I'll run some tests. Thanks for your help. Best,
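To see what the two thresholds mentioned above would do to this dataset, here is a short Python sketch; the `split_by_min_size` helper is hypothetical and only applies a minimum-sample-size rule to the sizes listed in the original post.

```python
# Sample sizes from the original post.
SIZES = {"pop1": 10, "pop2": 3, "pop3": 19, "pop4": 11, "pop5": 9, "pop6": 17, "pop7": 6}


def split_by_min_size(sizes, min_size):
    """Return (kept, dropped) population names for a minimum-sample-size threshold."""
    kept = [pop for pop, n in sizes.items() if n >= min_size]
    dropped = [pop for pop, n in sizes.items() if n < min_size]
    return kept, dropped


if __name__ == "__main__":
    # "Remove 5 or fewer" is the same as requiring at least 6 samples.
    print(split_by_min_size(SIZES, 6))   # (['pop1', 'pop3', 'pop4', 'pop5', 'pop6', 'pop7'], ['pop2'])
    # Stricter rule: at least 10 samples.
    print(split_by_min_size(SIZES, 10))  # (['pop1', 'pop3', 'pop4', 'pop6'], ['pop2', 'pop5', 'pop7'])
```

With the first threshold only pop2 is removed, so pop7 (6 samples), which was flagged in the reply above, would stay in. The stricter threshold also drops pop5 (9 samples), leaving pop6 as the only population from country D, so a within-country analysis for D would no longer be possible.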