AdmiralenOla / Scoary

Pan-genome wide association studies
GNU General Public License v3.0

genetic differences among populations defined by population analysis #101

Closed kopelol closed 2 years ago

kopelol commented 2 years ago

Hi, I'd like to explore population-specific genes. I defined 6 populations using a population analysis method.

Scoary takes population structure into account, but in my case the dataset has already been divided into 6 populations using population analysis. So should I skip this step? (If so, could you please tell me how?) Or should I specify the newick file I created from the core-gene alignment obtained with Roary?

Best regards,

kopelol commented 2 years ago

In addition, I tried using "--no_pairwise", but the results were the same as before using this option.

AdmiralenOla commented 2 years ago

Hi @kopelol.

In this case you should just use the 6 populations already specified in your previous analysis. Note that you will have to create 6 individual phenotype variables, one per population, each coding membership as binary (1 for members, 0 for everyone else). Example:

,Pop1,Pop2,Pop3,Pop4,Pop5,Pop6
Strain1,1,0,0,0,0,0
Strain2,0,0,0,1,0,0
etc.
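If the population assignments already exist as a strain-to-population mapping, a table in the layout above could be generated rather than typed by hand. The following is only a minimal sketch: the function name `membership_table`, the `assignments` dictionary, and the strain and population labels are hypothetical placeholders, not part of Scoary.

```python
import csv
import io


def membership_table(strain_to_pop, populations):
    """Build CSV text with one binary (0/1) column per population.

    strain_to_pop: dict mapping strain name -> population label (hypothetical input).
    populations:   ordered list of all population labels to use as columns.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    # The first header cell is left empty, as in the example layout.
    writer.writerow([""] + list(populations))
    for strain, pop in strain_to_pop.items():
        if pop not in populations:
            raise ValueError(f"{strain} has unknown population {pop!r}")
        # 1 where the strain belongs to the column's population, else 0.
        writer.writerow([strain] + [1 if pop == p else 0 for p in populations])
    return buf.getvalue()


# Hypothetical assignments, for illustration only.
populations = ["Pop1", "Pop2", "Pop3", "Pop4", "Pop5", "Pop6"]
assignments = {"Strain1": "Pop1", "Strain2": "Pop4"}

table = membership_table(assignments, populations)
print(table)
```

The returned text could then be written to a file and supplied to Scoary as the traits file. Each column is tested as its own trait, so each population is compared against all remaining strains.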

Using "--no_pairwise" should make your analysis run quicker since you are not performing any pairwise comparisons, that is, you are not correcting for population structure at all. In these analyses it doesn't actually matter whether you use a pre-defined newick tree or let Scoary handle this internally.