AdmiralenOla / Scoary

Pan-genome wide association studies
GNU General Public License v3.0

Not all genes are reported with "-p 1.0" #67

Closed: IEkAdN closed this issue 5 years ago

IEkAdN commented 6 years ago

Hello,

I ran Scoary with the "-p 1.0" option to obtain p-values for all genes, but some of the genes in gene_presence_absence.csv were not reported. Could you please tell me why Scoary does not output information for all genes even when I use "-p 1.0"?

AdmiralenOla commented 5 years ago

Hello, and thanks for reporting an issue! Scoary's supposed to return all genes when you do that, so something is wrong. I've never seen that behavior before. Would you consider sending me your input files and the command you used so I can have a look? My e-mail is ola.brynildsrud@fhi.no

AdmiralenOla commented 5 years ago

Closing due to resolution. Core genes (here: genes that are present in every single isolate) will not appear in the results file, since it is impossible to statistically associate them with any trait.

Further, if some of your isolates have missing data, other genes could "become" core genes and thus not appear in the results file. For example, imagine that gene1 is present in isolates A-Y, and absent in Z. However, Z has missing phenotypic data and can therefore not contribute to the experiment. As a result, Z is dropped from the analysis of this trait. This effectively means that gene1 is present in every single isolate included in the analysis, and as such it will be dropped from the results.
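
The gene1 example above can be reproduced with a small sketch. This is only an illustration of the filtering logic described in this comment, not Scoary's actual code: the function name `reportable_genes`, the dictionary layout, and the toy genes gene2 and gene3 are all assumptions made up for the example.

```python
# Illustrative sketch only; this is NOT Scoary's implementation.
# All names and data structures here are hypothetical.

def reportable_genes(presence, phenotype):
    """Return genes that are not core among isolates with a known phenotype.

    presence:  dict mapping gene name -> dict of isolate -> 0/1
    phenotype: dict mapping isolate -> 0/1, or None for missing data
    """
    # Isolates with a missing phenotype cannot contribute to this trait,
    # so they are dropped before looking at the genes.
    usable = [iso for iso, value in phenotype.items() if value is not None]

    kept = []
    for gene, calls in presence.items():
        # A gene present in every remaining isolate is effectively a core
        # gene for this trait and is left out of the results.
        if all(calls[iso] == 1 for iso in usable):
            continue
        kept.append(gene)
    return kept


# Isolates A..Z with an arbitrary toy phenotype; Z has missing data.
isolates = [chr(c) for c in range(ord("A"), ord("Z") + 1)]
phenotype = {iso: i % 2 for i, iso in enumerate(isolates)}
phenotype["Z"] = None

presence = {
    # Present in A-Y, absent only in Z.
    "gene1": {iso: (0 if iso == "Z" else 1) for iso in isolates},
    # Present in A-M, absent in N-Z: still variable after dropping Z.
    "gene2": {iso: (1 if iso < "N" else 0) for iso in isolates},
    # Present in every isolate: a core gene from the start.
    "gene3": {iso: 1 for iso in isolates},
}

print(reportable_genes(presence, phenotype))  # prints ['gene2']
```

With Z excluded, gene1 is present in all 25 remaining isolates and is skipped just like gene3, even though the full gene_presence_absence.csv shows it as absent in one isolate. If Z had a phenotype value, gene1 would be kept.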