AdmiralenOla / Scoary

Pan-genome wide association studies
GNU General Public License v3.0

Missing genes in results #84

Closed jimmyliu1326 closed 4 years ago

jimmyliu1326 commented 4 years ago

Hi,

I am using Scoary on ~3,600 isolates to test trait associations across ~22,000 genes. Even when I specify -p 1.0, Scoary reports only ~3,100 genes in the results rather than the complete set of ~22,000. The analysis ran to completion without errors.

Here's the log file content:

08/13/2020 10:34:24 AM    ==== Scoary started ====
08/13/2020 10:34:24 AM    Command: /home/jimmy.liu/.conda/envs/scoary-1.6.16/bin/scoary --threads 32 -g /scratch/jimmy.liu/reference_structure_chewbbaca_res_2020/cluster_157_gwas/allelic_presence_roary.csv -t /scratch/jimmy.liu/reference_structure_chewbbaca_res_2020/cluster_157_gwas/Cluster_157_subset_metadata.csv -o /scratch/jimmy.liu/reference_structure_chewbbaca_res_2020/cluster_157_gwas/ -p 1.0 -m 22416
08/13/2020 10:34:24 AM    Reading gene presence absence file
08/13/2020 10:34:49 AM    Creating Hamming distance matrix based on gene presence/absence
08/13/2020 10:36:30 AM    Building UPGMA tree from distance matrix
08/13/2020 10:38:34 AM    Reading traits file
08/13/2020 10:38:34 AM    Finished loading files into memory.

08/13/2020 10:38:34 AM    ==== Performing statistics ====
08/13/2020 10:38:34 AM    -- Filtration options --
08/13/2020 10:38:34 AM    Individual (Naive):    1.0
08/13/2020 10:38:34 AM    Collapse genes:    False

08/13/2020 10:38:34 AM    Tallying genes and performing statistical analyses
08/13/2020 10:38:34 AM    Gene-wise counting and Fisher's exact tests for trait: grp
08/13/2020 10:39:50 AM    Adding p-values adjusted for testing multiple hypotheses
08/13/2020 10:39:50 AM    Storing results: grp
08/13/2020 10:39:50 AM    Calculating max number of contrasting pairs for each nominally significant gene
08/13/2020 10:41:04 AM    Storing results to file
08/13/2020 10:41:04 AM    

08/13/2020 10:41:04 AM    ==== Finished ====
08/13/2020 10:41:04 AM    Checked a total of 22416 genes for associations to 1 trait(s). Total time used: 399 seconds.
08/13/2020 10:41:04 AM    No warnings were recorded.

You can find my data here:
Trait file: https://drive.google.com/file/d/18nj3zFWS5OWONIn1xZhM_Uht6siOY6-n/view?usp=sharing
Gene presence/absence file: https://drive.google.com/file/d/1pWaDezegBbhc06yTV2OoiMcr3Es6SeRj/view?usp=sharing

Cheers, Jimmy

jimmyliu1326 commented 4 years ago

Ah, the missing genes had empty presence/absence profiles. I was able to resolve this by filling those in properly.
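
For anyone hitting the same symptom, here is a minimal sketch of how one might check a gene presence/absence file for such rows before running Scoary. It is not part of Scoary. The function name `find_empty_profiles` and the `first_isolate_col` parameter are hypothetical. The default of 14 assumes a Roary-style layout with 14 metadata columns before the isolate columns, and "empty" is assumed here to mean that every isolate cell in the row is blank. The thread does not say exactly what the empty profiles looked like in the original file.

```python
import csv
import io


def find_empty_profiles(handle, first_isolate_col=14):
    """Return the names of genes whose isolate cells are all blank.

    first_isolate_col is the 0-based index of the first isolate column.
    The default of 14 is an assumption based on a Roary-style layout.
    """
    reader = csv.reader(handle)
    next(reader)  # skip the header row
    empty = []
    for row in reader:
        if not row:
            continue  # ignore completely blank lines
        cells = row[first_isolate_col:]
        # A row counts as empty when no isolate cell holds any text.
        if all(cell.strip() == "" for cell in cells):
            empty.append(row[0])
    return empty


# Toy example with a single metadata column (the gene name).
sample = "Gene,iso1,iso2\ngeneA,x_1,\ngeneB,,\ngeneC,,x_3\n"
print(find_empty_profiles(io.StringIO(sample), first_isolate_col=1))
```

On a real file, one would pass an open file handle instead, for example `open("allelic_presence_roary.csv", newline="")`, and compare the number of names returned with the number of genes missing from the results.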