Hi,

I ran Scoary on ~3,600 isolates to test trait associations across ~22,000 genes. However, even when I specify -p 1.0, Scoary reports only ~3,100 genes in the results rather than the complete set of ~22,000. The analysis ran to completion without errors or warnings.
Here's the log file content:
08/13/2020 10:34:24 AM ==== Scoary started ====
08/13/2020 10:34:24 AM Command: /home/jimmy.liu/.conda/envs/scoary-1.6.16/bin/scoary --threads 32 -g /scratch/jimmy.liu/reference_structure_chewbbaca_res_2020/cluster_157_gwas/allelic_presence_roary.csv -t /scratch/jimmy.liu/reference_structure_chewbbaca_res_2020/cluster_157_gwas/Cluster_157_subset_metadata.csv -o /scratch/jimmy.liu/reference_structure_chewbbaca_res_2020/cluster_157_gwas/ -p 1.0 -m 22416
08/13/2020 10:34:24 AM Reading gene presence absence file
08/13/2020 10:34:49 AM Creating Hamming distance matrix based on gene presence/absence
08/13/2020 10:36:30 AM Building UPGMA tree from distance matrix
08/13/2020 10:38:34 AM Reading traits file
08/13/2020 10:38:34 AM Finished loading files into memory.
08/13/2020 10:38:34 AM ==== Performing statistics ====
08/13/2020 10:38:34 AM -- Filtration options --
08/13/2020 10:38:34 AM Individual (Naive): 1.0
08/13/2020 10:38:34 AM Collapse genes: False
08/13/2020 10:38:34 AM Tallying genes and performing statistical analyses
08/13/2020 10:38:34 AM Gene-wise counting and Fisher's exact tests for trait: grp
08/13/2020 10:39:50 AM Adding p-values adjusted for testing multiple hypotheses
08/13/2020 10:39:50 AM Storing results: grp
08/13/2020 10:39:50 AM Calculating max number of contrasting pairs for each nominally significant gene
08/13/2020 10:41:04 AM Storing results to file
08/13/2020 10:41:04 AM
08/13/2020 10:41:04 AM ==== Finished ====
08/13/2020 10:41:04 AM Checked a total of 22416 genes for associations to 1 trait(s). Total time used: 399 seconds.
08/13/2020 10:41:04 AM No warnings were recorded.
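To pin down which genes are being dropped, the gene IDs in the input table can be compared with those in the results table. The snippet below is only a rough sketch: it assumes both files are comma-separated, have a single header row, and carry the gene identifier in the first column. The function names and the command-line file arguments are placeholders for illustration, not part of Scoary.

```python
import csv
import sys


def gene_ids(path, column=0):
    """Return the set of values found in one column of a CSV file.

    Assumes the first row is a header and skips it.
    """
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # skip the header row
        return {row[column] for row in reader if row}


def missing_genes(presence_absence_csv, results_csv):
    """Return gene IDs present in the input table but absent from the results."""
    return gene_ids(presence_absence_csv) - gene_ids(results_csv)


if __name__ == "__main__":
    # Hypothetical usage: python compare_genes.py <input.csv> <results.csv>
    input_csv, results_csv = sys.argv[1], sys.argv[2]
    dropped = missing_genes(input_csv, results_csv)
    print(f"{len(gene_ids(input_csv))} genes in input")
    print(f"{len(gene_ids(results_csv))} genes in results")
    print(f"{len(dropped)} genes missing from results")
    for gene in sorted(dropped)[:20]:  # show a small sample
        print(gene)
```

Looking at the presence/absence pattern of a few of the missing genes (for example, whether they are present in nearly all or nearly none of the isolates) may hint at why they were left out.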
You can find my data here:
Trait file: https://drive.google.com/file/d/18nj3zFWS5OWONIn1xZhM_Uht6siOY6-n/view?usp=sharing
Gene presence/absence file: https://drive.google.com/file/d/1pWaDezegBbhc06yTV2OoiMcr3Es6SeRj/view?usp=sharing
Cheers, Jimmy