Open ljwh2 opened 8 months ago
Genome-wide sequencing:

| | GWAS_id | Techniques | harmonised | Raw_rows | Harmonised_rows | hm_14 | hm_15 | hm_16 | Drop_ratio | hm_15(%) |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | GCST90010173 | Genome-wide sequencing | yes | 24181159 | 18290576 | 0 | 0 | 0 | 24.36% | 0.00% |
| 2 | GCST90093113 | Genome-wide sequencing | yes | 7173861 | 7164907 | 0 | 47998 | 0 | 0.12% | 0.67% |
| 3 | GCST90001390 | Genome-wide sequencing | yes | 7843596 | 7654311 | 0 | 45969 | 2 | 2.41% | 0.59% |
| 4 | GCST90014052 | Genome-wide sequencing | yes | 5056041 | 5056029 | 0 | 11168 | 2 | 0.00% | 0.22% |
| 5 | GCST90161593 | Genome-wide sequencing | yes | 10004360 | 9450643 | 0 | 0 | 0 | 5.53% | 0.00% |

vs. Genome-wide genotyping array:

| PMID | GCST_id | Techniques | harmonised | Raw_rows | Harmonised_rows | hm_14 | hm_15 | hm_16 | Drop_ratio | hm_15(%) |
|---|---|---|---|---|---|---|---|---|---|---|
| 33589840 | GCST90012878 | Genome-wide genotyping array | yes | 25643629 | 25367157 | 0 | 292056 | 67 | 1.08% | 1.14% |
| 28887542 | GCST005069 | Genome-wide genotyping array | yes | 25290284 | 25186082 | 19 | 179244 | 2 | 0.41% | 0.71% |
| 33782385 | GCST012278 | Genome-wide genotyping array | yes | 7216416 | 7180648 | 1 | 35317 | 0 | 0.50% | 0.49% |
| 33143745 | GCST90093334 | Genome-wide genotyping array | yes | 8034880 | 7982170 | 0 | 131524 | 18 | 0.66% | 1.64% |
| 30053915 | GCST006353 | Genome-wide genotyping array | yes | 5694112 | 5692296 | 7 | 24756 | 0 | 0.03% | 0.43% |

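
For anyone re-checking the percentage columns: the values in the tables above appear to be computed against `Raw_rows` (the drop ratio as `(Raw_rows - Harmonised_rows) / Raw_rows`, and hm_15(%) as `hm_15 / Raw_rows`). That denominator is inferred from the numbers, not stated in the thread. A minimal Python sketch, with hypothetical helper names, that reproduces a few rows:

```python
def pct(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100.0 * part / whole, 2)


def drop_ratio(raw_rows: int, harmonised_rows: int) -> float:
    """Share of raw rows missing from the harmonised file, in percent."""
    return pct(raw_rows - harmonised_rows, raw_rows)


def hm15_share(hm_15: int, raw_rows: int) -> float:
    """Rows counted in the hm_15 column as a share of raw rows, in percent.

    Assumption: the denominator is Raw_rows (inferred from the table values).
    """
    return pct(hm_15, raw_rows)


# (GWAS_id, technique, Raw_rows, Harmonised_rows, hm_15), copied from the tables
ROWS = [
    ("GCST90010173", "sequencing", 24181159, 18290576, 0),
    ("GCST90001390", "sequencing", 7843596, 7654311, 45969),
    ("GCST90012878", "array", 25643629, 25367157, 292056),
    ("GCST90093334", "array", 8034880, 7982170, 131524),
]

for gwas_id, technique, raw, harmonised, hm_15 in ROWS:
    print(gwas_id, technique, drop_ratio(raw, harmonised), hm15_share(hm_15, raw))
```
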
Next to do:

| GWAS_id | V_95 (2018) | V_105 (2021) | V_111 (2023) | Total variants | 2% variants |
|---|---|---|---|---|---|
| GCST90010173 | 75.64% | 75.76% | 78.39% | 24181159 | 483623.18 |
| GCST90179391 | 79.04% | 74.38% | 77.75% | 30566328 | 611326.56 |

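
The "2% variants" column matches 0.02 × "Total variants", and the per-version percentages can be compared directly. A small Python sketch that recomputes both from the table values; the function names are hypothetical, and the reading of the percentages as harmonisation rates per reference version is an assumption:

```python
# (GWAS_id, {reference version: rate in %}, total variants), copied from the table
STUDIES = [
    ("GCST90010173", {"V_95": 75.64, "V_105": 75.76, "V_111": 78.39}, 24181159),
    ("GCST90179391", {"V_95": 79.04, "V_105": 74.38, "V_111": 77.75}, 30566328),
]


def two_percent(total_variants: int) -> float:
    """Return 2% of the total variant count, as in the '2% variants' column."""
    return round(0.02 * total_variants, 2)


def rate_change(rates: dict, old: str = "V_95", new: str = "V_111") -> float:
    """Change in rate between two versions, in percentage points."""
    return round(rates[new] - rates[old], 2)


for gwas_id, rates, total in STUDIES:
    print(gwas_id, two_percent(total), rate_change(rates))
```

For these two studies the change from V_95 to V_111 goes in opposite directions (+2.75 and -1.29 percentage points).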
Variants that can be harmonised with V_95 but not with V_111 fall into two cases:
@ljwh2 Can we close this ticket? After our investigation:
We also investigated whether the updated reference VCF file improves the harmonisation rate for WGS data. Of the 8 studies we tested, the rate increased for 3 and decreased for the other 5. A newer reference VCF therefore does not necessarily improve the harmonisation rate.
Our collaborator mentioned that they cannot use variants that fail harmonisation.
We would like to verify whether the efficiency of the harmonisation (hm) pipeline differs significantly for sequencing-based GWAS.
If possible, it would be useful to analyse the GWAS-SSF and pre-GWAS-SSF formats separately.
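
One possible way to approach the "significant difference" question with only five studies per group is an exact permutation test on the drop ratios. The thread does not say which test was used, so this is only a sketch of one option, using the Drop_ratio values from the two tables and a hypothetical function name:

```python
from itertools import combinations
from statistics import mean

# Drop ratios in percent, copied from the sequencing and array tables
SEQ = [24.36, 0.12, 2.41, 0.00, 5.53]
ARRAY = [1.08, 0.41, 0.50, 0.66, 0.03]


def exact_permutation_test(a, b):
    """Two-sided exact permutation test on the difference in group means.

    Enumerates every way of splitting the pooled values into groups of the
    original sizes and returns (observed difference, p-value).
    """
    pooled = list(a) + list(b)
    observed = mean(a) - mean(b)
    n = len(pooled)
    extreme = 0
    total = 0
    for idx in combinations(range(n), len(a)):
        chosen = set(idx)
        group_a = [pooled[i] for i in chosen]
        group_b = [pooled[i] for i in range(n) if i not in chosen]
        diff = mean(group_a) - mean(group_b)
        total += 1
        # Small tolerance so the observed split itself is always counted
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1
    return observed, extreme / total


observed, p_value = exact_permutation_test(SEQ, ARRAY)
print(f"mean difference: {observed:.3f} percentage points, p = {p_value:.3f}")
```

With ten studies in total, a single outlier such as the 24.36% drop for GCST90010173 dominates the group mean, so a larger sample (and a separate run per file format) would be needed before drawing conclusions.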