Closed lcolladotor closed 4 years ago
Using convert_to_hg38.R
, I've managed to generate some initial observations:
# this command returns false
identical(length(unique(ids)), length(ids))
[1] FALSE
> table(duplicated(ids))
FALSE TRUE
13773232 18235
# approx 55k snps are duplicates
> length(unique(ids_snpmap))
[1] 10,933,669
> length(ids_snpmap)
[1] 10,988,411
> table(gwas_test$variant %in% snpMap$snp)
FALSE TRUE
13352549 438918
table(ids %in% ids_snpmap)
# > table(ids %in% ids_snpmap)
#
# FALSE TRUE
# 5627759 8163708
As you can see with table(gwas_test$variant %in% snpMap$snp)
, the overlap between snpMap
and the GWAS
set you linked me was incredibly small (TRUE/FALSE = 0.03287147644
) but that seems due to a difference in formatting (the snpMap
has RSIDs). table(ids %in% ids_snpmap)
yielded what is probably a more reliable result because both columns are formatted ad hoc (TRUE/FALSE ~ 1.5
).
I have saved the code under twas/gwas_check.R
.
Before proceeding with the plink bed file created in https://github.com/LieberInstitute/10xPilot_snRNAseq-human/blob/1e1d2bb6627b06c22decdd642178bd25f9aeb42e/twas/filter_data/filter_snps.R#L51-L53, it might be useful to check manually how many SNPs overlap those in a GWAS like https://nealelab.github.io/UKBB_ldsc/h2_summary_20116_0.html. Use https://github.com/LieberInstitute/10xPilot_snRNAseq-human/blob/1e1d2bb6627b06c22decdd642178bd25f9aeb42e/twas/filter_data/filter_snps.R#L62 for the map between hg19 and hg38 coordinates. See in https://github.com/LieberInstitute/brainseq_phase2/blob/master/twas/psycm/convert_to_hg38.R how some of these checks were done for the brainseq phase 2 SNPs and the CLOZUK SCZD GWAS.
Potentially it's best to compute the TWAS weights with more SNPs even if they are absent on the GWAS side. From https://github.com/LieberInstitute/fusion_twas/blob/jhpce/FUSION.assoc_test.R#L175-L178 my understanding is that if the GWAS side is missing SNPs present in the weights, then they are labeled as NAs. So we might be ok to go with the output from https://github.com/LieberInstitute/10xPilot_snRNAseq-human/blob/1e1d2bb6627b06c22decdd642178bd25f9aeb42e/twas/filter_data/filter_snps.R#L51-L53, but prior to the compute-intensive step of the TWAS weights computation it's best to have a general idea of how overlap there is.
Save your comparison code under the twas/filter_data directory or create one like twas/gwas_check or something like that.