Filter data for NAc and check overlap with external GWAS

This task involves https://github.com/LieberInstitute/10xPilot_snRNAseq-human/blob/master/twas/filter_data/filter_snps.R. Its commit history (https://github.com/LieberInstitute/10xPilot_snRNAseq-human/commits/master/twas/filter_data/filter_snps.R) and the related commit messages are useful context.

The main output of this script will be a PLINK bed file set (.fam, .bim, .bed) with the LIBD DNA genotype data for the samples used in this RNA-seq project (NAc). To produce it, we need to find which LIBD brains from NAc were used, which I have already done at https://github.com/LieberInstitute/10xPilot_snRNAseq-human/blob/1e1d2bb6627b06c22decdd642178bd25f9aeb42e/twas/filter_data/filter_snps.R#L13-L54.
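For reference, the sample-subsetting step can be sketched in Python. This is a hypothetical illustration, not the code in filter_snps.R: it assumes the merged .fam file follows the standard PLINK layout (FID, IID, father, mother, sex, phenotype, whitespace-separated) and that the NAc brain IDs identified earlier match the IID column. The helper name `keep_lines` and the example IDs are made up.

```python
"""Hypothetical sketch: build a PLINK --keep list for the NAc brains.

Assumes a standard PLINK .fam layout and that the LIBD brain ID is in the
IID column; both are illustrative assumptions, not taken from filter_snps.R.
"""
from typing import Iterable, List


def keep_lines(fam_lines: Iterable[str], nac_brains: Iterable[str]) -> List[str]:
    """Return 'FID IID' lines for samples whose IID is one of the NAc brains."""
    wanted = set(nac_brains)
    keep = []
    for line in fam_lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        fid, iid = fields[0], fields[1]
        if iid in wanted:
            keep.append(f"{fid} {iid}")
    return keep


if __name__ == "__main__":
    fam = [
        "Br1000 Br1000 0 0 1 -9",
        "Br2000 Br2000 0 0 2 -9",
        "Br3000 Br3000 0 0 1 -9",
    ]
    print("\n".join(keep_lines(fam, {"Br1000", "Br3000"})))
```

The resulting lines could be written to a text file and passed to PLINK 1.9 as `plink --bfile <merged prefix> --keep keep.txt --make-bed --out <NAc prefix>`, which writes the subset .bed/.bim/.fam trio. Whether filter_snps.R builds its sample list exactly this way is an assumption.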

Previously, in the brainseq_phase2/twas code, we then filtered the variants (SNPs) down to those present in an external dataset (called "LDREF" in the code). We have decided to stop doing this, so we likely no longer need any of https://github.com/LieberInstitute/10xPilot_snRNAseq-human/blob/1e1d2bb6627b06c22decdd642178bd25f9aeb42e/twas/filter_data/filter_snps.R#L60-L337. However, as noted in the commit message for https://github.com/LieberInstitute/10xPilot_snRNAseq-human/commit/3f6839095d271a456e49d58a57ad7e6983afd553#diff-0e353a8e945a6907ebcb678b68d26a2b, another step we had taken was to filter the SNPs to those present in a GWAS of interest using https://github.com/LieberInstitute/brainseq_phase2/blob/master/twas/psycm/convert_to_hg38.R.
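If we want to check overlap with an external GWAS (per this issue's title), a minimal Python sketch could match variants by position. It assumes both the LIBD .bim file and the GWAS are on hg38 (the GWAS having been lifted over, which is what convert_to_hg38.R was used for) and that GWAS variants are available as (chromosome, base-pair) pairs. The function names and example data are hypothetical.

```python
"""Hypothetical sketch: count LIBD variants that overlap an external GWAS.

Assumes both datasets are on hg38 and matches on (chromosome, bp) rather
than SNP name, since naming schemes can differ between sources.
"""
from typing import Iterable, List, Set, Tuple

Position = Tuple[str, int]


def normalize_chrom(chrom: str) -> str:
    """Strip a leading 'chr' (any case) so '1' and 'chr1' compare equal."""
    return chrom[3:] if chrom.lower().startswith("chr") else chrom


def overlapping_snps(bim_lines: Iterable[str], gwas_positions: Set[Position]) -> List[str]:
    """Return IDs (.bim column 2) of variants whose (chrom, bp) is in the GWAS."""
    targets = {(normalize_chrom(c), int(bp)) for c, bp in gwas_positions}
    hits = []
    for line in bim_lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # .bim rows have 6 columns: chrom, id, cM, bp, a1, a2
        chrom, snp_id, bp = fields[0], fields[1], int(fields[3])
        if (normalize_chrom(chrom), bp) in targets:
            hits.append(snp_id)
    return hits


if __name__ == "__main__":
    bim = [
        "1 snpA 0 10583 A G",
        "chr1 snpB 0 20000 C T",
        "2 snpC 0 500 G A",
    ]
    gwas = {("chr1", 10583), ("2", 500)}
    hits = overlapping_snps(bim, gwas)
    print(hits, f"{len(hits)}/{len(bim)} variants overlap")
```

Matching on position avoids mismatches between naming schemes (rsIDs vs. chr:pos IDs) at the cost of ignoring alleles; a stricter check would also compare A1/A2. If the filter were ever reinstated, the returned IDs could be written one per line for PLINK's `--extract`.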

Before we proceed with https://github.com/LieberInstitute/10xPilot_snRNAseq-human/blob/1e1d2bb6627b06c22decdd642178bd25f9aeb42e/twas/filter_data/filter_snps.R#L13-L54, it's best to run the checks in #1. After #1, clean up the code in filter_snps.R so that it keeps only the code relevant to NAc and is easier to understand in the future.

I removed the LDREF-related sections and ran the code, writing the output files to one of my directories, /users/aseyedia/NAc_TWAS/, so as not to overwrite the existing files in the original directory, /dcl01/lieber/ajaffe/Matt/MNT_thesis/snRNAseq/10x_pilot_FINAL/twas/filter_data/.

The files generated previously and the ones I just generated have identical SHA-1 checksums, with the expected exception of the log file:

```
10:40 NAc_TWAS $ sha1sum /users/aseyedia/NAc_TWAS/LIBD_merged_h650_1M_Omni5M_Onmi2pt5_Macrogen_QuadsPlus_dropBrains_maf01_hwe6_geno10_hg38_filtered_NAc_Nicotine.*
ff50a1bd49d348f49ac3f4c0032aefab96be2047  /users/aseyedia/NAc_TWAS/LIBD_merged_h650_1M_Omni5M_Onmi2pt5_Macrogen_QuadsPlus_dropBrains_maf01_hwe6_geno10_hg38_filtered_NAc_Nicotine.bed
bf6b231b58cc7f1317ec46b6af7216deb2936721  /users/aseyedia/NAc_TWAS/LIBD_merged_h650_1M_Omni5M_Onmi2pt5_Macrogen_QuadsPlus_dropBrains_maf01_hwe6_geno10_hg38_filtered_NAc_Nicotine.bim
4dd3c87ec7464e1a40443b74734e55a895d3ea7d  /users/aseyedia/NAc_TWAS/LIBD_merged_h650_1M_Omni5M_Onmi2pt5_Macrogen_QuadsPlus_dropBrains_maf01_hwe6_geno10_hg38_filtered_NAc_Nicotine.fam
a3b7291172cf689fdff0b4f33d632a017b41bc2c  /users/aseyedia/NAc_TWAS/LIBD_merged_h650_1M_Omni5M_Onmi2pt5_Macrogen_QuadsPlus_dropBrains_maf01_hwe6_geno10_hg38_filtered_NAc_Nicotine.hh
6048e5b381892641fcf4545b3dba1f36a36ff346  /users/aseyedia/NAc_TWAS/LIBD_merged_h650_1M_Omni5M_Onmi2pt5_Macrogen_QuadsPlus_dropBrains_maf01_hwe6_geno10_hg38_filtered_NAc_Nicotine.log
10:40 NAc_TWAS $ sha1sum /dcl01/lieber/ajaffe/Matt/MNT_thesis/snRNAseq/10x_pilot_FINAL/twas/filter_data/LIBD_merged_h650_1M_Omni5M_Onmi2pt5_Macrogen_QuadsPlus_dropBrains_maf01_hwe6_geno10_hg38_filtered_NAc_Nicotine*
ff50a1bd49d348f49ac3f4c0032aefab96be2047  /dcl01/lieber/ajaffe/Matt/MNT_thesis/snRNAseq/10x_pilot_FINAL/twas/filter_data/LIBD_merged_h650_1M_Omni5M_Onmi2pt5_Macrogen_QuadsPlus_dropBrains_maf01_hwe6_geno10_hg38_filtered_NAc_Nicotine.bed
bf6b231b58cc7f1317ec46b6af7216deb2936721  /dcl01/lieber/ajaffe/Matt/MNT_thesis/snRNAseq/10x_pilot_FINAL/twas/filter_data/LIBD_merged_h650_1M_Omni5M_Onmi2pt5_Macrogen_QuadsPlus_dropBrains_maf01_hwe6_geno10_hg38_filtered_NAc_Nicotine.bim
4dd3c87ec7464e1a40443b74734e55a895d3ea7d  /dcl01/lieber/ajaffe/Matt/MNT_thesis/snRNAseq/10x_pilot_FINAL/twas/filter_data/LIBD_merged_h650_1M_Omni5M_Onmi2pt5_Macrogen_QuadsPlus_dropBrains_maf01_hwe6_geno10_hg38_filtered_NAc_Nicotine.fam
a3b7291172cf689fdff0b4f33d632a017b41bc2c  /dcl01/lieber/ajaffe/Matt/MNT_thesis/snRNAseq/10x_pilot_FINAL/twas/filter_data/LIBD_merged_h650_1M_Omni5M_Onmi2pt5_Macrogen_QuadsPlus_dropBrains_maf01_hwe6_geno10_hg38_filtered_NAc_Nicotine.hh
ebf6278f99e2a980a46aa6a03791fda7b9ead2a1  /dcl01/lieber/ajaffe/Matt/MNT_thesis/snRNAseq/10x_pilot_FINAL/twas/filter_data/LIBD_merged_h650_1M_Omni5M_Onmi2pt5_Macrogen_QuadsPlus_dropBrains_maf01_hwe6_geno10_hg38_filtered_NAc_Nicotine.log
```
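The manual `sha1sum` comparison above can be scripted so it is easy to rerun after further cleanup. A minimal Python sketch follows; `compare_outputs` and `sha1_of` are hypothetical helper names, and the .log file is left out of the default extensions because PLINK logs record run-specific details such as start times and the command line, so they are expected to differ.

```python
"""Sketch: compare SHA-1 checksums of PLINK outputs from two directories."""
import hashlib
from pathlib import Path
from typing import Dict, Iterable


def sha1_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-1 digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_outputs(
    old_dir: Path,
    new_dir: Path,
    prefix: str,
    extensions: Iterable[str] = ("bed", "bim", "fam", "hh"),
) -> Dict[str, bool]:
    """Map each extension to True if the old and new files have identical SHA-1."""
    result = {}
    for ext in extensions:
        old_file = old_dir / f"{prefix}.{ext}"
        new_file = new_dir / f"{prefix}.{ext}"
        result[ext] = sha1_of(old_file) == sha1_of(new_file)
    return result
```

Calling it with `Path("/dcl01/lieber/ajaffe/Matt/MNT_thesis/snRNAseq/10x_pilot_FINAL/twas/filter_data")`, `Path("/users/aseyedia/NAc_TWAS")` and the prefix `LIBD_merged_h650_1M_Omni5M_Onmi2pt5_Macrogen_QuadsPlus_dropBrains_maf01_hwe6_geno10_hg38_filtered_NAc_Nicotine` should report True for every default extension, given the checksums shown above.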

So, to clarify: all I did here was remove the irrelevant code, and the PLINK outputs are unchanged.


Filter data for NAc and check overlap with external GWAS #2