PriceLab / snpFoot

an R package with miscellaneous functions and public data for the exploration and analysis of gene regulation

The Kitchen Sink, or what I'd like to see with a single query #1

Open CoryFunk opened 8 years ago

CoryFunk commented 8 years ago

For the upcoming retreat, I'm hoping we can work towards the following.

As input, I'd like to give a region of interest (e.g. chr5:87936379-88297792) and an (optional) list of SNPs. As a first pass, the list of SNPs will be the eQTLs I got from Mariette, plus everything in LD with them in 1000 Genomes. For MEF2C, Mariette gave me ~150 SNPs, which expands to a total of ~2,200 SNPs (LD > 0.8).
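To make the region input concrete, here is a minimal base R sketch of turning a string like the one above into chromosome, start and end. The helper name `parse_region` is hypothetical and not part of snpFoot; it only illustrates one way the input could be validated before any tracks are built.

```r
# Hypothetical helper (not a snpFoot function): split a region string such as
# "chr5:87936379-88297792" into chromosome, start and end.
parse_region <- function(region) {
  pattern <- "^(chr[0-9A-Za-z]+):([0-9]+)-([0-9]+)$"
  m <- regmatches(region, regexec(pattern, region))[[1]]
  if (length(m) != 4) {
    stop("region must look like 'chr5:87936379-88297792'")
  }
  start <- as.integer(m[3])
  end <- as.integer(m[4])
  if (start > end) {
    stop("start must not exceed end")
  }
  list(chrom = m[2], start = start, end = end)
}

roi <- parse_region("chr5:87936379-88297792")
roi$chrom                  # chromosome name
roi$end - roi$start + 1L   # width of the region in bases
```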

From this region of interest, I'd like to see the following tracks:

1) Footprints (from the file).
2) IGAP SNPs as a Manhattan plot.
3) ADNI SNPs, possibly broken up into phenotype-specific tracks (e.g. APOE4 protected, control, AD), with a frequency score (as you showed yesterday).
4) SNPs of interest. These could be given as a BED file and specified in the function call. Down the road they may also have an associated p-value, like we talked about with Seth.
5) The intersect between the SNPs from 2, 3 and 4 and the footprints. This could be multiple tracks, but one track would probably suffice. It could use the padding option you already have.
6) A regulatory TF track. This would also be entered in the call and would be a list of TFs that are known regulators of the nearest gene. It would need to be able to convert the TF names to the motif names in the footprint file.
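The intersect in track 5 could look roughly like the base R sketch below. Everything in it is an assumption for illustration: the column names, the toy coordinates, the motif labels and the helper `snps_in_footprints` are invented, and positions are treated as 1-based and inclusive rather than BED-style. It is not the package's existing padding code.

```r
# Toy data: coordinates, SNP ids and motif names are invented for illustration.
footprints <- data.frame(
  chrom = c("chr5", "chr5", "chr5"),
  start = c(100L, 500L, 900L),
  end   = c(120L, 515L, 930L),
  motif = c("motifA", "motifB", "motifC"),
  stringsAsFactors = FALSE
)
snps <- data.frame(
  chrom = c("chr5", "chr5", "chr5"),
  pos   = c(110L, 525L, 700L),
  id    = c("snp1", "snp2", "snp3"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: pair each SNP with every footprint it falls in,
# after widening each footprint by `padding` bases on both sides.
snps_in_footprints <- function(snps, footprints, padding = 0L) {
  # merge() on chrom yields every SNP/footprint pair on the same chromosome
  pairs <- merge(snps, footprints, by = "chrom")
  hit <- pairs$pos >= pairs$start - padding & pairs$pos <= pairs$end + padding
  pairs[hit, c("chrom", "pos", "id", "start", "end", "motif")]
}

no_pad   <- snps_in_footprints(snps, footprints, padding = 0L)
with_pad <- snps_in_footprints(snps, footprints, padding = 10L)
sort(no_pad$id)    # SNPs strictly inside a footprint
sort(with_pad$id)  # SNPs within 10 bases of a footprint
```

The all-pairs merge is fine for a toy example but grows quadratically, so for ~2,200 SNPs against a genome-wide footprint file an interval-overlap routine such as `findOverlaps` from GenomicRanges would likely be the better fit.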

I see the call looking something like this:

    snpfoot_display(chr = 5, start = 87936379, end = 88297792,
                    snp_list = "Mariette_snps.bed", padding = 10,
                    tf_list = TReNA_tfs)

CoryFunk commented 8 years ago

Yet another track I'd like to add. I have the original BED files with the footprints for each of the 17 ENCODE brain samples. These files were intersected with the FIMO results to give us our current list of footprints. I'd like to have a track that shows the raw footprints. This would let us show in which samples each footprint was found (and whether it is common across samples), as well as show all the footprints that have no associated TF motif.
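As a rough idea of how "which samples, and how common" could be computed, the base R sketch below counts the number of samples in which each raw footprint appears. The sample labels and intervals are invented, and it assumes a footprint counts as shared only when its coordinates match exactly across samples; a real version would probably need to merge overlapping intervals first.

```r
# Toy stand-in for the per-sample BED files; labels and intervals are invented.
raw <- data.frame(
  sample = c("sampleA", "sampleA", "sampleB", "sampleC", "sampleC"),
  chrom  = "chr5",
  start  = c(100L, 500L, 100L, 100L, 900L),
  end    = c(120L, 515L, 120L, 120L, 930L),
  stringsAsFactors = FALSE
)

# Drop duplicate rows so each sample contributes at most once per interval,
# then count the samples that share each exact interval.
raw <- unique(raw)
counts <- aggregate(sample ~ chrom + start + end, data = raw, FUN = length)
names(counts)[names(counts) == "sample"] <- "n_samples"
counts <- counts[order(counts$chrom, counts$start), ]
counts
```

The `n_samples` column could then drive the track's colour or height, so footprints seen in many samples stand out from those seen in only one.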

The original BED files are on whovian: /local/Cory/trn/bed_files

Here is a quick list of the corresponding tissue types:

ENCSR000EIJ  cerebellum                            6658087390   tissue        adult    male           35 year
ENCSR000EIK  frontal cortex                        6427390460   tissue        adult    male           35 year
ENCSR000EIY  frontal cortex                        6582538185   tissue        adult    female         80 year
ENCSR000ENA  astrocyte of the hippocampus          1428950104   primary cell  unknown  unknown
ENCSR000ENC  astrocyte of the cerebellum           1500105356   primary cell  unknown  unknown        unknown
ENCSR000ENE  brain microvascular endothelial cell  1483151899   primary cell  unknown  unknown        unknown
ENCSR000ENF  brain pericyte                        192560176    primary cell  unknown  unknown        unknown
ENCSR000ENL  choroid plexus epithelial cell        12008785004  primary cell  unknown  unknown        unknown
ENCSR224IYD  medulla oblongata                     2534977716   tissue        adult    male           78 & 84 year
ENCSR318PRQ  middle frontal gyrus                  1484101094   tissue        adult    male           78 year
ENCSR475VQD  brain                                 435364969    tissue        fetal    male           72, 76 days
ENCSR503HIB  cerebellar cortex                     171040463    tissue        adult    male           78 & 84 year
ENCSR595CSH  brain                                 8306812576   tissue        fetal    unknown, male  56,58 day
ENCSR706IDL  midbrain                              1197197679   tissue        adult    male           78 & 84 year
ENCSR771DAX  globus pallidus                       2579852906   tissue        adult    male           78 & 84 year