CoryFunk opened this issue 8 years ago.
Yet another track I'd like to add. I have the original BED files with the footprints for each of the 17 ENCODE brain samples. These files were intersected with the FIMO results to give us our current list of footprints. I'd like a track that shows the raw footprints. This would let us show in which samples each footprint was found (and whether it is common across samples), as well as all the footprints that have no associated TF motif.
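Finding the footprints with no associated TF motif amounts to an anti-join: keep each raw footprint that overlaps no FIMO hit. Below is a minimal Python sketch. It assumes both inputs are plain BED-style (chrom, start, end) intervals, with the FIMO hits already converted to that form. The function names are hypothetical. The pairwise check is quadratic, so genome-wide files would want a sorted sweep or an interval tree instead.

```python
from typing import Iterable, List, Tuple

# A BED-style interval: (chrom, start, end), 0-based start, end exclusive.
Interval = Tuple[str, int, int]


def read_bed(lines: Iterable[str]) -> List[Interval]:
    """Parse the first three BED columns, skipping blank, comment and header lines."""
    intervals = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        intervals.append((fields[0], int(fields[1]), int(fields[2])))
    return intervals


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def footprints_without_motif(footprints: List[Interval],
                             motif_hits: List[Interval]) -> List[Interval]:
    """Keep only the footprints that overlap no motif hit (e.g. no FIMO match)."""
    return [fp for fp in footprints
            if not any(overlaps(fp, hit) for hit in motif_hits)]


raw = read_bed(["chr5\t100\t120\n", "chr5\t200\t215\n"])
fimo_hits = [("chr5", 110, 118)]
print(footprints_without_motif(raw, fimo_hits))  # [('chr5', 200, 215)]
```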
The original BED files are on whovian in /local/Cory/trn/bed_files. Here is a quick list of the corresponding tissue types:
| Accession | Biosample | | Type | Life stage | Sex | Age |
|---|---|---|---|---|---|---|
| ENCSR000EIJ | cerebellum | 6658087390 | tissue | adult | male | 35 years |
| ENCSR000EIK | frontal cortex | 6427390460 | tissue | adult | male | 35 years |
| ENCSR000EIY | frontal cortex | 6582538185 | tissue | adult | female | 80 years |
| ENCSR000ENA | astrocyte of the hippocampus | 1428950104 | primary cell | unknown | unknown | |
| ENCSR000ENC | astrocyte of the cerebellum | 1500105356 | primary cell | unknown | unknown | unknown |
| ENCSR000ENE | brain microvascular endothelial cell | 1483151899 | primary cell | unknown | unknown | unknown |
| ENCSR000ENF | brain pericyte | 192560176 | primary cell | unknown | unknown | unknown |
| ENCSR000ENL | choroid plexus epithelial cell | 12008785004 | primary cell | unknown | unknown | unknown |
| ENCSR224IYD | medulla oblongata | 2534977716 | tissue | adult | male | 78 & 84 years |
| ENCSR318PRQ | middle frontal gyrus | 1484101094 | tissue | adult | male | 78 years |
| ENCSR475VQD | brain | 435364969 | tissue | fetal | male | 72, 76 days |
| ENCSR503HIB | cerebellar cortex | 171040463 | tissue | adult | male | 78 & 84 years |
| ENCSR595CSH | brain | 8306812576 | tissue | fetal | unknown, male | 56, 58 days |
| ENCSR706IDL | midbrain | 1197197679 | tissue | adult | male | 78 & 84 years |
| ENCSR771DAX | globus pallidus | 2579852906 | tissue | adult | male | 78 & 84 years |
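Showing which samples contain each footprint can be sketched as a sort-and-sweep merge. Pool the per-sample intervals, merge those that overlap, and keep the set of accessions that contributed to each merged footprint. This is a hypothetical helper, not existing trn code. It assumes half-open BED coordinates and takes "common" to mean present in more than one sample.

```python
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]                 # (chrom, start, end), BED-style
MergedFootprint = Tuple[str, int, int, List[str]]


def merge_footprints(per_sample: Dict[str, List[Interval]]) -> List[MergedFootprint]:
    """Merge overlapping footprints across samples, recording which samples contain each."""
    # Tag every interval with its sample and sort so overlaps on a chromosome are adjacent.
    tagged = sorted(
        (chrom, start, end, sample)
        for sample, intervals in per_sample.items()
        for chrom, start, end in intervals
    )
    merged: List[MergedFootprint] = []
    for chrom, start, end, sample in tagged:
        if merged and merged[-1][0] == chrom and start < merged[-1][2]:
            # Overlaps the current merged footprint: extend it and record the sample.
            last_chrom, last_start, last_end, samples = merged[-1]
            merged[-1] = (last_chrom, last_start, max(last_end, end), samples)
            if sample not in samples:
                samples.append(sample)
        else:
            merged.append((chrom, start, end, [sample]))
    return [(c, s, e, sorted(samples)) for c, s, e, samples in merged]


per_sample = {
    "ENCSR000EIJ": [("chr5", 100, 120)],
    "ENCSR000EIK": [("chr5", 110, 130), ("chr5", 300, 310)],
}
for chrom, start, end, samples in merge_footprints(per_sample):
    print(chrom, start, end, len(samples), samples)
```

Merging is transitive. If A overlaps B and B overlaps C, all three collapse into one footprint even when A and C do not touch. If that is too coarse, a minimum-overlap requirement would be a stricter alternative.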
For the upcoming retreat, I'm hoping we can work towards the following.
As input, I'd like to give a region of interest (e.g. chr5:87936379-88297792) and an optional list of SNPs. As a first pass, the SNP list will be the eQTLs I got from Mariette, plus everything in LD with them from 1000 Genomes. For MEF2C, Mariette gave me ~150 SNPs, which expands to ~2,200 SNPs in total at LD > 0.8.
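Since the region of interest comes in as a string, the first step would be to parse it and restrict the optional SNP list to it. The sketch below makes a few assumptions. The region uses UCSC-style `chrom:start-end` notation, with commas allowed. SNPs arrive as (id, chrom, 1-based position) tuples. Both ends of the region are inclusive. The representation and the function names are hypothetical.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical SNP representation: (snp id, chrom, 1-based position).
Snp = Tuple[str, str, int]

REGION_RE = re.compile(r"^(chr\w+):([\d,]+)-([\d,]+)$")


def parse_region(region: str) -> Tuple[str, int, int]:
    """Split 'chr5:87936379-88297792' into ('chr5', 87936379, 88297792)."""
    match = REGION_RE.match(region.strip())
    if match is None:
        raise ValueError(f"expected chrom:start-end, got {region!r}")
    chrom = match.group(1)
    start = int(match.group(2).replace(",", ""))
    end = int(match.group(3).replace(",", ""))
    if start > end:
        raise ValueError(f"start {start} is after end {end}")
    return chrom, start, end


def snps_in_region(region: str, snps: Optional[List[Snp]] = None) -> List[Snp]:
    """Return the SNPs whose position lies inside the region (both ends inclusive)."""
    chrom, start, end = parse_region(region)
    if not snps:
        return []
    return [snp for snp in snps if snp[1] == chrom and start <= snp[2] <= end]


snps = [("snp1", "chr5", 88000000), ("snp2", "chr5", 90000000), ("snp3", "chr6", 88000000)]
print(snps_in_region("chr5:87936379-88297792", snps))  # [('snp1', 'chr5', 88000000)]
```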
From this region of interest, I'd like to see the following tracks:
1) Footprints (from the footprint file).
2) IGAP SNPs as a Manhattan plot.
3) ADNI SNPs, possibly broken up into phenotype-specific tracks (e.g. APOE4-protected, control, AD), with a frequency score (as you showed yesterday).
4) SNPs of interest. These could be supplied as a BED file specified in the function call. Down the road, they may also have an associated p-value, as we discussed with Seth.
5) The intersection of the SNPs from 2, 3, and 4 with the footprints. This could be multiple tracks, but one track would probably suffice. It could use the padding option you already have.
6) A regulatory TF track. This would also be specified in the call, as a list of TFs that are known regulators of the nearest gene. It would need to convert the TF names to the motif names used in the footprint file.
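Items 5 and 6 come down to two small operations. The first is a padded overlap test between SNP positions and footprints. The second is a lookup from TF names to the motif names in the footprint file.

The sketch below assumes each footprint carries a single motif name and that a motif-to-TF table is available, for example from the motif database's annotations. `motif_to_tfs`, the tuple layouts, and all function names are hypothetical. The `padding` parameter here only approximates the existing padding option, whose exact behavior may differ. SNP positions are converted from 1-based to BED's 0-based, half-open coordinates before comparing.

```python
from typing import Dict, List, Set, Tuple

Snp = Tuple[str, str, int]             # (snp id, chrom, 1-based position), assumed
Footprint = Tuple[str, int, int, str]  # (chrom, 0-based start, end exclusive, motif), assumed


def snps_in_footprints(snps: List[Snp], footprints: List[Footprint],
                       padding: int = 0) -> List[Tuple[str, str]]:
    """Pair each SNP with the motif of every footprint it hits, widened by `padding` bp per side."""
    hits = []
    for snp_id, chrom, pos in snps:
        offset = pos - 1  # 1-based position -> 0-based BED coordinate
        for fp_chrom, start, end, motif in footprints:
            if chrom == fp_chrom and start - padding <= offset < end + padding:
                hits.append((snp_id, motif))
    return hits


def motifs_for_tfs(tfs: List[str], motif_to_tfs: Dict[str, List[str]]) -> Set[str]:
    """Return the motif names annotated to any of the given TFs (case-insensitive)."""
    wanted = {tf.upper() for tf in tfs}
    return {motif for motif, genes in motif_to_tfs.items()
            if wanted & {gene.upper() for gene in genes}}


def regulatory_footprints(footprints: List[Footprint], tfs: List[str],
                          motif_to_tfs: Dict[str, List[str]]) -> List[Footprint]:
    """Keep the footprints whose motif maps to one of the requested regulator TFs."""
    motifs = motifs_for_tfs(tfs, motif_to_tfs)
    return [fp for fp in footprints if fp[3] in motifs]


footprints = [("chr5", 100, 120, "motif_1"), ("chr5", 200, 210, "motif_2")]
snps = [("snp1", "chr5", 101), ("snp2", "chr5", 205), ("snp3", "chr5", 215)]
print(snps_in_footprints(snps, footprints))             # snp3 misses without padding
print(snps_in_footprints(snps, footprints, padding=5))  # snp3 now hits motif_2
motif_to_tfs = {"motif_1": ["TF_A", "TF_B"], "motif_2": ["TF_C"]}
print(regulatory_footprints(footprints, ["tf_a"], motif_to_tfs))
```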
I see the call looking something like this: