QData / DeepChrome

Bioinformatics16: DeepChrome: Deep-learning for predicting gene expression from histone modifications
http://deepchrome.net
Apache License 2.0

Where #1

Closed sperfu closed 5 years ago

sperfu commented 7 years ago

Hello, I saw your work on DeepChrome and the recent attention-model version of DeepChrome. I have one question: where did you retrieve the TSS data? It does not seem to be mentioned in the article. Thanks!

rs3zz commented 7 years ago

Hi, the file containing the reference gene information (hg19) can be downloaded from http://genome.ucsc.edu/cgi-bin/hgTables?command=start . The TSS is just the starting coordinate/position of the gene.

sperfu commented 7 years ago

Well, was the TSS file you used also downloaded from the UCSC SwitchGear TSS track (hg19)? It seems that you select the 5000 bp upstream and downstream of the TSS for further analysis, so I think the coordinates are crucial in your study.

rs3zz commented 7 years ago

Yes, once we have downloaded the TSS coordinate of the gene, we just subtract and add 5000 to that coordinate to get a 10,000 bp region (+/- 5000 bp) around the TSS. Once done, a good sanity check would be to view a few sites on the genome browser.
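For readers following along, subtracting and adding 5000 to a TSS coordinate can be sketched as below. This is a minimal illustration, not code from the DeepChrome repository: the function name `tss_window` and the clamping of the start at 0 are assumptions made for the example.

```python
def tss_window(tss, flank=5000):
    """Return (start, end) of the region spanning `flank` bp on each side of a TSS.

    `tss_window` is a hypothetical helper name, not part of DeepChrome.
    """
    # Assumption: clamp at 0 so genes near the chromosome start
    # do not produce negative coordinates.
    start = max(0, tss - flank)
    end = tss + flank
    return start, end


# Example with a made-up TSS coordinate.
start, end = tss_window(1_000_000)
print(start, end, end - start)
```

The resulting `(start, end)` pairs can be pasted into the genome browser for the sanity check suggested above.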

sperfu commented 7 years ago

I found that the TSS track in the UCSC browser is the SwitchGear TSS track (hg19). Is this the file you used in your work?

rs3zz commented 7 years ago

No, as I mentioned, the TSS positions were obtained from the reference gene file "knownGene", downloaded from the UCSC table link above. However, both files should give you the same information, and you can verify that yourself.
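As a rough sketch of how TSS positions might be read from a tab-separated "knownGene" export, something like the following could work. Several things here are assumptions rather than the authors' procedure: the helper name `read_tss` is hypothetical, the column order (name, chrom, strand, txStart, txEnd as the first five fields) is assumed from the usual UCSC table layout and should be checked against the downloaded header, and taking txEnd as the TSS for minus-strand genes is an optional convention (the thread only says the TSS is the starting coordinate of the gene). The sample rows are made up.

```python
import csv
import io


def read_tss(handle, strand_aware=True):
    """Yield (transcript_id, chrom, strand, tss) from a knownGene-style TSV.

    Assumed column order: name, chrom, strand, txStart, txEnd, ...
    """
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and the "#name ..." header line
        name, chrom, strand = row[0], row[1], row[2]
        tx_start, tx_end = int(row[3]), int(row[4])
        # Assumption: on the minus strand, transcription starts at txEnd.
        # With strand_aware=False, txStart is always used.
        tss = tx_end if (strand_aware and strand == "-") else tx_start
        yield name, chrom, strand, tss


# Made-up rows for illustration only; real files have more columns.
sample = (
    "#name\tchrom\tstrand\ttxStart\ttxEnd\n"
    "tx_plus\tchr1\t+\t1000\t5000\n"
    "tx_minus\tchr1\t-\t8000\t12000\n"
)
records = list(read_tss(io.StringIO(sample)))
for record in records:
    print(record)
```

Comparing a few of these positions against another TSS source is one way to do the verification mentioned above.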

sperfu commented 7 years ago

Okay, I got it. Thanks a lot!!!

farzam1371 commented 6 years ago

Is the DeepChrome algorithm available on this site?

rs3zz commented 6 years ago

We have a detailed description of the DeepChrome model in our paper: https://academic.oup.com/bioinformatics/article/32/17/i639/2450757 Please let us know if any specific step is unclear.

jingNPU commented 6 years ago

Hello, I am interested in your work on DeepChrome and I have two questions. First, how did you select the size of the regions flanking the TSS (+/- 5000 bp in your paper)? Why not 100 bp or 10 kbp? Second, the results on the validation set did not seem to change across different combinations of kernel size k and pool size m during tuning. How can one validate that k captures the local neighborhood representations of bins and that m combines the important representations across whole regions, as mentioned in your paper? Many thanks!