gouthamatla closed 5 years ago
Hi Goutham,
You can use the intervals file we provide (regions where the DeepSEA training/validation/testing data contains at least 1 TF) or come up with your own (non-enhancer open chromatin regions sound fine if they fit your use case). The intervals are just the regions that you want to sample from; you'd only use them if you think there should be restrictions on the regions from which Selene can generate samples.
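To make the idea of "regions to sample from" concrete, here is a minimal sketch in plain Python. It is not Selene's code: the function names `merge_peaks` and `sample_center`, the toy peak coordinates, and the length-weighted sampling are all assumptions made for illustration. It merges overlapping peaks into intervals (regions covered by at least one peak), prints them as tab-separated chrom/start/end lines, and shows how a sampler restricted to those intervals could pick a position.

```python
import random
from collections import defaultdict


def merge_peaks(peaks):
    """Merge overlapping or adjacent (chrom, start, end) peaks into intervals.

    Coordinates are treated as half-open [start, end), as in BED files.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in peaks:
        by_chrom[chrom].append((start, end))

    merged = []
    for chrom in sorted(by_chrom):
        current_start, current_end = None, None
        for start, end in sorted(by_chrom[chrom]):
            if current_start is None:
                current_start, current_end = start, end
            elif start <= current_end:
                # Overlaps or touches the current interval: extend it.
                current_end = max(current_end, end)
            else:
                merged.append((chrom, current_start, current_end))
                current_start, current_end = start, end
        merged.append((chrom, current_start, current_end))
    return merged


def sample_center(intervals, rng):
    """Pick an interval weighted by its length, then a uniform position in it.

    Illustrative assumption only; it is not taken from Selene's source.
    """
    weights = [end - start for _, start, end in intervals]
    chrom, start, end = rng.choices(intervals, weights=weights, k=1)[0]
    return chrom, rng.randrange(start, end)


# Hypothetical toy peaks; real input would come from your ChIP-seq or
# open chromatin peak files.
peaks = [
    ("chr1", 100, 200),
    ("chr1", 150, 300),
    ("chr1", 500, 600),
    ("chr2", 50, 80),
]

intervals = merge_peaks(peaks)
for chrom, start, end in intervals:
    print(f"{chrom}\t{start}\t{end}")
```

Any position drawn with `sample_center(intervals, random.Random(0))` falls inside one of the merged intervals, which is the only restriction an intervals file is meant to impose.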
Yes, you should use hg19 if your data is hg19.
What do you mean by top saliency features? (I don't think we provide that kind of functionality though.) Do you mean whether sequence-level deep learning models can identify TF binding in enhancer regions? We haven't looked specifically at enhancer regions but I don't think there would be a huge difference region-to-region in the genome for model prediction accuracy.
Thanks. Where can I find the hg19 intervals file? I guess it should be here:
wget https://zenodo.org/record/1443558/files/selene_quickstart.tar.gz
https://github.com/FunctionLab/selene/tree/master/manuscript/case1/data#additional-note
You can just liftOver from hg38 back to hg19 if that's faster. Otherwise I don't know if we provide it (if we use IntervalsSampler in case 2, it's probably downloadable there) - you might need to regenerate it from the DeepSEA data + the script linked.
Hi All,
I am interested in using Selene on my own data, and I am thinking of trying a couple of things.
One is to train Selene on TF ChIP-seq data of interest to me and perform ISM. I think this is exactly case 1 of the Selene paper. My data is from hg19, so should I use the hg19 FASTA and hg19 intervals files? Where can I find the hg19 intervals file?
On the other hand, I have enhancers from my tissue of interest, which might not contain a binding site for just one TF but could contain binding sites for multiple TFs. In this case, have you tested Selene's performance on diverse sequences like enhancers? Is there any way to get the top saliency features from Selene?
In any case, I understood that the intervals file is used to create the training, validation and test sets. Am I correct? Can I use non-enhancer open chromatin regions from my tissue of interest as the intervals file to run the CLI?
Thanks, Goutham A