SchulzLab / STARE

TF analysis from epigenetic and Hi-C data
MIT License

Definition of validation sets #1

clarawolfensohn closed this issue 2 years ago

clarawolfensohn commented 2 years ago

Using STARE on single-cell sequencing data is very intriguing, and I'm excited to see that this is possible. I want to try STARE on my single-cell ATAC-seq data, but as a sanity check I would first like to run it on my K562 single-cell ATAC-seq data. Would you mind sharing your validation sets? I tried to build the validation set as described in the manuscript, but the numbers don't add up. Thanks a lot!

DennisHeck commented 2 years ago

I am glad to hear that you want to give it a try! Could you specify which files you would need as validation sets? In the manuscript we had two data sets. For the first one, from Fulco et al. (2019), we took the interactions from their supplementary tables (Supplementary Table 6a): https://static-content.springer.com/esm/art%3A10.1038%2Fs41588-019-0538-0/MediaObjects/41588_2019_538_MOESM3_ESM.xlsx. Unfortunately, only gene names are listed and no IDs, so for the comparison we took the gene annotation from the same Excel file (Supplementary Table 5b).

The other data set, from Gasperini et al. (2019), comes from their GEO accession GSE120861. From there we took the file 'GSE120861_all_deg_results.at_scale.txt.gz'. They did one smaller CRISPR screen as a proof of concept, and then this larger one as a follow-up. Since they use Ensembl IDs, we could use the full hg19 gene annotation from GENCODE.
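As a rough illustration of working with a gzipped, tab-separated results file like the one named above, here is a minimal sketch that collects the tested (region, gene) pairs. The column names `region` and `gene_id` and the helper `load_tested_pairs` are hypothetical, since the real header of the file is not given in this thread, so check the header and pass the actual names. Dropping an Ensembl version suffix is also an assumption about how the IDs are matched against the GENCODE annotation.

```python
import csv
import gzip
import io


def load_tested_pairs(source, region_col, gene_col):
    """Return the set of (region, gene ID) pairs found in a gzipped TSV.

    `source` is a file path or a binary file object. The column names are
    arguments because the real header is not known here (assumption).
    """
    pairs = set()
    with gzip.open(source, "rt", newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            # Drop a version suffix if present: ENSG00000000001.4 -> ENSG00000000001
            gene_id = row[gene_col].split(".")[0]
            pairs.add((row[region_col], gene_id))
    return pairs


# Tiny made-up table standing in for the downloaded file (not real data).
demo = (
    "region\tgene_id\n"
    "chr1:100-200\tENSG00000000001.4\n"
    "chr1:100-200\tENSG00000000002\n"
)
demo_gz = io.BytesIO(gzip.compress(demo.encode()))
pairs = load_tested_pairs(demo_gz, region_col="region", gene_col="gene_id")
print(len(pairs), "tested pairs")
```

For the real file, pass its path instead of `demo_gz`, together with whichever columns hold the targeted region and the Ensembl gene ID.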

For the plots in the manuscript we could only use the validated interactions that we were able to score, so the numbers there are not the total numbers of interactions tested in the experiments. I hope that helps. Let me know if anything is unclear, or if there are missing files that I could point you to.
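To see why the counts can differ from the totals reported by the experiments, a minimal sketch of that restriction is shown here. It assumes each interaction is keyed by a (region, gene) tuple and that the scored interactions are available as a dictionary. The helper `split_by_scored` and all values are made up for illustration and are not taken from STARE or the manuscript.

```python
def split_by_scored(tested, scored):
    """Split tested interactions into those with a score and those without.

    `tested` is an iterable of (region, gene) keys from the validation data;
    `scored` is a mapping from (region, gene) keys to a score (assumption).
    """
    usable, dropped = [], []
    for pair in tested:
        # Only interactions that received a score can be used for comparison.
        (usable if pair in scored else dropped).append(pair)
    return usable, dropped


# Made-up example: three tested interactions, two of which were scored.
tested = [
    ("chr1:100-200", "GENE_A"),
    ("chr1:100-200", "GENE_B"),
    ("chr2:500-600", "GENE_C"),
]
scored = {
    ("chr1:100-200", "GENE_A"): 0.42,
    ("chr2:500-600", "GENE_C"): 0.07,
}
usable, dropped = split_by_scored(tested, scored)
print(f"{len(usable)} of {len(tested)} tested interactions could be scored")
```

Counting `dropped` alongside `usable` makes it easy to check how much of the gap to the published totals comes from interactions that could not be scored.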