arnavmdas / epiphany

MIT License

ATAC-seq vs DNaseI-seq #4

Closed (jingquanlim closed 6 months ago)

jingquanlim commented 1 year ago

Dear authors,

Since ATAC-seq and DNaseI-seq both assess open regions of the genome, would a model trained on ATAC-seq plus the other epigenomic 1D tracks mentioned in the Epiphany paper be better suited than one trained on DNaseI for predicting Hi-C contact maps?

I am very interested in using Epiphany to predict contact maps in different subtypes of B lymphocytes (of which GM12878 is one). However, the high cell input required by DNaseI-seq (10-50 million cells) is limiting, whereas ATAC-seq needs just 50k cells and can scale down to single-cell (scATAC-seq) input.

I presume the main model presented in the paper would also perform well with ATAC instead of DNase. However, with limited computing resources, we can generate the ATAC-seq data for GM12878 but cannot retrain the model on it, together with the other epigenomic 1D tracks, to produce a fine-tuned/revised model.

I was also desperate enough to consider just feeding the ATAC-seq bigWig file, as a pseudo-DNaseI file, into your Google Colab Python notebook for prediction. Maybe I will go ahead with this and see whether the predictions differ much from the original runs that used DNaseI-seq.
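Before substituting one accessibility track for the other, a quick sanity check is to see how similar the two signals are at the bin level. The snippet below is only a sketch of that idea, not part of Epiphany: the function names are hypothetical, the toy numbers are made up, and in practice the per-base values would be read from the two bigWig files (for example with a reader such as pyBigWig). Z-scoring is used because ATAC-seq and DNaseI-seq coverage are usually on different scales.

```python
from math import sqrt
from statistics import mean


def bin_signal(values, bin_size):
    """Average a per-base signal into fixed-size bins (a trailing partial bin is kept)."""
    return [mean(values[i:i + bin_size]) for i in range(0, len(values), bin_size)]


def zscore(values):
    """Center and scale a signal so tracks with different depth become comparable."""
    mu = mean(values)
    sd = sqrt(mean([(v - mu) ** 2 for v in values]))
    if sd == 0:
        return [0.0] * len(values)
    return [(v - mu) / sd for v in values]


def pearson(a, b):
    """Pearson correlation of two equally long binned signals."""
    if len(a) != len(b):
        raise ValueError("signals must have the same number of bins")
    za, zb = zscore(a), zscore(b)
    return mean([x * y for x, y in zip(za, zb)])


# Toy per-base coverage (made-up numbers, NOT real data).
atac_toy = [0, 0, 4, 4, 8, 8, 2, 2]
dnase_toy = [1, 1, 9, 9, 17, 17, 5, 5]

atac_bins = bin_signal(atac_toy, bin_size=2)
dnase_bins = bin_signal(dnase_toy, bin_size=2)
similarity = pearson(atac_bins, dnase_bins)
print(f"bin-level Pearson r = {similarity:.3f}")
```

A high correlation only suggests the two tracks carry similar information; it does not guarantee that a model trained on DNaseI will predict equally well from ATAC-seq, so comparing the resulting contact maps is still worthwhile.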

BTW, I was wondering if your side has a model trained on (ATAC, CTCF, H3K27ac, H3K27me3, H3K4me3) that is available for the community to use. Thanks much for reading through this lengthy post!

ruy204 commented 12 months ago

Hi Jingquanlim,

Thank you for your interest and sorry for the late response!

Yes, using ATAC-seq instead of DNaseI is very much recommended. The major goal of Epiphany is to build a bridge between epigenomic signals and 3D chromatin structure, so that people can explore the effects of epigenomic perturbations on 3D structure, or use feature attribution with a well-trained model to identify specific regions of interest for biological discovery. We encourage people to play with the model and retrain it with various combinations of epigenomic tracks as input (as long as they're stored in the typical .bw format). In the paper we trained the model with an example input epigenomic set, but a customized set of epigenomic tracks could indeed also be used to make predictions.

For the checkpoints, we have multiple different combinations, including

Thank you, Rui

jingquanlim commented 12 months ago

Dear Rui,

Thanks much for your reply! Really appreciate it.

It would definitely be great to retrain the model, but we are very short on the input cells needed to generate the Hi-C or Micro-C data/maps for a proper retraining of an 'updated' model. We are still trying to generate enough cells of interest, based on B lymphocytes, for conformation profiling using Epiphany.

Based on the ablation analysis in the paper, it seems that 'original' minus DNaseI performs very well too. Hence, I was thinking of generating "H3K27ac

If I can have the ckpt of "H3K27ac + H3K27me3 + H3K4me3 + CTCF", then I can start 'experimenting' with predicting from such a ckpt using 4 'pseudo' tracks too. Thanks again!

Regards, Jing Quan

On Fri, Sep 8, 2023 at 10:36 PM Rui Yang @.***> wrote:

For the checkpoints, we have multiple different combinations, including

  • original input: DNaseI + H3K27ac + H3K27me3 + H3K4me3 + CTCF
  • H3K27ac + H3K27me3 + H3K4me3 + CTCF
  • H3K27ac + CTCF
  • ATAC + H3K27ac + H3K27me3
  • ATAC + H3K36me3 + H3K27ac + H3K27me3
  • etc.

Please let us know which ckpts could be useful for you, and we are also happy to help retrain the model with new combinations.
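To work out which of these checkpoints a given set of tracks could drive, a small lookup is enough. This is only an illustrative sketch: the labels (`original`, `no_dnase`, and so on) are hypothetical and are not the repository's checkpoint file names, only some of the combinations listed above are included, and the `substitutes` argument models the "pseudo track" idea of feeding one assay in place of another.

```python
# Some of the track combinations from the list above.
# The dictionary keys are hypothetical labels, NOT actual checkpoint file names.
CHECKPOINT_TRACKS = {
    "original": ("DNaseI", "H3K27ac", "H3K27me3", "H3K4me3", "CTCF"),
    "no_dnase": ("H3K27ac", "H3K27me3", "H3K4me3", "CTCF"),
    "atac_3": ("ATAC", "H3K27ac", "H3K27me3"),
    "atac_4": ("ATAC", "H3K36me3", "H3K27ac", "H3K27me3"),
}


def usable_checkpoints(available, substitutes=None):
    """Return labels of checkpoints whose input tracks can all be supplied.

    `substitutes` maps a required track to a stand-in track, for example
    {"DNaseI": "ATAC"} to feed ATAC-seq as a pseudo-DNaseI input.
    """
    substitutes = substitutes or {}
    have = set(available)
    usable = []
    for label, tracks in CHECKPOINT_TRACKS.items():
        # A track counts as covered if it is present or its stand-in is present.
        if all(t in have or substitutes.get(t) in have for t in tracks):
            usable.append(label)
    return usable


my_tracks = ["ATAC", "H3K27ac", "H3K27me3", "H3K4me3", "CTCF"]
direct = usable_checkpoints(my_tracks)
with_pseudo = usable_checkpoints(my_tracks, substitutes={"DNaseI": "ATAC"})
print("usable directly:", direct)
print("usable with ATAC as pseudo-DNaseI:", with_pseudo)
```

Checkpoints that only become usable through a substitute are the ones where predictions should be compared most carefully against a run with the real track.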

Thank you, Rui
