calico / basenji

Sequential regulatory activity predictions with deep convolutional neural networks.
Apache License 2.0

Akita CTCF perturbation #127

Open shtoneyan opened 2 years ago

shtoneyan commented 2 years ago

For Figure 2 in Akita, how were the CTCF positions identified? The Methods section says, "To perform in silico motif mutagenesis, we intersected our test set regions with motif positions using bedtools." It would be helpful to know where these motif positions come from. Are they from CTCF ChIP-seq peaks (identified with a peak caller)?

davek44 commented 2 years ago

Hi, the Figure 2 analysis made use of CTCF ChIP-seq in HFF cells from ENCODE. The Methods section refers to the Figure 3 analysis, for which we used TF motifs (including CTCF) mapped to hg38 by the JASPAR database. The exact file is here: http://expdata.cmmt.ubc.ca/JASPAR/downloads/UCSC_tracks/2018/hg38/JASPAR2018_hg38_all_chr.bed.gz.
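For readers who want to reproduce the motif selection, here is a minimal sketch of pulling CTCF entries out of a motif BED file and keeping those that fall in a set of test regions. It is a pure-Python stand-in for the bedtools intersection mentioned in the Methods, not the authors' script. The helper names (`parse_motif_bed`, `overlapping`) are hypothetical, and the assumption that the TF name sits in the fourth BED column (`name_col=3`) should be checked against the actual JASPAR file.

```python
from typing import Iterable, List, Tuple

# (chrom, start, end) in BED-style 0-based, half-open coordinates
Interval = Tuple[str, int, int]


def parse_motif_bed(lines: Iterable[str], motif_name: str = "CTCF",
                    name_col: int = 3) -> List[Interval]:
    """Return intervals whose name column equals motif_name.

    Assumption: tab-separated BED with the TF name in column `name_col`.
    """
    hits = []
    for line in lines:
        # Skip blank lines and common BED header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= name_col or fields[name_col] != motif_name:
            continue
        hits.append((fields[0], int(fields[1]), int(fields[2])))
    return hits


def overlapping(motifs: Iterable[Interval],
                regions: Iterable[Interval]) -> List[Interval]:
    """Keep motifs that overlap at least one region on the same chromosome."""
    by_chrom = {}
    for chrom, start, end in regions:
        by_chrom.setdefault(chrom, []).append((start, end))
    kept = []
    for chrom, start, end in motifs:
        # Half-open intervals overlap when each starts before the other ends.
        if any(start < r_end and r_start < end
               for r_start, r_end in by_chrom.get(chrom, [])):
            kept.append((chrom, start, end))
    return kept
```

To read the gzipped download directly, the lines can be streamed with `gzip.open(path, "rt")` and passed to `parse_motif_bed`. The linear scan in `overlapping` is fine for a few test regions; for genome-wide work bedtools is the better tool.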

shtoneyan commented 2 years ago

Hello, thank you for replying so quickly! Sorry, for Fig. 2, do you mean that the peaks from ENCODE CTCF ChIP-seq in HFF were used? If so, was the whole length of each peak region inverted or mutagenized, or were CTCF motifs specifically targeted?

gfudenberg commented 2 years ago

Hi all, Fig 2b showing the tracks indeed made use of the ENCODE ChIP data; however, that was just for visualization. For Fig 2d and Fig 2e, the JASPAR CTCF motif positions were used, as in Fig 3. Fig 2d is basically a visual example of what is quantified in Fig 3. For Fig 2e, motif positions were extended by +/- 10 bp and intervals were merged (so that very closely neighboring motifs were inverted together). For Extended Data 6e, the whole region reported as a peak by ENCODE was mutagenized. Hope that helps!
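The extend-and-merge step described for Fig 2e can be sketched as follows. This is an illustration only: the function name `extend_and_merge` is hypothetical, the thread does not say which tool was used for merging, and treating book-ended intervals as mergeable (the default behavior of `bedtools merge`) is an assumption.

```python
from typing import Iterable, List, Tuple

# (chrom, start, end) in BED-style 0-based, half-open coordinates
Interval = Tuple[str, int, int]


def extend_and_merge(intervals: Iterable[Interval],
                     flank: int = 10) -> List[Interval]:
    """Pad each interval by `flank` bp on both sides, then merge overlaps."""
    # Extend, clamping the start at 0, and sort by chromosome then position.
    extended = sorted(
        (chrom, max(0, start - flank), end + flank)
        for chrom, start, end in intervals
    )
    merged: List[Interval] = []
    for chrom, start, end in extended:
        # Assumption: overlapping or touching intervals are merged.
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


motifs = [("chr1", 100, 119), ("chr1", 125, 144), ("chr1", 300, 319)]
print(extend_and_merge(motifs))
# [('chr1', 90, 154), ('chr1', 290, 329)]
```

The first two motifs are 6 bp apart, so their padded intervals overlap and collapse into one region that would be inverted as a unit; the third stays separate.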

gfudenberg commented 1 year ago

Hi, these are motif positions from the JASPAR2018 database (see citation in paper). Hope that helps!
