google-deepmind / deepmind-research

This repository contains implementations and illustrative code to accompany DeepMind publications
Apache License 2.0

Enformer protocol #287

Open BenxiaHu opened 3 years ago

BenxiaHu commented 3 years ago

Hello, Enformer looks great for predicting gene expression. Is there a pipeline to run Enformer?

I have another question. I have many DNA sequences of different lengths. I would like to know whether Enformer can predict the target genes for my DNA sequences.

Best,

Avsecz commented 3 years ago

Hi,

Thanks for reaching out. Currently, there is no pipeline or script for running Enformer. I suggest implementing the pipeline yourself by extracting the relevant code from the enformer-usage colab.

The TF-hub version expects an input of 393,216 base pairs and can also handle unknown nucleotides, i.e. N's (represented as [0, 0, 0, 0]). To deal with variable-length sequences, I would recommend placing the main TSS for the gene in the middle of the sequence and then padding the rest of the sequence on each side with N's to reach 393,216 base pairs (or trimming if it is too long). Note that Enformer makes predictions for the central 114,688 bp at 128 bp resolution, so the result will contain 896 spatial values (114,688 / 128). You can extract transcript expression values by taking the values at the spatial bin (out of 896) that overlaps the TSS of the transcript of interest.

Best, Ziga
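
For readers who want a starting point, here is a minimal sketch of the padding/trimming and bin lookup described in the reply. It is not taken from the enformer-usage colab: the function names (`center_on_tss`, `one_hot_encode`, `tss_bin`) are hypothetical, the A, C, G, T channel order is an assumption that should be checked against the colab, and `tss_index` is assumed to be a 0-based position within your own sequence. Only the numbers 393,216, 114,688 and 128 come from the thread.

```python
SEQ_LEN = 393_216      # input length expected by the TF-hub model (from the thread)
TARGET_LEN = 114_688   # central window that the predictions cover
BIN_SIZE = 128         # output resolution in bp
NUM_BINS = TARGET_LEN // BIN_SIZE  # 896 spatial bins

# Assumed channel order A, C, G, T; anything else (e.g. N) maps to all zeros.
ONE_HOT = {
    "A": (1.0, 0.0, 0.0, 0.0),
    "C": (0.0, 1.0, 0.0, 0.0),
    "G": (0.0, 0.0, 1.0, 0.0),
    "T": (0.0, 0.0, 0.0, 1.0),
}
UNKNOWN = (0.0, 0.0, 0.0, 0.0)


def center_on_tss(seq, tss_index, length=SEQ_LEN):
    """Pad with N's or trim so that seq[tss_index] lands at position length // 2."""
    half = length // 2
    left = seq[max(0, tss_index - half):tss_index]
    right = seq[tss_index:tss_index + (length - half)]
    left = "N" * (half - len(left)) + left
    right = right + "N" * ((length - half) - len(right))
    return left + right


def one_hot_encode(seq):
    """Encode a DNA string as a list of 4-tuples, one per base."""
    return [ONE_HOT.get(base.upper(), UNKNOWN) for base in seq]


def tss_bin(pos_in_input, length=SEQ_LEN):
    """Index of the output bin overlapping a 0-based position in the model input."""
    offset = (length - TARGET_LEN) // 2  # flank on each side with no predictions
    index = (pos_in_input - offset) // BIN_SIZE
    if not 0 <= index < NUM_BINS:
        raise ValueError("position lies outside the predicted central window")
    return index


if __name__ == "__main__":
    padded = center_on_tss("ACGT" * 10, tss_index=20)
    print(len(padded))            # 393216
    print(tss_bin(SEQ_LEN // 2))  # 448
```

With the TSS centred this way, its predictions sit in bin 448 of the 896 output bins. Before passing the encoded sequence to the model it would still need to be converted to a float array with a leading batch dimension, as the colab does for its own inputs.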