Enformer protocol - GitHub issues

Hi,

Thanks for reaching out. Currently, there is no pipeline or script for running Enformer. I suggest implementing the pipeline yourself by extracting the relevant code from the enformer-usage colab.

The TF-hub version expects an input of 393,216 base pairs and can handle unknown nucleotides (N's, represented as [0, 0, 0, 0]). To deal with variable-length sequences, I would recommend placing the main TSS of the gene in the middle of the sequence and then padding each side with N's to reach 393,216 base pairs (or trimming if the sequence is too long). Note that Enformer makes predictions for the central 114,688 bp at 128 bp resolution, so the result will contain 896 spatial values (114,688 / 128). You can extract transcript expression values by taking the values at the spatial bin (out of 896) overlapping the TSS of the transcript of interest.
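
As a rough illustration of the padding and bin lookup described above, here is a minimal NumPy sketch. It is not taken from the colab: the helper names (`one_hot_encode`, `center_on_tss`, `bin_for_position`) are hypothetical, and the A, C, G, T channel order is an assumption that should be checked against the encoder used in the enformer-usage colab.

```python
import numpy as np

SEQ_LEN = 393_216      # input length expected by the TF-hub version
TARGET_LEN = 114_688   # central region that predictions cover
BIN_SIZE = 128         # output resolution in bp
N_BINS = TARGET_LEN // BIN_SIZE  # 896 spatial bins


def one_hot_encode(seq):
    """One-hot encode a DNA string as a (len, 4) float32 array.

    Channel order A, C, G, T is an assumption. Any other character
    (e.g. N) becomes [0, 0, 0, 0].
    """
    lookup = np.zeros((256, 4), dtype=np.float32)
    for i, base in enumerate("ACGT"):
        lookup[ord(base), i] = 1.0
        lookup[ord(base.lower()), i] = 1.0
    codes = np.frombuffer(seq.encode("ascii"), dtype=np.uint8)
    return lookup[codes]


def center_on_tss(seq, tss_index, length=SEQ_LEN):
    """Pad with N's (or trim) so that seq[tss_index] lands at length // 2."""
    start = tss_index - length // 2
    end = start + length
    left_pad = max(0, -start)
    right_pad = max(0, end - len(seq))
    core = seq[max(0, start):min(len(seq), end)]
    return "N" * left_pad + core + "N" * right_pad


def bin_for_position(pos_in_input, length=SEQ_LEN):
    """Map a 0-based input position to its output bin, or None if the
    position lies outside the central predicted window."""
    offset = (length - TARGET_LEN) // 2
    rel = pos_in_input - offset
    if rel < 0 or rel >= TARGET_LEN:
        return None
    return rel // BIN_SIZE


# Toy example: a short sequence with its TSS at index 1.
padded = center_on_tss("ACGT", tss_index=1)
x = one_hot_encode(padded)               # shape (393216, 4)
tss_bin = bin_for_position(SEQ_LEN // 2)  # bin overlapping the centered TSS
```

With the TSS centered this way, its bin index is the same for every gene, so the expression readout would be the model output at `tss_bin` for the track of interest. For other transcripts of the same gene, pass their position within the padded input to `bin_for_position` instead.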

Best,
Ziga

google-deepmind / deepmind-research

Enformer protocol #287