1edv / evolution

This repository contains the code for our manuscript, 'The evolution, evolvability, and engineering of gene regulatory DNA'
MIT License

What is the rc_Conv1D block about? #7

vuhongai closed this issue 2 years ago

vuhongai commented 2 years ago

Hi, Thank you for sharing your awesome work. I have difficulty understanding the intuition behind the rc_Conv1D block in the attention-based model, since it is new to me. If I understand correctly, you're trying to mimic the DNA structure with its forward and reverse strands, right? I have tried it on a protein sequence-to-function dataset (just modifying the function that generates the one-hot vector), and the architecture also works very well (figure: validation set of the training).

So my question is, what exactly does this rc_Conv1D block do for inputs other than DNA sequences? Thank you. Ai

Carldeboer commented 2 years ago

Hi Ai

The idea of that function is to incorporate the reverse complement DNA strand since the strand is arbitrary and transcription factors bind in either orientation. It is more efficient for a variety of reasons to reverse complement the convolutional filters rather than the input DNA sequences. So that layer uses reverse complemented filters to do a 1D convolution. The reverse complement actually works by flipping the filters along the length and base axes (and the bases have to be in a specific order so that the flip actually represents complementation; so ACGT works since TGCA is its complement, but ATGC does not since CGTA is not the complement of ATGC).
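To make the double flip concrete, here is a minimal pure-Python sketch. It is not the repository's rc_Conv1D implementation; the helper names (`one_hot`, `flip_both_axes`, `scan`) and the filter values are hypothetical and chosen only for illustration. It checks two things under the ACGT ordering: that flipping a one-hot matrix along both axes yields the reverse complement, and that scanning the forward strand with a flipped filter gives the same scores as scanning the reverse-complemented strand with the original filter, just in reverse order.

```python
BASES = "ACGT"  # reversing this order gives TGCA, the complement of each base


def one_hot(seq, order=BASES):
    """Encode a DNA string as an L x 4 list of rows, columns following `order`."""
    return [[1.0 if base == b else 0.0 for b in order] for base in seq]


def flip_both_axes(matrix):
    """Reverse the position axis (rows) and the base axis (columns)."""
    return [row[::-1] for row in matrix[::-1]]


def reverse_complement(seq):
    """String-level reverse complement, used only as a reference."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[b] for b in reversed(seq))


def scan(x, w):
    """Valid 1D cross-correlation of an L x 4 input with a k x 4 filter."""
    k = len(w)
    return [
        sum(x[i + j][c] * w[j][c] for j in range(k) for c in range(4))
        for i in range(len(x) - k + 1)
    ]


seq = "ACGTTGCAAG"
filt = [[0.5, -1.0, 2.0, 0.0],
        [1.5, 0.0, -0.5, 1.0],
        [-2.0, 1.0, 0.0, 0.5]]  # arbitrary k=3 filter (hypothetical values)

# With ACGT ordering, the double flip is exactly the reverse complement.
assert flip_both_axes(one_hot(seq)) == one_hot(reverse_complement(seq))

# With ATGC ordering, the same flip does NOT produce the reverse complement.
assert flip_both_axes(one_hot(seq, "ATGC")) != one_hot(reverse_complement(seq), "ATGC")

# Flipped filter on the forward strand == original filter on the
# reverse-complement strand, with the output positions reversed.
fwd_with_rc_filter = scan(one_hot(seq), flip_both_axes(filt))
rc_with_filter = scan(one_hot(reverse_complement(seq)), filt)
assert fwd_with_rc_filter == rc_with_filter[::-1]
```

Because the two scans agree, a layer can cover both strands by flipping its small filters instead of building a second, reverse-complemented copy of every input sequence.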

Are your protein sequences a one-hot encoding of amino acids? e.g. 20xL (L = length; 20 = one row per amino acid)

If so, I cannot explain why the rc_Conv1D would help, as it really doesn't make sense for that application. There is no logical meaning to a reverse complement of a protein sequence. Have you tried the same model, but without the rc_Conv1D? If not, perhaps it is the rest of the model that is helping.

vuhongai commented 2 years ago

Thank you for your considerate response. And yes, you're right: if I replace it with a normal Conv1D, it works just as well. Anyway, thank you for sharing the architecture; it learns very robustly.

Ai

1edv commented 2 years ago

Thanks so much for your question @vuhongai! I am just adding a couple points to @Carldeboer's excellent answer:

Best, Eeshit

vuhongai commented 2 years ago

Thank you for your clarification and the suggestions. I will look into it. Bests, Ai