Hughes-Genome-Group / deepC

A neural network framework for predicting Hi-C chromatin interactions from megabase-scale DNA sequence
GNU General Public License v3.0

Actual Model Architecture for 5kb GM12878 model #3

Closed stasys-hub closed 1 month ago

stasys-hub commented 2 years ago

Thank you for this awesome model!

Out of curiosity I am trying to reimplement the 1D DNA convolutional part of the neural network in PyTorch, but I found the "log output" a bit hard to understand. Therefore I wanted to ask whether you could clarify some of my assumptions.

When looking at the architecture, I would say the first part up to the dilated convolutions is clear. Assuming the PyTorch convention of (N, C_in, L), where N denotes the batch size, C_in the number of channels and L the length of the vector, we would have an input of shape [N, 4, 1_005_000]. Assuming "same" padding (I could not find anything about the padding in the log or the original publication), we would get the following scheme:

[N, 4, 1_005_000]    # input
[N, 300, 1_005_000]  # layer 1 out
[N, 300, 251_250]    # pool 1 out
[N, 600, 251_250]    # layer 2 out
[N, 600, 50_250]     # pool 2 out
[N, 600, 50_250]     # layer 3 out
[N, 600, 10_050]     # pool 3 out
[N, 900, 10_050]     # layer 4 out
[N, 900, 2_010]      # pool 4 out
[N, 900, 2_010]      # layer 5 out
[N, 900, 1_005]      # pool 5 out
[N, 100, 1_005]      # layer 6 out
[N, 100, 1_005]      # pool 6 out
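As a quick sanity check of the lengths (assuming "same" padding, so only the max-pools shrink the sequence):

```python
# 1_005_000 divided by the pool factors of the six layers
length = 1_005_000
for pool in (4, 5, 5, 5, 2, 1):
    length //= pool
print(length)  # 1005
```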

This part seems to be followed by 9 dilated 1D convolutions (10 are mentioned in the paper), so the dimensions stay the same, and then by a fully connected layer mapping to the Hi-C skeleton of dim [N, 201]. EDIT: This last part is a bit confusing to me. Given that the last output dimension is [N, 100, 1005], flattening would give [N, 100_500], a really big tensor right before the last fully connected layer. Am I missing something?

I hope I understood everything else correctly; if not, I would be really glad if you could clarify!

Thank you very much in advance!

Architecture:
Layer 1: 300 hidden units, kernel width 8, max pool 4
Layer 2: 600 hidden units, kernel width 8, max pool 5
Layer 3: 600 hidden units, kernel width 8, max pool 5
Layer 4: 900 hidden units, kernel width 4, max pool 5
Layer 5: 900 hidden units, kernel width 4, max pool 2
Layer 6: 100 hidden units, kernel width 1, max pool 1
Dilation scheme: 2, 4, 8, 16, 32, 64, 128, 256, 1
Dilation units: 100
Dilation width: 3
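For reference, here is a minimal PyTorch sketch of how I currently read that listing. "Same" padding, ReLU activations and the plain (non-residual) wiring of the dilated stack are my assumptions, not something taken from the deepC code.

```python
import torch
import torch.nn as nn

class DeepCSketch(nn.Module):
    """Sketch of the conv trunk + dilated stack + dense head as I read it."""

    def __init__(self, n_bins: int = 201):
        super().__init__()
        conv_spec = [            # (in_ch, out_ch, kernel, pool)
            (4,   300, 8, 4),
            (300, 600, 8, 5),
            (600, 600, 8, 5),
            (600, 900, 4, 5),
            (900, 900, 4, 2),
            (900, 100, 1, 1),
        ]
        trunk = []
        for in_ch, out_ch, k, pool in conv_spec:
            trunk += [
                nn.Conv1d(in_ch, out_ch, kernel_size=k, padding="same"),
                nn.ReLU(),
                nn.MaxPool1d(pool) if pool > 1 else nn.Identity(),
            ]
        self.trunk = nn.Sequential(*trunk)

        dilated = []
        for d in (2, 4, 8, 16, 32, 64, 128, 256, 1):
            dilated += [
                nn.Conv1d(100, 100, kernel_size=3, dilation=d, padding="same"),
                nn.ReLU(),
            ]
        self.dilated = nn.Sequential(*dilated)

        # Flatten [N, 100, 1005] -> [N, 100_500] and map to the 201 skeleton bins;
        # this single layer alone holds 100_500 * 201 (~20M) weights.
        self.head = nn.Linear(100 * 1005, n_bins)

    def forward(self, x):                 # x: [N, 4, 1_005_000]
        x = self.trunk(x)                 # -> [N, 100, 1005]
        x = self.dilated(x)               # -> [N, 100, 1005]
        return self.head(x.flatten(1))    # -> [N, 201]
```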

rschwess commented 6 months ago

Hi stasys-hub,

Sorry for the WAY late reply.

You got everything correct: the last layer does indeed have a large input tensor, which is how the model ends up with 10M+ parameters.

I am sure one could come up with a smarter way to make that more parameter efficient nowadays.
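For example (just a sketch, not what deepC actually does): a strided convolution over the [N, 100, 1005] representation could produce the 201 outputs with a few hundred weights instead of the roughly 100_500 x 201 of the flatten + dense head.

```python
import torch.nn as nn

# Illustrative head only: maps [N, 100, 1005] -> [N, 1, 201],
# since (1005 - 5) / 5 + 1 = 201; about 500 parameters vs ~20M for flatten + dense.
lean_head = nn.Conv1d(100, 1, kernel_size=5, stride=5)
# usage: lean_head(features).squeeze(1)  # -> [N, 201]
```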

Cheers