farshidrayhancv / FRnet-DTI

MIT License

Issues in the methodology #2

Closed: xinformatics closed this issue 1 year ago

xinformatics commented 4 years ago

Hello, I was going through your paper and code, and I found some issues:

  1. The paper mentions that the input vector is originally (1476 x 1) dimensional. May I know the logic behind appending an extra '0' to the input? In my opinion, the only motive was to reshape the resulting (1477 x 1) vector to (211 x 7). And how does this correspond to a single-channel image? I believe reshaping a 1-D vector into a 2-D matrix won't make it an image.

  2. Other alternatives for reshaping the original (1476 x 1) vector could be (738 x 2), (492 x 3), (369 x 4), or (246 x 6). May I know if you have tried these transformations?

  3. The use of 'softmax' as the activation in the last layer: the paper mentions that the first network works as an autoencoder. Why not use a 'linear' activation in the last layer, since you are trying to reconstruct the input? I believe this is not a classification problem.

  4. The original input set has 1476 features. Why expand it into a sparser 4096-dimensional representation?

  5. For the second network (FRnet-predict), why reshape the 4096-dimensional vector to (64 x 64)? Reshaping a 1-D vector into a 2-D matrix doesn't make it an image. In my opinion, this was again done to project CNNs as a solution.

  6. The output of the second network is again stated to be a vector of length 1077. Why? This output should ideally be the list of available drugs, which makes this a multilabel classification problem.

  7. Adding to the above point: in my opinion, this problem corresponds to multilabel classification. One enzyme can have multiple drugs, hence a multilabel classification problem. Accordingly, using categorical cross-entropy as the loss and softmax as the activation in the last layer is not justified.

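To make points 1 and 2 concrete, here is a minimal numpy sketch of the reshaping in question (the feature values are placeholders; only the shapes matter):

```python
import numpy as np

# Placeholder feature vector with the paper's stated dimensionality.
x = np.random.rand(1476)

# Point 1: appending a single zero makes the length factor as 7 * 211 = 1477,
# so the vector can be reshaped into a 2-D matrix treated as a
# single-channel image.
x_padded = np.append(x, 0.0)
img = x_padded.reshape(211, 7)
assert img.shape == (211, 7)

# Point 2: the alternative shapes need no padding at all, since
# 1476 = 738*2 = 492*3 = 369*4 = 246*6.
for shape in [(738, 2), (492, 3), (369, 4), (246, 6)]:
    assert x.reshape(shape).shape == shape
```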
Hoping for a good response from your side.

farshidrayhancv commented 1 year ago
  1. Reshaping a 1-D vector into a 2-D matrix won't make it an image, but that was done to be able to use conv2d layers.
  2. Yes, they were reported in the supplementary section.
  3. That was tried, but we found better results with the approach reported in the paper.
  4. 1476 features are a lot for traditional machine learning algorithms, but for CNNs, more features (up to a certain number) are usually better. We tried 512, 1024, and 2048 and got the best result with 4096. We couldn't pick any higher number due to hardware limitations.
  5. Same as 1: it was done so conv2d can be used.

  6+7. We wanted to show our methodology and demonstrate that CNN-based methods are very capable of handling drug-target interaction prediction using features that were originally generated for traditional machine learning algorithms.
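As a shape-only sketch of what points 1 and 5 describe (assuming numpy; the actual model code is in this repository), the 4096-dimensional encoding reshapes into the 4-D tensor layout a conv2d layer expects:

```python
import numpy as np

# Placeholder for a 4096-dimensional encoded feature vector.
encoded = np.random.rand(4096)

# 4096 = 64 * 64, so the vector reshapes into a square single-channel
# "image" in the (batch, height, width, channels) layout that a
# conv2d layer consumes.
img = encoded.reshape(1, 64, 64, 1)
assert img.shape == (1, 64, 64, 1)
assert img.size == 4096
```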