ehsanasgari / DeepPrime2Sec

Apache License 2.0

I do not understand the structure of the dataset you use #18

Open SaidaSaad opened 3 years ago

SaidaSaad commented 3 years ago

Hello

Thank you very much for the code; it is working fine. I have my own dataset, so I would like to know how you preprocessed yours, because I do not understand the dataset or the features you calculate. First, you use a single amino acid sequence. In train.txt, does an empty line mark the end of a sequence? For example, the first sequence would then have length 12. Is that right, or am I reading the dataset the wrong way?

Also, in X_tarin_408.npy, what does 408 refer to, and how did you calculate that file from the dataset in train.txt? For example, for the first sequence in train.txt, which has length 12, you generate an array with shape [700,408]. How did you create it?

Also, in train_map_Y.npy, you generate an array of shape [700,9] for every sequence. Why 9?
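
To make the shape question concrete, here is a minimal sketch of one possible explanation: each sequence is padded with zeros up to a fixed length of 700, so a length-12 sequence with 408 features per residue becomes [700,408], and its per-residue labels become [700,9]. This is only an assumption and is not confirmed by the repository. `pad_to_length` is a hypothetical helper, and the arrays are random stand-ins rather than real data.

```python
import numpy as np

MAX_LEN = 700  # first dimension of the arrays described above


def pad_to_length(features, max_len=MAX_LEN):
    """Zero-pad (or truncate) a [seq_len, n_features] matrix to [max_len, n_features].

    Hypothetical helper: it illustrates fixed-length padding in general,
    not the repository's actual preprocessing.
    """
    features = np.asarray(features)
    seq_len, n_features = features.shape
    padded = np.zeros((max_len, n_features), dtype=features.dtype)
    n = min(seq_len, max_len)
    padded[:n] = features[:n]  # real residues first, zeros afterwards
    return padded


rng = np.random.default_rng(0)

# Stand-in for a length-12 sequence with 408 features per residue.
x_single = rng.random((12, 408))

# Stand-in labels: one one-hot row of width 9 per residue (assumed encoding).
y_single = np.eye(9)[rng.integers(0, 9, size=12)]

x_padded = pad_to_length(x_single)
y_padded = pad_to_length(y_single)

print(x_padded.shape, y_padded.shape)
```

If this guess is right, only the first 12 rows of each array carry information for the first sequence and the remaining rows are padding.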

I hope you can reply; I am struggling with this.

Thanks and best regards, Saida