Lehcar / protein-secondary-structure-predictor

Predicting protein secondary structure, for a school project

Need Help #1

Closed by rhmankad 1 year ago

rhmankad commented 4 years ago

Hello

Your repository has been very helpful in my work. I would like to know more about the dataset. Where did you get it? Can you share a link to it or its name? And how did you preprocess it?

Lehcar commented 4 years ago

Hi, this was a project for an undergraduate class last semester. My teammate chose which proteins we worked with. I believe she picked them for their relation to breast cancer (this was based on her own knowledge). She obtained the data from the Protein Data Bank and used YASPIN to get the secondary structure. We were trying to base the project on Holley and Karplus's 1989 paper, but unfortunately I didn't have enough time to code the sliding window. For preprocessing, I believe it's just one-hot encoding of each residue (amino acid). The dataset.txt file is the one we used to run the program. Let me know if you have any other questions.
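
For anyone following up on the preprocessing described above, here is a minimal sketch of one-hot encoding plus a sliding window. It is not the code from this repository, only an illustration under stated assumptions: the function names `one_hot_encode` and `sliding_windows`, the default window size of 17, and the extra 21st "spacer" column used for padding and unknown residues are all choices made for this example, and the format of dataset.txt is not assumed.

```python
# Illustrative sketch only; names, window size, and spacer column are assumptions.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"   # 20 standard residues, one-letter codes
SPACER_INDEX = len(AMINO_ACIDS)        # extra column for padding / unknown residues
N_FEATURES = len(AMINO_ACIDS) + 1      # 21 values per residue
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(sequence):
    """Return one 21-element 0/1 row per residue in the sequence."""
    rows = []
    for residue in sequence.upper():
        row = [0] * N_FEATURES
        # Residues outside the 20 standard codes fall into the spacer column.
        row[AA_INDEX.get(residue, SPACER_INDEX)] = 1
        rows.append(row)
    return rows


def sliding_windows(encoded, window_size=17):
    """Return one flattened window per residue, centered on that residue.

    Positions that fall off either end of the sequence are filled with
    spacer rows, so every residue gets a full-length window.
    """
    if window_size % 2 == 0:
        raise ValueError("window_size must be odd so the window has a center")
    half = window_size // 2
    spacer = [0] * N_FEATURES
    spacer[SPACER_INDEX] = 1
    padded = [spacer] * half + encoded + [spacer] * half
    windows = []
    for center in range(len(encoded)):
        window = padded[center:center + window_size]
        # Flatten the window_size x 21 block into a single input vector.
        windows.append([value for row in window for value in row])
    return windows


if __name__ == "__main__":
    encoded = one_hot_encode("MKVLA")            # hypothetical short sequence
    windows = sliding_windows(encoded, window_size=5)
    print(len(windows), len(windows[0]))         # 5 windows, 5 * 21 = 105 values each
```

Each window would then be paired with the secondary-structure label of its center residue to form one training example. Padding with a dedicated spacer column, rather than all zeros, lets a model tell sequence ends apart from real residues.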