aqlaboratory / proteinnet

Standardized data set for machine learning of protein structure
MIT License

PSSM generation #8

st4cks1defl0w closed this issue 5 years ago

st4cks1defl0w commented 5 years ago

Hello! I've been trying to wrap my mind around PSSM generation for an arbitrary protein taken from the PDB.

After reading the ProteinNet paper and checking out this repo, I'm still confused about the process you use. Right now I assume the pipeline is:

FASTA sequence fed to jackhmmer -> esl-weight.

Beyond this point it's a bit unclear to me. Are there any open-source utilities I can use to generate the binary-style PSSM (with the additional context columns) used by ProteinNet, or are there arguments to esl-weight that I'm missing?

Thanks! Apologies in advance if my question has already been discussed somewhere; I couldn't find any directions on this.
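
For readers following along, the pipeline assumed in the question above (jackhmmer, then esl-weight) could be laid out roughly as below. This is only a sketch, not the commands ProteinNet actually uses: the function name `build_pipeline_commands`, the file names, the iteration count, and the choice of `-p` (position-based weights) for esl-weight are all assumptions made for illustration. The function only builds the argument lists and does not run anything.

```python
import shlex


def build_pipeline_commands(query_fasta, seq_db, out_prefix, iterations=5):
    """Build (but do not run) the two commands of the assumed pipeline.

    All option values here are illustrative assumptions, not ProteinNet's settings.
    """
    alignment = f"{out_prefix}.sto"
    jackhmmer_cmd = [
        "jackhmmer",
        "-N", str(iterations),  # maximum number of search iterations
        "-A", alignment,        # save the multiple alignment of hits to this file
        query_fasta,            # query sequence file
        seq_db,                 # sequence database to search
    ]
    # esl-weight reads the alignment and writes a weighted copy to stdout;
    # -p selects position-based weights (an assumed choice for this sketch).
    esl_weight_cmd = ["esl-weight", "-p", alignment]
    return jackhmmer_cmd, esl_weight_cmd


if __name__ == "__main__":
    for cmd in build_pipeline_commands("query.fasta", "seqdb.fasta", "query"):
        print(shlex.join(cmd))
```

The printed lines can be checked against the HMMER and Easel documentation before running them on real data.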

alquraishi commented 5 years ago

You've probably seen it already, but this script contains the commands used to generate PSSMs in ProteinNet.
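
The script itself is not reproduced in this thread. As a rough illustration of the general idea only, the sketch below turns an already-weighted alignment into per-position amino acid frequencies plus one extra information-content column. It is not the ProteinNet implementation: the function name `weighted_pssm`, the residue ordering, the handling of gaps, and the definition of information content as log2(20) minus the Shannon entropy are all assumptions, and ProteinNet's own normalization may differ.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed column order for this sketch


def weighted_pssm(sequences, weights, pseudocount=0.0):
    """Return one row per alignment column: 20 frequencies + information content.

    sequences: equal-length aligned strings; non-amino-acid symbols (gaps) are ignored.
    weights:   one non-negative weight per sequence (e.g. from a weighting tool).
    """
    if not sequences:
        raise ValueError("need at least one sequence")
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("all aligned sequences must have the same length")
    if len(weights) != len(sequences):
        raise ValueError("need exactly one weight per sequence")

    rows = []
    for col in range(length):
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for seq, w in zip(sequences, weights):
            residue = seq[col].upper()
            if residue in counts:
                counts[residue] += w
        total = sum(counts.values())
        if total == 0:
            # Column contains only gaps: no frequencies, no information.
            rows.append([0.0] * (len(AMINO_ACIDS) + 1))
            continue
        freqs = [counts[aa] / total for aa in AMINO_ACIDS]
        entropy = -sum(p * math.log2(p) for p in freqs if p > 0)
        info = math.log2(len(AMINO_ACIDS)) - entropy
        rows.append(freqs + [info])
    return rows


if __name__ == "__main__":
    pssm = weighted_pssm(["AC", "AC", "AD"], [1.0, 1.0, 2.0])
    for row in pssm:
        print(" ".join(f"{value:.3f}" for value in row))
```

Each returned row has 21 values. Parsing the weights out of the esl-weight output and writing the result in ProteinNet's record format are left out of this sketch.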

st4cks1defl0w commented 5 years ago

Yes, figured it out - thanks!