Dataset and config used for the paper

aqlaboratory / rgn

Recurrent Geometric Networks for end-to-end differentiable learning of protein structure

MIT License


Yes, the parameters for all the models are in the directory you mention. The validation set was pre-filtered to exclude all sequences longer than 700 residues. For the test sets, I believe all proteins in CASP12 and earlier were shorter than 700 residues, so no filtering was needed. That said, you can raise the length limit in the config file above 700 residues at prediction time, although performance may suffer because the model wasn't trained on longer proteins.
Yes, this information can be gleaned from the ProteinNet text-based records, since the entry IDs contain the CASP category. See here for more info.
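
As a rough illustration of both points, here is a minimal Python sketch that reads ProteinNet-style text records, pulls the CASP category out of each entry ID, and flags sequences above the 700-residue cutoff. It rests on several assumptions that are not confirmed in this thread: that records are separated by blank lines, that fields are introduced by bracketed headers such as `[ID]` and `[PRIMARY]`, and that test-set IDs look like `<category>#<target>`. The function names and the sample IDs are hypothetical, so check them against the ProteinNet documentation before relying on this.

```python
from typing import Dict, Iterator, List, Optional, Tuple

MAX_LEN = 700  # length cutoff mentioned above


def iter_records(text: str) -> Iterator[Dict[str, str]]:
    """Yield one dict per record (assumes blank-line-separated records)."""
    for block in text.strip().split("\n\n"):
        record: Dict[str, str] = {}
        key = None
        for line in block.splitlines():
            line = line.strip()
            if not line:
                continue
            if line.startswith("[") and line.endswith("]"):
                key = line[1:-1]  # e.g. "[ID]" -> "ID"
                record[key] = ""
            elif key is not None:
                record[key] += line
        if "ID" in record:
            yield record


def split_category(entry_id: str) -> Tuple[Optional[str], str]:
    """Split an assumed '<category>#<target>' ID; category is None otherwise."""
    if "#" in entry_id:
        prefix, rest = entry_id.split("#", 1)
        # Assumption: a purely numeric prefix is not a CASP category.
        if not prefix.isdigit():
            return prefix, rest
    return None, entry_id


def summarize(text: str, max_len: int = MAX_LEN) -> List[Tuple[Optional[str], str, int, bool]]:
    """Return (category, target, length, within_limit) for each record."""
    rows = []
    for record in iter_records(text):
        category, target = split_category(record["ID"])
        length = len(record.get("PRIMARY", ""))
        rows.append((category, target, length, length <= max_len))
    return rows


# Made-up records for illustration only; not real ProteinNet entries.
SAMPLE = """[ID]
TBM#T0001
[PRIMARY]
MKV

[ID]
FM#T0002
[PRIMARY]
ACDEFG
"""

if __name__ == "__main__":
    for row in summarize(SAMPLE):
        print(row)
```

Entries whose last field is `False` are the ones that would need the config length limit raised before prediction.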
