DeepGraphLearning / GearNet

GearNet and Geometric Pretraining Methods for Protein Structure Representation Learning, ICLR'2023 (https://arxiv.org/abs/2203.06125)
MIT License

About Dataset #39

DuanhaoranCC closed this issue 1 year ago

DuanhaoranCC commented 1 year ago

Hello, dear authors! We have a few questions about the datasets used in the experiments:

  1. For the Enzyme Commission (EC) dataset, I downloaded the data but only found the PDB files and the PDB indices of the training set. How do I get the labels? Is the suffix of the PDB identifier the label? For example, does the A in 2FOR-A stand for label A? The same question applies to the Gene Ontology (GO) dataset.
  2. Why does the AlphaFold dataset have training, validation, and test sets?

By the way, I tried using TorchDrug, and the experience was slightly different from PyG.

Oxer11 commented 1 year ago

Hi, thanks for your questions.

  1. I think the file nrPDB-EC_annot.tsv included in the .zip file contains the labels, and the details of dataset loading can be found here. In 2FOR-A, the A stands for chain A of 2FOR, which I extracted from the original PDB file.
  2. There are no validation or test sets for the AlphaFold dataset. During pre-training, we pass None to core.Engine as a placeholder for them.
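
To illustrate the first point, an identifier such as 2FOR-A can be split into a PDB ID and a chain ID, and the labels can then be looked up in the annotation table. The sketch below is only a rough illustration: it assumes that nrPDB-EC_annot.tsv has three header lines followed by rows of the form `<PDB>-<chain>`, a tab, and a comma-separated list of EC numbers. The sample rows, the EC numbers in them, and the helper names `split_chain_id` and `parse_annotations` are all made up for this example, so please check them against the actual file and the dataset loading code.

```python
import io

# Made-up sample that mimics the ASSUMED layout of nrPDB-EC_annot.tsv:
#   line 1: comment
#   line 2: tab-separated list of all EC numbers (the label vocabulary)
#   line 3: column header
#   rest:   "<PDB>-<chain>\t<ec1>,<ec2>,..."
# The entries and labels below are placeholders, not real annotations.
SAMPLE = (
    "### EC-numbers\n"
    "1.1.1.-\t2.7.11.1\t3.4.21.-\n"
    "### PDB-chain\tEC-numbers\n"
    "1ABC-A\t1.1.1.-\n"
    "1ABC-B\t2.7.11.1,3.4.21.-\n"
)


def split_chain_id(entry):
    """Split an identifier like '2FOR-A' into (PDB ID, chain ID)."""
    pdb_id, chain = entry.rsplit("-", 1)
    return pdb_id, chain


def parse_annotations(handle):
    """Return (label vocabulary, {entry: list of labels}) under the assumed layout."""
    lines = [line.rstrip("\n") for line in handle]
    ec_numbers = lines[1].split("\t")
    labels = {}
    for line in lines[3:]:
        if not line:
            continue  # skip blank lines
        entry, ecs = line.split("\t")
        labels[entry] = ecs.split(",")
    return ec_numbers, labels


ec_numbers, labels = parse_annotations(io.StringIO(SAMPLE))
print(split_chain_id("2FOR-A"))  # the suffix is the chain, not the label
print(labels["1ABC-B"])          # the labels come from the annotation table
```

For the real file, the same function could be called on `open("nrPDB-EC_annot.tsv")` instead of the in-memory sample, provided the layout assumption holds.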

DuanhaoranCC commented 1 year ago

Thanks. I couldn't find the AlphaFold dataset used for pre-training. Could you provide it?