Open yuanhu246 opened 3 months ago
It is pre-trained on CATH, a dataset that contains both protein sequences and real (experimentally determined) structures.
To pre-train with customized data (e.g., CATH or AlphaFoldDB datasets), you can refer to the steps described in the README.
Download the CATH dataset from the official website (https://www.cathdb.info/).
Pre-process the PDB files for pre-training as done in ./raw_data/data_process.py, which transforms them into three files:
Load pre-processed data and perform pretraining on it.
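As a rough illustration of the pre-processing step, the sketch below pulls the amino-acid sequence and the Cα coordinates out of the text of a PDB file, which is also a quick way to sanity-check a download. It is not the repository's ./raw_data/data_process.py, and it does not produce the three files mentioned above. The function name parse_ca_trace and the choice to keep only Cα atoms are assumptions made for this example.

```python
# Minimal sketch (assumption: not the repository's data_process.py).
# Reads ATOM records using the fixed-column layout of the PDB format.

# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def parse_ca_trace(pdb_text, chain_id=None):
    """Return (sequence, coords) built from the CA atoms in pdb_text.

    sequence is a one-letter string ("X" for non-standard residues) and
    coords is a list of (x, y, z) tuples, one per residue.
    If chain_id is given, only that chain is kept.
    """
    sequence = []
    coords = []
    seen = set()  # (chain, residue number + insertion code) already stored
    for line in pdb_text.splitlines():
        if line.startswith("ENDMDL"):
            break  # keep only the first model of a multi-model file
        if not line.startswith("ATOM") or len(line) < 54:
            continue
        if line[12:16].strip() != "CA":
            continue
        if line[16] not in (" ", "A"):
            continue  # skip alternate locations other than the first
        chain = line[21]
        if chain_id is not None and chain != chain_id:
            continue
        key = (chain, line[22:27])
        if key in seen:
            continue
        seen.add(key)
        sequence.append(THREE_TO_ONE.get(line[17:20].strip(), "X"))
        coords.append(
            (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        )
    return "".join(sequence), coords
```

To try it on a downloaded file, read the file's text and pass it in, for example parse_ca_trace(open(path).read(), chain_id="A"), where path is whatever location you saved the file to. An empty sequence, or one made up mostly of "X", suggests the file is not the structure you expected.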
Thank you very much for your help. I have now downloaded the PDB file. Could you please help me check whether the PDB file I downloaded is correct? Also, where can I download the corresponding protein.{}.sequences.dictionary.csv and protein.actions.{}.txt files?
Dear author, you have a pre-trained model on GitHub. On which dataset was this model pre-trained? In your paper, you mention using the CATH dataset for pre-training. I think it is an interesting dataset, but I am new to the bioinformatics field and am not familiar with it. How can I download the CATH dataset and use it with your model? Any advice would be appreciated.