Closed DuanhaoranCC closed 1 year ago
Hi, thanks for the question.
I use AlphaFoldDB for pre-training since it contains more available structures than the PDB. We agree that "the larger the dataset, the better the results" holds when the number of structures increases by a factor of around 100; it may not hold for merely twice as many structures. The pre-training results also depend on the model capacity.
Hello, you discussed the results of pre-training on different datasets in the appendix. As shown in Table 8, the performance is comparable whether pre-training on the real PDB or on AlphaFold (v1 or v2), yet the real PDB has only 300,000 structures while AlphaFold has 800,000. Why did the authors use the larger AlphaFold set in the main text? Also, in theory, the larger the dataset, the better the pre-training results; why does Table 8 not reflect this?