bytedance / Protenix

A trainable PyTorch reproduction of AlphaFold 3.

Question about the MSA datasets #30

Open dohyeonscottkim opened 2 days ago

dohyeonscottkim commented 2 days ago

Hello, thank you for such great work!

I’m interested in testing your model with additional MSA results.

Could you explain why the database used for MSA pairing was changed from UniProt to UniRef100?

Also, which databases did you use for the non-paired MSAs? Are they the same as in AlphaFold 3, such as uniref90, small_bfd, or mgnify?

Kqiii commented 5 hours ago

Hi @dohyeonscottkim, as part of our pilot experiments we tested the MSA data pipeline described in the AF3 supplementary information, using the databases and search tools outlined there. However, running this pipeline for large-scale MSA searches was too time-consuming given our available computational resources, so we adopted the MSA pipeline used in ColabFold, which is significantly more efficient.