max_length=100 for TruncateProtein in pretrain config files

DeepGraphLearning / GearNet

GearNet and Geometric Pretraining Methods for Protein Structure Representation Learning, ICLR'2023 (https://arxiv.org/abs/2203.06125)

MIT License


max_length=100 for TruncateProtein in pretrain config files #29

Closed: yunxiaoliCB closed this issue 1 year ago

yunxiaoliCB commented 1 year ago

Hello! Amazing work here. I am curious about a detail in the setup of the different pretraining tasks specified in the config files (.yaml files). In the configs for the self-prediction tasks, a TruncateProtein with max_length=100 is applied to the AlphaFoldDB dataset, while the config for the Multiview Contrast (MC) task has none. Is a similar truncation specified implicitly somewhere else for the MC task? And is truncating with max_length=100 needed to reproduce the results for pretraining on the self-prediction tasks? Thank you!

Oxer11 commented 1 year ago

Hi, this is a good question! We set max_length to keep an upper bound on memory usage. For MC, the bound is applied implicitly by the crop_funcs used during pre-training, as defined here, so we don't need to truncate the proteins when loading the data. As for your second question, I haven't tried removing the max_length constraint, but I don't think it would affect performance as long as your GPU has enough memory.
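To illustrate the point about memory, here is a minimal pure-Python sketch of the two ways the protein size ends up bounded. It is not the GearNet or TorchDrug code: `truncate_on_load` and `random_crop` are hypothetical stand-ins for TruncateProtein and one of the crop_funcs, a protein is modelled as a plain list of residue indices, and the crop length of 50 is an assumed value chosen only for the example.

```python
import random


def truncate_on_load(residues, max_length=100):
    """Hypothetical stand-in for a load-time truncation transform:
    keep at most `max_length` residues from the start of the chain."""
    return list(residues[:max_length])


def random_crop(residues, max_length, rng):
    """Hypothetical stand-in for a training-time crop function:
    keep a random contiguous window of at most `max_length` residues."""
    if len(residues) <= max_length:
        return list(residues)
    start = rng.randrange(len(residues) - max_length + 1)
    return list(residues[start:start + max_length])


rng = random.Random(0)
protein = list(range(350))  # toy protein with 350 residues

# Self-prediction style: the size is bounded once, when the data is loaded.
truncated = truncate_on_load(protein, max_length=100)

# Contrastive style: the full protein is loaded, but each view is cropped
# during training, so the model never sees more than the crop length.
view_a = random_crop(protein, max_length=50, rng=rng)
view_b = random_crop(protein, max_length=50, rng=rng)

print(len(protein), len(truncated), len(view_a), len(view_b))
```

In both cases the number of residues reaching the encoder has a fixed upper bound, which is why an additional load-time truncation would be redundant for the cropped views in this sketch.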

yunxiaoliCB commented 1 year ago

Thank you for your kind reply. That makes a lot of sense!