Status: Closed (yunxiaoliCB closed this issue 1 year ago)
Hi, this is a good question! We set max_length to put an upper bound on memory usage. For MC, the truncation is done implicitly by the crop_funcs applied during pre-training, as defined here, so we don't need to truncate the proteins when loading the data.
For your second question, I haven't tried removing the max_length constraint, but I don't expect it to affect performance, provided your GPU has enough memory.
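To make the memory argument concrete, here is a minimal sketch of what a length cap does to a protein when it is loaded. It is plain Python and is not the library's actual TruncateProtein code: the function name truncate_sequence and the parameters random_start and rng are hypothetical, chosen only for illustration.

```python
import random


def truncate_sequence(residues, max_length=None, random_start=False, rng=None):
    """Hypothetical helper: cap a residue sequence at max_length items.

    With max_length=None the sequence is returned unchanged, which mirrors
    removing the constraint and letting memory use grow with protein size.
    """
    if max_length is None or len(residues) <= max_length:
        return list(residues)
    rng = rng or random
    # Either keep the first max_length residues or a random contiguous window.
    start = rng.randint(0, len(residues) - max_length) if random_start else 0
    return list(residues[start:start + max_length])


protein = list(range(350))  # stand-in for a 350-residue protein
truncated = truncate_sequence(protein, max_length=100)
untouched = truncate_sequence(protein, max_length=None)
print(len(truncated), len(untouched))  # prints: 100 350
```

With the cap in place, every sample contributes at most 100 residues to a batch, so the peak memory is bounded regardless of how long the longest protein in the dataset is. Without it, a single long protein can dominate the batch.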
Thank you for your kind reply. That makes a lot of sense!
Hello! Amazing work here. I am curious about a detail in the setup of the different pretraining tasks specified in the config files (.yaml files). In the configs for the self-prediction tasks, a TruncateProtein transform with max_length=100 seems to be applied to the AlphaFoldDB dataset, while the config for the Multiview Contrast (MC) task has no such transform. Is a similar truncation specified implicitly somewhere else for the MC task? And is truncating with max_length=100 needed to reproduce the pretraining results on the self-prediction tasks? Thank you!