Rappsilber-Laboratory / AlphaLink2

AlphaLink2: Integrating crosslinking MS data into Uni-Fold-Multimer
Creative Commons Attribution 4.0 International

More details about training process #15

Open Li-dacheng opened 10 months ago

Li-dacheng commented 10 months ago

Can you provide more details about your training, such as the source of the training data? Does the crosslink data have to represent a 25 Å distance between two proteins in a complex? Also, could you share the training script?

Thank you for your work; it's truly a significant breakthrough.

lhatsk commented 10 months ago

We trained on proteins from the DIPS data set. You can find more details in the supplement: https://www.biorxiv.org/content/10.1101/2023.06.07.544059v2.supplementary-material

We trained on SDA data (~25 Å) because most of our real data stems from SDA. We also tested it with real DSSO data (~30 Å), which worked in many cases. There is also a network trained on photoAA crosslinking data (10 Å). We haven't yet released the distogram networks, which would work with arbitrary cutoffs.
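
As a rough illustration of what these cutoffs mean in practice, the sketch below checks which crosslinks in a predicted structure fall within the cutoff for a given crosslinker. It is not code from AlphaLink2: the function names, the coordinate dictionary layout, and the choice to measure a single point-to-point distance per residue (for example between Cα atoms) are assumptions made for the example. Only the three cutoff values come from the comment above.

```python
import math

# Approximate distance cutoffs in angstroms, as quoted in the thread.
CUTOFFS_ANGSTROM = {"SDA": 25.0, "DSSO": 30.0, "photoAA": 10.0}


def crosslink_satisfied(coord_a, coord_b, crosslinker="SDA"):
    """Return True if two residue positions lie within the crosslinker cutoff.

    coord_a and coord_b are (x, y, z) tuples in angstroms. Which atom they
    refer to (assumed here: one representative atom per residue) is a
    choice of this sketch, not something stated in the thread.
    """
    cutoff = CUTOFFS_ANGSTROM[crosslinker]
    return math.dist(coord_a, coord_b) <= cutoff


def satisfied_fraction(coords, crosslinks, crosslinker="SDA"):
    """Fraction of crosslinks whose residue pair is within the cutoff.

    coords maps a hypothetical (chain, residue_number) key to (x, y, z).
    crosslinks is a list of (key_a, key_b) pairs.
    """
    if not crosslinks:
        return 0.0
    hits = sum(
        crosslink_satisfied(coords[a], coords[b], crosslinker)
        for a, b in crosslinks
    )
    return hits / len(crosslinks)


# Toy coordinates: two inter-chain links at 20 A and 28 A.
coords = {
    ("A", 1): (0.0, 0.0, 0.0),
    ("B", 5): (0.0, 0.0, 20.0),
    ("B", 9): (0.0, 0.0, 28.0),
}
crosslinks = [(("A", 1), ("B", 5)), (("A", 1), ("B", 9))]
```

With these toy coordinates, the 28 Å link passes the DSSO cutoff but not the SDA one, which is the kind of mismatch to keep in mind when feeding data from a different crosslinker to a network trained on SDA.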

For the training script, use the original Uni-Fold script: https://github.com/dptech-corp/Uni-Fold/blob/main/train_multimer.sh