Rappsilber-Laboratory / AlphaLink2

AlphaLink2: Integrating crosslinking MS data into Uni-Fold-Multimer
Creative Commons Attribution 4.0 International

No crosslinks satisfied #9

Open roivant-matts opened 11 months ago

roivant-matts commented 11 months ago

Hello, I explored AlphaLink2 with some of the test cases from the paper (RpoA-RpoC), with good results. With our own XL data, however, I do not get any crosslinks satisfied (multimer prediction, v2 network, 3 conditions, each with 20-30 XLs). My impression was that at least some of the crosslinks would be satisfied, no? I assigned an FDR of 0.05 to all XLs.

edit: when I measure the distances between crosslinked residues in the prediction, I do see that several are < 25A (my measurements may not be CA to CA, but still).
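To make such manual checks CA-CA specifically, a minimal sketch like the following reads CA coordinates from a predicted structure and measures the distance between two residues. It assumes the prediction is a single-model PDB file with standard fixed-width ATOM records and no insertion codes; `ca_coords` and `ca_ca_distance` are hypothetical helper names, not part of AlphaLink2.

```python
import math


def ca_coords(pdb_text):
    """Map (chain_id, residue_number) -> (x, y, z) of each CA atom.

    Relies on the fixed PDB column layout; assumes one model and no
    insertion codes, which is typical for a predicted structure.
    """
    coords = {}
    for line in pdb_text.splitlines():
        # Only ATOM records; HETATM is skipped so a calcium ion ("CA") is not picked up
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            chain = line[21]
            resnum = int(line[22:26])
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            coords[(chain, resnum)] = xyz
    return coords


def ca_ca_distance(coords, chain1, res1, chain2, res2):
    """Euclidean CA-CA distance (in Angstrom) between two residues."""
    return math.dist(coords[(chain1, res1)], coords[(chain2, res2)])
```

Usage would be along the lines of `coords = ca_coords(open("model.pdb").read())` followed by `ca_ca_distance(coords, "A", 5, "B", 10)`, where the file name is a placeholder.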

roivant-matts commented 11 months ago

I think this may be due to crosslinks being provided in both orientations, A B and B A. When I normalize B A to A B before generating the dictionary, I get the expected crosslink satisfaction (I believe the prediction itself is the same).
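The normalization step could be sketched as follows. It assumes crosslinks are held as `(chain1, res1, chain2, res2, fdr)` tuples and groups them into a nested `{chain1: {chain2: [(res1, res2, fdr), ...]}}` dictionary; that layout is an assumption for illustration, so check the AlphaLink2 README for the format the inference script actually expects. Both function names are hypothetical.

```python
from collections import defaultdict


def normalize_links(links):
    """Rewrite each (chain1, res1, chain2, res2, fdr) so that (chain1, res1) <= (chain2, res2).

    For homomeric links (same chain) this orders the residues instead.
    Duplicates created by the flip (A-B listed again as B-A) are dropped,
    keeping the FDR of the first occurrence.
    """
    seen = set()
    out = []
    for c1, r1, c2, r2, fdr in links:
        if (c2, r2) < (c1, r1):
            c1, r1, c2, r2 = c2, r2, c1, r1
        key = (c1, r1, c2, r2)
        if key not in seen:
            seen.add(key)
            out.append((c1, r1, c2, r2, fdr))
    return out


def to_nested_dict(links):
    """Group links as {chain1: {chain2: [(res1, res2, fdr), ...]}} (assumed layout)."""
    nested = defaultdict(lambda: defaultdict(list))
    for c1, r1, c2, r2, fdr in links:
        nested[c1][c2].append((r1, r2, fdr))
    return {c1: dict(inner) for c1, inner in nested.items()}
```

Dropping exact duplicates is a design choice here; if a pair is deliberately listed twice with different FDRs, you may want to merge them differently.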

lhatsk commented 11 months ago

I see that we can run into issues if you have a mix of A B and B A crosslinks. It should be fixed now. Thanks for reporting it!

Note that at the moment we only report the inter-protein crosslink satisfaction (CA-CA). I should clarify it or report both.
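The inter-chain restriction matters when comparing your own measurements with the reported number. A hypothetical re-implementation of such a metric is sketched below, assuming crosslinks as `(chain1, res1, chain2, res2, fdr)` tuples and a dict of CA coordinates keyed by `(chain, residue)`. The thread does not say whether a distance of exactly the cutoff counts as satisfied, so the inclusive `<=` is an assumption.

```python
import math


def xl_satisfaction(links, ca, cutoff=25.0, inter_only=True):
    """Fraction of crosslinks whose CA-CA distance is <= cutoff (Angstrom).

    links:      iterable of (chain1, res1, chain2, res2, fdr)
    ca:         dict (chain, residue) -> (x, y, z) CA coordinates
    inter_only: skip intra-chain links, mirroring the inter-protein-only report
    Returns (fraction, per_link); fraction is None when no link is counted.
    per_link holds (link, distance, satisfied) for every counted link.
    """
    per_link = []
    for link in links:
        c1, r1, c2, r2, _fdr = link
        if inter_only and c1 == c2:
            continue
        d = math.dist(ca[(c1, r1)], ca[(c2, r2)])
        per_link.append((link, d, d <= cutoff))
    if not per_link:
        return None, per_link
    fraction = sum(ok for _, _, ok in per_link) / len(per_link)
    return fraction, per_link
```

Running it once with `inter_only=True` and once with `inter_only=False` shows how much of the gap between the reported and the hand-measured satisfaction comes from intra-chain links.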

roivant-matts commented 11 months ago

Great, thanks. Yes, I also confirmed this after more testing but was slow to write back. That explains why the percentage of satisfied crosslinks was still lower than I expected from my own measurements. (Not critical, since we will generate an XL satisfaction report showing the satisfaction per input XL for the best model.)

edit: could you describe what effect the FDR has? When I use a constant value across all XLs, it does not seem to have any. We are exploring using the upstream data to assign confidence more precisely; should we expect a larger effect if the FDR varies between XLs?

And finally, is there a way to set a given crosslink as a true restraint (I take it this was what AL1 did?).

edit2: I believe I encountered one more bug: if only B-B crosslinks are provided, the output looks the same as when no crosslinks are provided.

lhatsk commented 11 months ago

edit: could you describe what effect the FDR has? When I use a constant value across all XLs, it does not seem to have any. We are exploring using the upstream data to assign confidence more precisely; should we expect a larger effect if the FDR varies between XLs?

The FDR is included as a bias to allow the network to better weigh the information. It's rather hard to determine the impact of the FDR because many different factors (e.g., the co-evolutionary information) influence the "final" weight/likelihood of a crosslink. However, we have seen that with higher FDRs (i.e., 20%) the network gets a little more cautious, which may result in lower crosslink satisfaction.

And finally, is there a way to set a given crosslink as a true restraint (I take it this was what AL1 did?).

No, it's currently not possible to enforce a constraint. The v3 network has seen a very small number of crosslinks with FDR 0, but my guess is that setting the FDR to 0 for individual links will have a rather limited effect. In AL1 it is possible to force constraints to some degree with the distogram network.

One thing you could try, which is a little hacky but worked well for me with AL1, is to poke holes into the MSA for the crosslink you want to enforce. Say you have the crosslink A i B j 0: zero out the MSA in A.feature.pkl.gz at (i-1) +- 2 and in B.feature.pkl.gz at (j-1) +- 2; do the same with uniprot.pkl.gz.
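The index arithmetic is easy to get wrong: crosslink residue numbers are 1-based while MSA columns are 0-based, hence the (i-1). A small hypothetical helper that turns a residue and a window size into the columns to blank, clipped at the sequence ends, might look like this:

```python
def hole_columns(residue, seq_len, window=2):
    """0-based MSA columns to gap for a 1-based crosslinked residue.

    Residue i maps to column i - 1; the +- window is clipped to [0, seq_len - 1].
    """
    if not 1 <= residue <= seq_len:
        raise ValueError(f"residue {residue} outside 1..{seq_len}")
    centre = residue - 1
    lo = max(0, centre - window)
    hi = min(seq_len - 1, centre + window)
    return list(range(lo, hi + 1))
```

For a crosslink A 5 B 10 with the default window, this gives columns 2 to 6 in A and 7 to 11 in B.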

edit2: I believe I encountered one more bug: if only B-B crosslinks are provided, the output looks the same as when no crosslinks are provided.

I fixed a bug where homomeric crosslinks with small sequence separations (< 6 residues) were skipped. Could this have been the issue? Otherwise, it works fine for me.

gabrieliacc commented 10 months ago

Hi all! Thank you for AlphaLink2, it is a very useful tool for modeling.

I had some problems satisfying XL distances. I have a trimeric system, A-2B, with one XL linking structured regions and two others linking an IDR to a structured region.

In the first test, using the complete system, I did not obtain the expected XL distances (the XL distances were ~50 A). So I simplified it to a dimer, A-B, with only one XL (between structured regions). In a second test on the dimer, I zeroed out the MSA in the features.pkl.gz and uniprot.pkl.gz files. In a third test, I also tried the AlphaLink2 cut-off. In all tests, I obtained very similar interfaces, with CA-CA distances close to 50 A.

I would like to know whether, even when forcing the XL by removing the MSA information for the crosslinked residues, it is possible that the 25 A distance is still not satisfied. Is there something else you would suggest testing? Could you help me?

Thank you in advance

Gabriel

lhatsk commented 10 months ago

Hi Gabriel,

Yes, it's possible that removing the MSA information for these particular residues doesn't force the constraint. So far I have only tested it with the distogram network in the monomer version of AlphaLink. That network was trained in a different way, which makes it possible in some cases. In general, since AlphaLink is integrative, it always takes all of the information into account (sequence, MSA, templates, crosslinks); the other information might simply overpower the crosslink information. To truly force constraints, they would need to be enforced in the loss during training.

Do your results vary between networks? The v3 network might work a little better for forcing constraints if you supply an FDR of 0. You could also try to increase the window size for removing the MSA information, e.g., up to +- 3 residues.

What do you mean by you also tested the alphalink2 cut-off?

We will hopefully release a distogram network for AlphaLink2 soon-ish which might work better for your use case.

Your expected distance is < 25A?

Removing the disordered parts was a good idea; AlphaFold, and by extension AlphaLink, struggles a lot with these.

gabrieliacc commented 10 months ago

Thank you for the quick answer!

Yes, it's possible that removing the MSA information for these particular residues doesn't force the constraint. So far I have only tested it with the distogram network in the monomer version of AlphaLink. That network was trained in a different way, which makes it possible in some cases. In general, since AlphaLink is integrative, it always takes all of the information into account (sequence, MSA, templates, crosslinks); the other information might simply overpower the crosslink information. To truly force constraints, they would need to be enforced in the loss during training.

Right! If I have only inter-chain XLs and none of them is satisfied, could this suggest that the XLs have a low probability of occurring?

Do your results vary between networks? The v3 network might work a little better for forcing constraints if you supply an FDR of 0. You could also try to increase the window size for removing the MSA information, e.g., up to +- 3 residues.

I performed the following tests with only one XL, using the full sequences:
- v2
- v3
- v3 with zero-out up to +-3

To zero out the MSA residues, I run the MSA search, zero out the selected residues, and then run inference.py.

What do you mean by you also tested the alphalink2 cut-off?

inference.py has an option, "--cutoff"; I meant that option.

We will hopefully release a distogram network for AlphaLink2 soon-ish which might work better for your use case.

Your expected distance is < 25A?

Yes, I expected that. Does it make sense?

Removing the disordered parts was a good idea; AlphaFold, and by extension AlphaLink, struggles a lot with these.

I ran:
- v2 without IDR
- v3 without IDR
- v2 without IDR and with zero-out up to +-3
- v3 without IDR and with zero-out up to +-3

In all cases, the XL distance is longer than 40 A.

Do you have any other suggestions?

Thank you in advance

lhatsk commented 9 months ago

Hi,

Sorry for the late response!

Right! If I have only inter-chain XLs and none of them is satisfied, could this suggest that the XLs have a low probability of occurring?

What does the prediction look like? Just two chains floating in space? If the XLs don't have any support in the MSAs, it might be hard to satisfy them. The distogram network allows you to overconstrain in this case; that usually helps to bring the structures closer, but it still might not be enough to build a proper interface. I will try to upload the network in the next two weeks.

inference.py has an option, "--cutoff"; I meant that option.

This option only changes the cutoff of the satisfaction computation, but doesn't affect the actual prediction.

We will hopefully release a distogram network for AlphaLink2 soon-ish which might work better for your use case. Your expected distance is < 25A?

Yes, I expected that. Does it make sense?

Yes, the networks expect < 25A.

In all cases, the XL distance is longer than 40 A.

Do you have any other suggestions?

Only to try again once the distogram network is uploaded, and then to overconstrain, maybe with 10A.

Samuel-gwb commented 7 months ago

I ran into a similar case where no crosslink was satisfied. How do I poke holes, i.e., zero out the MSA at specific positions, in the pkl.gz files? Thanks!

One thing you could try, which is a little hacky but worked well for me with AL1, is to poke holes into the MSA for the crosslink you want to enforce. Say you have the crosslink A i B j 0: zero out the MSA in A.feature.pkl.gz at (i-1) +- 2 and in B.feature.pkl.gz at (j-1) +- 2; do the same with uniprot.pkl.gz.

Samuel

lhatsk commented 6 months ago

Sorry, I haven't automated it. You would need to load and manipulate the feature files (same for uniprot). E.g.,

import gzip
import pickle

A = pickle.load(gzip.open('A.feature.pkl.gz', 'rb'))
B = pickle.load(gzip.open('B.feature.pkl.gz', 'rb'))

If you have the crosslink A 5 B 10, you should put gaps at these specific positions in the MSA (gap = 21), something like this:

# Skip row 0 (the query sequence); residues are 1-based, MSA columns 0-based
A['msa'][1:, 5 - 1] = 21
B['msa'][1:, 10 - 1] = 21

It is usually a good idea to also put gaps at the surrounding positions, e.g., +- 2 residues.
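Putting these steps together, here is a sketch of a helper that gaps a +- window around several residues and writes the features back out. Following the reply above, it leaves row 0 (the query sequence) untouched and uses 21 as the gap token. The function names and the output path are hypothetical, and applying it to the uniprot features assumes they also store their alignment under `'msa'`.

```python
import gzip
import pickle

GAP = 21  # gap token in the MSA encoding, as stated in the reply above


def poke_msa_holes(features, residues, window=2):
    """Gap the MSA columns around each 1-based residue, in place.

    Row 0 (the query sequence) is left untouched. Element-wise assignment is
    used so this works on a NumPy array or a plain nested list under 'msa'.
    """
    msa = features["msa"]
    seq_len = len(msa[0])
    for residue in residues:
        centre = residue - 1  # 1-based residue -> 0-based column
        for col in range(max(0, centre - window), min(seq_len, centre + window + 1)):
            for row in range(1, len(msa)):
                msa[row][col] = GAP
    return features


def rewrite_feature_file(path_in, path_out, residues, window=2):
    """Load a gzipped pickle of features, gap the MSA, and save a new file."""
    with gzip.open(path_in, "rb") as fh:
        features = pickle.load(fh)
    poke_msa_holes(features, residues, window)
    with gzip.open(path_out, "wb") as fh:
        pickle.dump(features, fh)
```

For the crosslink A 5 B 10 this would be called as `rewrite_feature_file('A.feature.pkl.gz', 'A.holes.pkl.gz', [5])` and likewise for B with `[10]`. Writing to a new file keeps the original features intact. On a NumPy array, `msa[1:, lo:hi] = 21` is the faster equivalent of the inner loops.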

How many crosslinks do you have and what crosslinker are you using? It's sometimes hard to overturn the MSA if you have insufficient crosslink density.

Samuel-gwb commented 6 months ago

Thanks !

How many crosslinks do you have and what crosslinker are you using? It's sometimes hard to overturn the MSA if you have insufficient crosslink density.

I have tens of crosslinks between two of the four subunits. As they did not work, I multiplied them by 9: each crosslink from i-A to j-B was expanded into 9 crosslinks, from each of (i-1, i, i+1)-A to each of (j-1, j, j+1)-B. However, the two subunits were still far from each other.
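The 3 x 3 expansion described here can be written as the following sketch, assuming `(chain1, res1, chain2, res2, fdr)` tuples with 1-based residue numbers. `expand_link` is a hypothetical name; neighbours below residue 1 are always dropped, and neighbours past the chain end are dropped when the chain lengths are given.

```python
def expand_link(c1, r1, c2, r2, fdr, shift=1, len1=None, len2=None):
    """Expand one crosslink into all (r1 + a, r2 + b) pairs with |a|, |b| <= shift.

    shift=1 gives the 3 x 3 = 9 links described above. Every expanded
    link keeps the FDR of the original one.
    """
    out = []
    for a in range(-shift, shift + 1):
        for b in range(-shift, shift + 1):
            i, j = r1 + a, r2 + b
            if i < 1 or j < 1:
                continue
            if (len1 is not None and i > len1) or (len2 is not None and j > len2):
                continue
            out.append((c1, i, c2, j, fdr))
    return out
```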

lhatsk commented 6 months ago

What's the expected distance of your crosslinks? If there is no support in the MSA, sometimes all the network can do is bring them closer to the boundary (~25 A).

Samuel-gwb commented 6 months ago

The expected distances vary, from 10 A to 35 A. I previously got a low-confidence model from AF2.2-multimer, without crosslinks, that satisfies most of the crosslinks. AF2.3-multimer predicts a looser model. I then tried AL2 with these crosslinks to see if I could get a better model, which gave the results we are discussing.

lhatsk commented 6 months ago

Is there a difference between the v2 and v3 networks for AlphaLink2?