NMRLipids / NMRlipidsIVPEandPG

NMRlipids IV project, PE and PG lipids
GNU General Public License v2.0

Lipid conformations in PDB #40

Closed ivan-gushchin-mipt closed 3 years ago

ivan-gushchin-mipt commented 3 years ago

Hi all,

I've looked at the analysis of dihedral distributions from lipid structures in PDB and would like to comment on that. My point is that many of the lipid conformations in the PDB models are not reliable.

In X-ray crystallography, the molecular model is an interpretation of the experimental data. The primary data are the scattering intensities and, to a lesser degree, electron density maps (since experimentally determined phases are not always available). The quality of the model depends on the resolution and quality of the diffraction data, and also on the person who builds the model.

(i) Most of the membrane protein crystals diffract poorly (compared to the crystals of soluble proteins). The resolution of CryoEM density maps is also usually low at the moment.

(ii) Whereas proteins are more or less ordered and structured, the lipids are flexible and/or have partial occupancy. Consequently, even at the same resolution, lipid models are usually less reliable than the protein models from the same crystal.

As a result, many of the lipid conformations in the PDB models are not reliable.

I've checked several structures manually, and here are some examples:

PDB ID 5c5x: the lipid is at the 4-fold symmetry axis; the density is poor; the serine moiety does not fit into the electron density. Overall, I believe that both the position and the identity of the lipid are very unreliable.

image

PDB ID 4ret: lipid headgroups are fitted somewhat arbitrarily into the blobs of density.

image

PDB ID 2hj6: the paper explicitly focuses on the lipids pubs.acs.org/doi/10.1021/bi062154i, but the density in the headgroup region is still very unreliable, as can be seen in figures 1, 3, 5.

image

You can check the densities yourself using the PDB viewer http://www.rcsb.org/3d-view/ngl/5c5x etc.

I think this issue can be addressed in the following ways: 1) manual curation of all lipids: doable, but laborious (several working days at least). Also, I'm not sure that enough lipids would be left for smooth distributions after this. 2) discard low-resolution models, for example those with a resolution worse than 3 Å: easy, but there is no guarantee that the remaining lipids are good. 3) and/or analyze overall headgroup orientation (P-N vector etc) instead of particular dihedrals.

I'll be glad to help with this if needed (maybe also with @pbuslaev's help)

Some minor remarks:

markussmiettinen commented 3 years ago

Hi Ivan,

thank you, I find this extremely interesting!

By "the first lipid" it is meant that only the first lipid occurring in the pdb record was taken. This is because the very same conformation was often just copied several times in the record, and thus different lipids from the same record cannot be taken as independent samples.

This is related to your suggestion 1. manual curation of all lipids. Namely, if we can get one good structure per record, we will have the quality now seen in Fig 4, which is not awesomely smooth, but is at least suggestive of conformations that do not occur in the pdb. My question is, in your opinion, could a manual curation give one good structure per pdb record?

ivan-gushchin-mipt commented 3 years ago

Regarding the first lipid, quite often there are several copies of the protein present in the PDB file, and each monomer/protomer binds lipids similarly, and then the lipid conformations are the same. In this case I guess the repeating lipids from this PDB record can be discarded. However, if the lipids bind at different sites, their conformations are likely to be different, and then all this information can be collected. If the conformations are the same in the different sites, this probably means that the resolution is low, the model was not refined, and the conformation is not reliable.

Regarding the manual curation, as you can see in the figures above, it is quite possible that some low-resolution structures lack any reliable lipid conformations at all: the overall position may be correct, but the dihedrals may be wrong. I do not know the resolution statistics in the assembled set of structures, but in my experience structures with a resolution worse than 3.5 Å are not helpful at all, and anything worse than 3.0 Å is also unlikely to be very helpful.

In our own structures, we usually see lipid tails (such as this https://www.rcsb.org/ligand/LFA - the structures can be viewed by clicking in the link "is present as a standalone ligand in 71 entries"), but not the headgroups - they are usually disordered, unfortunately.

ohsOllila commented 3 years ago

Thanks for the comment.

The lipid structures in PDB may indeed be unreliable and I would use individual structures very carefully. Issues in lipid structures in PDB have also been discussed by Derek Marsh et al., for example in these papers: https://doi.org/10.1007/s00249-012-0816-6 and https://doi.org/10.1016/j.bpj.2018.02.016. In addition to the mentioned problems, the structures in PDB are often refined using the same potentials (force fields) as used in MD simulations. Therefore, some structures are not fully independent of the MD simulations.

Nevertheless, the set of lipid structures in PDB is the best available set of information on protein-bound lipid structures. As you mentioned, lipid headgroups are typically not seen in the structures because they are disordered (which is in line with our conclusions from MD simulations in bulk bilayers), but sometimes they are seen. I believe that these represent the situation where the lipid is tightly bound to the protein and that we should utilize this information even though it is not perfectly accurate. I also believe that the high-resolution information on lipids in PDB is rapidly increasing due to the development of cryo-EM, which is especially useful for membrane proteins.

The current analysis is done without any curation based on structural quality, because I wanted to do it fast, I was not sure what the best curation criteria would be, and I wanted to maximize the amount of data. The numbers of structures found with the current criteria are: 176 PC, 198 PE, 70 PG, and 41 PS lipids. These may still increase if new structures have been added to PDB when I run the script again. Manual curation may still be doable with this amount, but it will be tedious. In addition, it may be subjective, and the curator(s) might prefer results that support their own perceptions of how lipids should bind to proteins. However, I think that we should at least check how much some kind of quality screening would affect our results.

I think that checking the resolution of the used structures would be a reasonable first step (despite the mentioned weaknesses). This should be rather straightforward using the PDB API because the resolution is in the indexing parameters (here is the tutorial notebook that I modified to make my analysis: https://github.com/PDBeurope/pdbe-api-training/blob/master/api_tutorials/6_PDB_search.ipynb). There are actually 'pivot_resolution' and 'resolution'. I do not know what these mean exactly, but I would use 'resolution' unless someone advises otherwise. We can then either use an upper boundary for the accepted resolution or weight the structures based on the resolution (do you think that this would make sense?). I will try to play around with this asap.
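A minimal sketch of the upper-boundary option, assuming the search results have already been loaded into a list of dictionaries that carry a 'resolution' value in Å. The 'pdb_id' key and the function name are hypothetical, the resolution values are the ones quoted later in this thread, and the NMR entry is assumed to carry no resolution value:

```python
def filter_by_resolution(records, cutoff):
    """Keep records whose resolution (in Å) is at or better than `cutoff`.

    Records without a resolution value are dropped.
    """
    kept = []
    for record in records:
        resolution = record.get("resolution")
        if resolution is None:
            continue
        if float(resolution) <= cutoff:
            kept.append(record)
    return kept


# Illustrative records; the 'pdb_id' key is an assumption of this sketch.
records = [
    {"pdb_id": "3ag3", "resolution": 1.8},
    {"pdb_id": "4tsq", "resolution": 1.6},
    {"pdb_id": "3b7q", "resolution": 2.03},
    {"pdb_id": "1pp9", "resolution": 2.1},
    {"pdb_id": "2mls", "resolution": None},  # NMR entry, assumed to have no value
]

kept_ids = [record["pdb_id"] for record in filter_by_resolution(records, 2.0)]
```

Weighting by resolution would need an explicit weighting function, which is a separate modelling choice and is not sketched here.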

The most sophisticated solution would be to also download the available raw data sets and make the quality evaluation specifically for lipid structures (following the normal PDB quality evaluation procedure when applicable), but I think that this may be too much work for now.

Here are some replies to specific comments:

and/or analyze overall headgroup orientation (P-N vector etc) instead of particular dihedrals.

Our aim here is to go beyond overall headgroup orientations, so I would try to push for a more detailed analysis despite the problems.

However, if the lipids bind at different sites, their conformations are likely to be different, and then all this information can be collected. If the conformations are the same in the different sites, this probably means that the resolution is low, the model was not refined, and the conformation is not reliable.

There are also large cryo-EM structures in PDB where the determined structure including lipids is copied due to symmetries. I believe that in these cases the structure may have high resolution, but should not be considered more than once. Lipids in different binding sites could be (and should be) taken into account, but we would need a reliable script to automatically separate these from the repeats which I did not have time to do.

would be nice to have a (supplementary) table of used PDB IDs, lipid IDs, lipid chains/resids, lipid designations (PC/PE/PG/PS), although this information can be gathered from your script of course (scripts/pdbSEARCH.ipynb).

The used ligand names for lipids (which I collected manually) are listed in the methods section and naturally also in the script. The script is currently finding 485 PDB IDs. I am not sure it makes sense to list all of these.

ivan-gushchin-mipt commented 3 years ago

I think I agree with everything said above.

Regarding the force fields, yes, this is also an important issue, although I believe that they are usually different from those used in simulations, and I believe are less validated, although generally reliable.

Regarding the resolution, perhaps @pbuslaev or somebody else will make a histogram of resolution distribution so that we can see how many structures (lipid conformations) will be left after some cutoff.

Regarding the weights, as you can see from PDB ID 4ret above, sometimes the conformations are not reliable at all. I guess using the upper boundary for accepted resolution is the easiest approach. The boundary can be chosen based on common (crystallographical) sense and the histogram that I mention above.

Regarding the duplicate conformations, I will also discuss with @pbuslaev. Some possible approaches are:

pbuslaev commented 3 years ago

Hi,

here is the resolution distribution. As you can see, more than half of the structures have resolution higher than 3 angstrom. It would be better to plot the number of lipids as a function of resolution, but writing that script might take longer.

image
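Counting lipids rather than structures per cutoff could be sketched as follows, assuming a list of (resolution, number of lipids) pairs has already been collected. The function name and the numbers are hypothetical and serve only as an illustration:

```python
def lipids_within_cutoff(structures, cutoffs):
    """Return {cutoff: number of lipids in structures with resolution <= cutoff}."""
    counts = {}
    for cutoff in cutoffs:
        counts[cutoff] = sum(
            n_lipids for resolution, n_lipids in structures if resolution <= cutoff
        )
    return counts


# Hypothetical (resolution in Å, number of lipids) pairs, not real data.
structures = [(1.8, 2), (2.5, 1), (3.2, 6), (3.4, 12)]
counts = lipids_within_cutoff(structures, [2.0, 3.0, 3.5])
```

A table like this shows directly how many lipid conformations would survive a given cutoff, which a per-structure histogram does not.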

I would also like to note that some of the structures used in the analysis were obtained with NMR (e.g. PDB ID 2MLS). I doubt that lipids could be resolved with NMR.

ohsOllila commented 3 years ago

I would also like to note that some of the structures used in the analysis were obtained with NMR (e.g. PDB ID 2MLS). I doubt that lipids could be resolved with NMR.

I also initially considered discarding NMR structures, but it may be possible that they actually see NOE peaks between lipids and proteins and determine the structure from these. However, I have not checked if this is the case in the structures that come from NMR. Did you check how many structures were from NMR?

pbuslaev commented 3 years ago

Yes, I checked. There are 18 NMR structures. And here are the pdb codes for all of them

'2mls', '6cc9', '6cm1', '2lya', '2msc', '2mzi', '2mlr', '6w4e', '6clz', '2mzh', '6ptw', '2mse', '6w4f', '6ccx', '6cch', '2msd', '6pts', '2lyb'

ivan-gushchin-mipt commented 3 years ago

I've checked the NMR structures, and the results are as follows:

2lya, 2lyb - "A subset of unambiguous intermolecular NOEs have been observed between the 2′-acyl chain of di-C8-PC and the hydrophobic side chains of ...", "No detectable NOE cross-peaks were observed between MA and the glycerol group, polar head, or 1′-acyl chain of either di-C8-PC or di-C8-PS lipids"

2mlr, 2mls result from PRE-based distances to nitroxide spin-labelled PC

2msc, 2msd, 2mse result from PRE-based distances to lipids conjugated to the paramagnetic ion gadolinium (Gd3+)

2mzh, 2mzi result from PRE-based distances to nitroxide spin-labelled PC and choline hydrogens to protein NOEs

6cc9, 6cch, 6ccx result from PRE-based distances to lipids conjugated to the paramagnetic ion gadolinium (Gd3+)

6clz, 6cm1 result from PRE-based distances to nitroxide spin-labelled PC

6pts, 6ptw, 6w4e, 6w4f result from PRE-based distances to lipids conjugated to the paramagnetic ion gadolinium (Gd3+)

Overall, none of this can provide any significant experimental information on the conformation of the lipid headgroups. 3 or 13 NOE restraints from choline hydrogens to protein in 2mzh, 2mzi are very probably not enough to assign the headgroup dihedrals correctly. Therefore, I believe that all NMR structures should be discarded.

ohsOllila commented 3 years ago

I have now remade Figure 4 using a wider bin width on the x-axis (20 degrees instead of 10 degrees, as suggested in the online meeting), neglecting all NMR structures and using a resolution cut-off of 3.0 Å or 3.5 Å.

There are 98 PC, 83 PE, 52 PG, and 15 PS structures with 3.0 Å or better resolution and the resulting figure is here:

image

There are 129 PC, 154 PE, 74 PG, and 28 PS structures with 3.5 Å or better resolution and the resulting figure is here:

image

I think that in this case it is better to compromise on the resolution of the structures to get more data, and use the results with the 3.5 Å cut-off in the latter figure. Any opinions on this?

ivan-gushchin-mipt commented 3 years ago

Is it one lipid per one PDB record?

If yes, then perhaps more data can be extracted by adding the lipids with different conformations from the same PDB record. Do you plan to check this? Should I and @pbuslaev check this?

As a minor remark, the bin at 340 degrees seems to be missing.

ohsOllila commented 3 years ago

Yes, this is only one lipid per PDB record. I was not planning to check this myself so it would be great if you could work on this.

Another issue where tracking of similarity between lipid structures is needed is the figure proposed by @mattijavanainen illustrating different lipid headgroups bound to different proteins in same conformations. If you write code that tracks lipid structures, maybe you could look into this as well? I have now updated the pdb search code in the Git: https://github.com/NMRLipids/NMRlipidsIVPEandPG/blob/master/scripts/pdbSEARCH.ipynb I was trying to save the angles for this purpose in this line: outfileVALUES=open('../Data/' + output + str(dih) + 'values.dat','w') but it did not work, because sometimes not all lipid atoms are present in the PDB record. The files for different dihedrals therefore contained different numbers of angles, and the identity of a lipid could not be followed based on line numbers. Let me know if you will take a look at this as well, or should I continue with this myself?
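One way around the line-alignment problem is to write one row per lipid, keyed by its identity, and to mark missing angles explicitly. This is only a sketch under assumptions: the column names, the dihedral labels "dih1" and "dih2", the record layout and the PDB IDs are all hypothetical and are not taken from pdbSEARCH.ipynb:

```python
import csv
import io


def write_dihedral_table(lipids, dihedral_names, handle):
    """Write one row per lipid; a missing angle is written as 'nan'."""
    writer = csv.writer(handle)
    writer.writerow(["pdb_id", "chain", "resid"] + list(dihedral_names))
    for lipid in lipids:
        row = [lipid["pdb_id"], lipid["chain"], lipid["resid"]]
        for name in dihedral_names:
            value = lipid["dihedrals"].get(name)
            row.append("nan" if value is None else f"{value:.3f}")
        writer.writerow(row)


# Hypothetical lipids; the second one lacks the atoms needed for 'dih2'.
lipids = [
    {"pdb_id": "xxxx", "chain": "A", "resid": 1,
     "dihedrals": {"dih1": -65.0, "dih2": 177.5}},
    {"pdb_id": "yyyy", "chain": "B", "resid": 7,
     "dihedrals": {"dih1": 60.25}},
]

buffer = io.StringIO()
write_dihedral_table(lipids, ["dih1", "dih2"], buffer)
rows = buffer.getvalue().splitlines()
```

Because every row carries the PDB ID, chain and residue number, angles from different dihedrals stay attached to the same lipid even when some atoms are absent.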

As a minor remark, the bin at 340 degrees seems to be missing.

Yes, this may be due to the distribution boundaries at the line dist = plt.hist(dihVALUES, bins=range(0,360,20), density=True) but I did not start testing that now, because it takes a couple of hours to regenerate the data for the plot.
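The effect of the bin edges can be checked on a few test angles without regenerating the data. This sketch assumes the angles are stored in the 0 to 360 degree range and uses np.histogram, which plt.hist relies on for the binning; the test angles are made up:

```python
import numpy as np

# range(0, 360, 20) stops at 340, so the last bin edge is 340 and
# there is no bin covering 340-360 degrees.
edges_short = list(range(0, 360, 20))
# range(0, 361, 20) includes 360 as the final edge.
edges_full = list(range(0, 361, 20))

angles = np.array([5.0, 350.0, 355.0])  # made-up test values

counts_short, _ = np.histogram(angles, bins=edges_short)
counts_full, _ = np.histogram(angles, bins=edges_full)

# Values above the last edge are silently ignored with the short edges.
n_counted_short = int(counts_short.sum())
n_counted_full = int(counts_full.sum())
```

With density=True the dropped values also change the normalization, since only the counted angles contribute to it.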

ivan-gushchin-mipt commented 3 years ago

We discussed this with Pavel, will try to do all the analyses this week.

ohsOllila commented 3 years ago

I did some analysis of the dihedral distributions in kT units, see https://nmrlipids.blogspot.com/2020/12/nmrlipids-ivb-toward-submission-of.html?showComment=1611760032710#c1951550810091023507

I also plotted the distribution of individual dihedral energies from PDB using the -log(distribution) plot from MD simulations in bulk bilayer: ENEDISTfromPDB. This plot illustrates the amount of high-energy lipid conformations in PDB according to our analysis from bulk bilayers.
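The -log(distribution) estimate can be sketched as below, assuming a binned dihedral density from simulation and angles from PDB in degrees. The function names are hypothetical, and shifting the minimum energy to zero is an assumption of this sketch rather than something stated in the thread:

```python
import numpy as np


def energy_from_density(density):
    """Return -ln(density) in kT units, shifted so the minimum is zero.

    Bins with zero density get an infinite energy.
    """
    density = np.asarray(density, dtype=float)
    energy = np.full(density.shape, np.inf)
    populated = density > 0
    energy[populated] = -np.log(density[populated])
    if populated.any():
        energy -= energy[populated].min()
    return energy


def energies_for_angles(angles_deg, energy, bin_width):
    """Look up the energy of the bin that each angle falls into."""
    angles = np.mod(np.asarray(angles_deg, dtype=float), 360.0)
    index = (angles // bin_width).astype(int) % len(energy)
    return energy[index]


# Toy density with three 120-degree bins, for illustration only.
energy = energy_from_density([0.5, 0.25, 0.25])
pdb_energies = energies_for_angles([10.0, 130.0, -10.0], energy, 120)
```

Angles that fall into bins never visited in the simulation come out as infinite, so they need separate handling before plotting.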

This is now done using the above mentioned *values.dat files (now also pushed into the Data folder in this repo), which list the individual dihedral angle values without separating the PDB structures they originate from. However, it may sometimes be interesting to connect the estimated lipid energies to a specific protein structure. If you create a list (or other object) containing angle information connected to the PDB ID, it could be useful for this as well.

ivan-gushchin-mipt commented 3 years ago

We discussed this with Pavel, will try to do all the analyses this week.

I'm very sorry about the delay, we hope to post the data today in the late evening or tomorrow.

ivan-gushchin-mipt commented 3 years ago

The analysis for PS is basically ready. If we take all lipids into account, there are 49 lipids in 27 PDB IDs (we'll check why not 28 a little bit later). If we discard the lipids that come from the same PDB ID and for which all of the dihedral angles are within a "threshold" of the angles in another lipid, we go down to 38 lipids:

lipid_pdb_2_graph_1

This is the list of PDB IDs sorted by the number of different lipids:

lipid_pdb_2_graph_2

The resulting dihedral distributions for PS look roughly the same. Pavel hopes to download and calculate the data for PC, PE and PG overnight - we'll see if there will be more data or some different effects.

P.S. Trying to figure out the meaning of entity IDs took us quite some time - these should be the different protein species in the sample, but assignment of ligands/lipids to these entities is not clear. Consequently, all lipids were downloaded, but then the fully identical lipids were discarded.

P.P.S. While we discard similar lipids from the same PDB record, we keep similar lipids belonging to different PDB IDs at the moment. In some cases it is obvious that the lipid conformations are related (consecutive PDB IDs), but sometimes it is not. Checking for this automatically is complicated, as the protein may have different conformations and then keeping both lipid conformations is justified, etc. So, probably, it is easier to keep such conformations at the moment.

PDB IDs 6lcp, 6lcr highlight another issue where the lipids are modeled differently, whereas the environment (protein) is roughly the same:

image

In this case, I think, the lipids should be in a very similar conformation (or ensemble of conformations). But, again, checking manually hundreds of lipid conformations and comparing to electron densities would take too much time.

pbuslaev commented 3 years ago

Hi,

I am sorry for the delay with the code. Finally it is ready. @ohsOllila you can find the code that collects lipid dihedrals from PDB here.

I first saved information about all lipids found in the database. Here are the files for PS, PG, PC and PE.

Next, I parsed these tables to exclude all the lipids from the same pdb file with similar dihedrals (maximal difference between dihedrals of less than 3 degrees). I then saved this parsed data to new files for PS, PG, PC and PE.
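The exclusion step could look roughly like this. It is a sketch and not the code linked above: the record layout, the function names and the PDB IDs are hypothetical, every lipid is assumed to have all of its dihedrals, and the difference is taken on the circle so that 179.5 and -179.5 degrees count as 1 degree apart:

```python
def circular_difference(a, b):
    """Smallest absolute difference between two angles in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def max_dihedral_difference(x, y):
    """Largest circular difference over corresponding dihedrals."""
    return max(circular_difference(a, b) for a, b in zip(x, y))


def drop_similar_lipids(lipids, threshold=3.0):
    """Keep a lipid unless an already kept lipid from the same PDB ID
    has all dihedrals within `threshold` degrees of it."""
    kept = []
    for lipid in lipids:
        duplicate = any(
            other["pdb_id"] == lipid["pdb_id"]
            and max_dihedral_difference(other["dihedrals"], lipid["dihedrals"])
            < threshold
            for other in kept
        )
        if not duplicate:
            kept.append(lipid)
    return kept


# Hypothetical records with two dihedrals each, for illustration only.
lipids = [
    {"pdb_id": "xxxx", "dihedrals": [60.0, 179.5]},
    {"pdb_id": "xxxx", "dihedrals": [61.0, -179.5]},  # near copy, dropped
    {"pdb_id": "yyyy", "dihedrals": [60.0, 179.5]},   # other PDB ID, kept
]
unique_lipids = drop_similar_lipids(lipids)
```

The result depends on the order of the input, because each lipid is compared only against those already kept.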

Keeping both tables makes it possible to check which lipids have been excluded.

Another issue where tracking of similarity between lipid structures is needed is the figure proposed by @mattijavanainen illustrating different lipid headgroups bound to different proteins in same conformations.

I am saving all the information needed to prepare such figures. I can later write a function that searches for pairs of proteins where the lipids are in identical conformations, and prepare a figure. @ohsOllila do you have a layout in mind for such a figure?

ivan-gushchin-mipt commented 3 years ago

Regarding the figure with different lipid headgroups bound to different proteins in the same conformations: I think it would be helpful, in the first place, to have an estimate of how many such lipids are found in the dataset. After that, it will be easier to decide what the figure can look like.

From a cursory look, I do not see any cases where two different lipids have dihedrals, which are all within 10° of each other. There are quite a lot of such cases where the type of the lipid is the same, but in all cases it is just different PDB IDs of the same protein.

When I relax the threshold to 20°, I find two occurrences. The first one is:
PDB ID 4ILA, PC resid 403 chain B/C/D/E
PDB ID 6HU9, PE resid 303 chain n
The resolution is 3.5/3.35 Å. I wouldn't show this in the publication (carbonyl-phosphate interactions? arginine-choline hydrogen bond? makes no sense).

image

Another example is:
PDB ID 4NH2, PG resid 501 chain F
PDB ID 6U9V, PS resid 711 chain A/C
Here, the resolution is better, and the interactions are quite different.

image

The model is reasonable in 4NH2. In 6U9V, the densities are somewhat okay, but I guess particular dihedrals are not very reliable, and there are also unaccounted densities in the vicinity, so there might be several lipids in different conformations occupying the same spot (but only one molecule is modeled in the PDB record):

image

===

If the requirements for "similarity" are relaxed even more, I guess we can find more such cases, for example if we require only 5 out of 6 dihedrals to be similar.

ohsOllila commented 3 years ago

Thanks, this is highly useful.

I checked the original paper reporting 6U9V and it seems that they are not exactly sure if this lipid is actually PS. Therefore, it might not be a good idea to highlight this case in the manuscript.

I think that requiring all dihedrals to be within 20 degrees is still quite a strong requirement. Maybe we could try a 30 degree limit and/or leave out the glycerol backbone dihedral (g1-g2-g3-Og3)? Do you have a script to try this easily?

ivan-gushchin-mipt commented 3 years ago

Pavel made a better script, which finds many more lipid pairs. The script is for our beloved Wolfram Mathematica, but if really needed, I think it may be rewritten in Python.

From my observations, I wouldn't show any structure with a resolution worse than 3 Å in the manuscript.

I've looked at some of the structures and electron densities from two sets: the first one is where all six angles are within 30° of each other, and the resolution of both structures is better than (below) 3.0 Å, and the second one is where the five angles other than g1-g2-g3-Og3 are within 30° of each other, and the resolution of both structures is better than (below) 2.5 Å. I didn't find suitable examples from the first set (but maybe you will), but some from the second set seem to be okay.

Below are the outputs: for each matching pair, there is a description and a short PyMOL script, which loads the structures and highlights the lipids:

data1six-angles30-degrees-threshold__below-3A.txt

data2five-angles30-degrees-threshold__below-2.5A.txt

Some pairs that are not bad:

1) {PG,{3ag3,11,X-ray,diffraction,1.8, 522,A, -65.476,-177.354,-59.9877,-63.8357,-144.164,-38.9595}}
{PC,{4tsq,1,X-ray,diffraction,1.6, 204,B, -75.4625,-157.169,-67.3983,-51.8681,-164.431,178.732}}

fetch 3ag3, async=0
sele lip1, 3ag3 and chain A and resi 522
fetch 4tsq, async=0
sele lip2, 4tsq and chain B and resi 204
util.cbag("all")
util.cbam("lip*")

2) {PC,{3b7q,1,X-ray,diffraction,2.03, 314,B, 53.8822,161.15,79.95,56.9738,-173.185,-69.7593}}
{PE,{1pp9,1,X-ray,diffraction,2.1, 2006,D, 73.3315,163.367,62.3757,63.7753,176.144,66.1287}}

fetch 3b7q, async=0
sele lip1, 3b7q and chain B and resi 314
fetch 1pp9, async=0
sele lip2, 1pp9 and chain D and resi 2006
util.cbag("all")
util.cbam("lip*")

Others seem to be not particularly good. Often, the phosphate is well resolved, but the densities for the rest of the headgroup are noisy.
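For readers without Mathematica, the pair search described in this comment might be sketched in Python as follows. This is not Pavel's script: the record layout and function names are hypothetical, the angles and resolutions are copied from the two pairs listed above, and the sixth value is assumed to be the g1-g2-g3-Og3 dihedral because it is the only one that differs by more than 30° within both pairs:

```python
from itertools import combinations


def circular_difference(a, b):
    """Smallest absolute difference between two angles in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def find_similar_pairs(lipids, threshold=30.0, max_resolution=2.5, skip=()):
    """Return PDB ID pairs of lipids of different types whose dihedrals
    (ignoring the indices in `skip`) are all within `threshold` degrees."""
    pairs = []
    for first, second in combinations(lipids, 2):
        if first["type"] == second["type"]:
            continue
        if max(first["resolution"], second["resolution"]) > max_resolution:
            continue
        differences = [
            circular_difference(a, b)
            for i, (a, b) in enumerate(zip(first["dihedrals"], second["dihedrals"]))
            if i not in skip
        ]
        if max(differences) <= threshold:
            pairs.append((first["pdb_id"], second["pdb_id"]))
    return pairs


lipids = [
    {"type": "PG", "pdb_id": "3ag3", "resolution": 1.8,
     "dihedrals": [-65.476, -177.354, -59.9877, -63.8357, -144.164, -38.9595]},
    {"type": "PC", "pdb_id": "4tsq", "resolution": 1.6,
     "dihedrals": [-75.4625, -157.169, -67.3983, -51.8681, -164.431, 178.732]},
    {"type": "PC", "pdb_id": "3b7q", "resolution": 2.03,
     "dihedrals": [53.8822, 161.15, 79.95, 56.9738, -173.185, -69.7593]},
    {"type": "PE", "pdb_id": "1pp9", "resolution": 2.1,
     "dihedrals": [73.3315, 163.367, 62.3757, 63.7753, 176.144, 66.1287]},
]

five_angle_pairs = find_similar_pairs(lipids, skip={5})  # sixth dihedral ignored
six_angle_pairs = find_similar_pairs(lipids)
```

With all six angles required, neither pair passes the 30° threshold, which is consistent with both pairs coming from the five-angle set.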

ohsOllila commented 3 years ago

I think that the two pairs you suggest seem promising. If I remember correctly, @mattijavanainen was potentially interested in rendering the figure once suitable pdb structures are found. Are you still interested in trying to make a figure of these two pairs?

pbuslaev commented 3 years ago

Hi,

I have created a draft figure

lipids_in_protein

ohsOllila commented 3 years ago

I think that this looks already pretty good. I have just two small suggestions:

1) It might be easier if it were stated which lipid is shown. For example, "PC in 4TSQ" etc.
2) It might also be easier if the color coding of PC lipids were the same in the top and bottom figures.

ohsOllila commented 3 years ago

I have now made a new figure of dihedral distributions taking into account all structures (also from same pdb if they are not exactly the same) as listed by @pbuslaev in message above. When I use the resolution cut-off below 3.2, I find 311 PC, 394 PE, 154 PG, and 35 PS structures, which is significantly more than previously. The dihedral distribution plot looks like this:
DIHEDRALSALLfromPDBco3 2

If I reduce the cut-off, the plot remains essentially similar (based on my subjective view).

However, if I increase the cut-off to 3.3, the relative fractions of trans states increase in all except the last dihedral: DIHEDRALSALLfromPDBco3 3 Here there are 520 PC, 458 PE, 167 PG, and 36 PS structures.

I think that the increase in trans states between the 3.2 and 3.3 cut-offs may arise from some lower-resolution structures that have a lot of lipids in this conformation. Therefore I would use the 3.2 cut-off and the first plot in the manuscript. Are there other opinions on this?

The script that I used to make the plot from the data by @pbuslaev is here: https://github.com/NMRLipids/NMRlipidsIVPEandPG/blob/master/scripts/DIHdistFROMpdb.ipynb

ivan-gushchin-mipt commented 3 years ago

@ohsOllila I think using cut-off of 3.2 is fine.

A minor comment regarding the plots: the values at 360° should be the same as at 0°, and also your plots look slightly different from the histogram in the IPYNB notebook (the dots are centered at 0°, 20° etc and not at 10°, 30° etc). One possibility to correct this is to have the X axis values for the plot at the centers of the bins (i.e. 10°, 30° etc). In that case, the values centered at 10° and 350° of course shouldn't be identical. Another way is to make the histogram from -10 to 370 with bins of 20°, and then put the sum of the values in the first and the last bin to both the first and the last bin (since we want the values from 350° to 10°). I've proposed the edits to the IPYNB file (https://github.com/NMRLipids/NMRlipidsIVPEandPG/pull/43), but unfortunately I have no way to check if it works as intended at the moment.
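The second option (histogram from -10 to 370 with the first and last bins merged) can be sketched as follows. This is an illustration and not the code proposed in the pull request; the function name and the test angles are made up:

```python
import numpy as np


def periodic_histogram(angles_deg, bin_width=20):
    """Density of angles on bins centered at 0, bin_width, ..., 360.

    The first and last bins together cover the wrap-around region
    (350 to 10 degrees for 20-degree bins), so both get the merged count.
    """
    angles = np.mod(np.asarray(angles_deg, dtype=float), 360.0)
    n_bins = 360 // bin_width
    # Edges at -10, 10, ..., 370 for bin_width = 20.
    edges = -bin_width / 2 + bin_width * np.arange(n_bins + 2)
    centers = bin_width * np.arange(n_bins + 1)
    counts, _ = np.histogram(angles, bins=edges)
    merged = counts[0] + counts[-1]
    counts[0] = merged
    counts[-1] = merged
    density = counts / (len(angles) * bin_width)
    return centers, density


centers, density = periodic_histogram([355.0, 5.0, 0.0, 100.0])
```

The point at 360° duplicates the point at 0°, so the density integrates to one only when one of the two end points is left out of the sum.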

pbuslaev commented 3 years ago

Here is the updated figure

lipids_in_protein

ivan-gushchin-mipt commented 3 years ago

The black color for PC was chosen for consistency with other plots, where PC is always black. However, I wonder if yellow or orange would help to separate the lipid better from the protein or from the other lipid in the overlay panels.

ohsOllila commented 3 years ago

The black color for PC was chosen for consistency with other plots, where PC is always black. However, I wonder if yellow or orange would help to separate the lipid better from the protein or from the other lipid in the overlay panels.

I agree that selecting nice colors is more important here than the consistency with other plots.

ohsOllila commented 3 years ago

@ohsOllila I think using cut-off of 3.2 is fine.

A minor comment regarding the plots: the values at 360° should be the same as at 0°, and also your plots look slightly different from the histogram in the IPYNB notebook (the dots are centered at 0°, 20° etc and not at 10°, 30° etc). One possibility to correct this is to have the X axis values for the plot at the centers of the bins (i.e. 10°, 30° etc). In that case, the values centered at 10° and 350° of course shouldn't be identical. Another way is to make the histogram from -10 to 370 with bins of 20°, and then put the sum of the values in the first and the last bin to both the first and the last bin (since we want the values from 350° to 10°). I've proposed the edits to the IPYNB file (#43), but unfortunately I have no way to check if it works as intended at the moment.

I made a new plot with your updates: DIHEDRALSALLfromPDB

pbuslaev commented 3 years ago

Hi,

I have updated the structural figure. If needed I can share the svg file as well.

lipids_in_protein

ohsOllila commented 3 years ago

Thanks, the figure looks really good to me now.

It would be interesting if we could understand from these structures why there is PE and PG in binding sites on the left column and PC on right. Are there some clear lipid-protein interactions which prefer binding of these specific lipids to these sites? I do not think that this is necessary, but it might be a nice addition to the paper to pinpoint some interactions that drive lipid specificity more than lipid conformations.

ohsOllila commented 3 years ago

I have now included the results from this issue to the manuscript, and updated the discussion and methods accordingly. It would be good if @pbuslaev and @ivan-gushchin-mipt would check at least that the methods are correctly described. Also any other comments are welcomed (to here or to manuscript through GitHub or Overleaf).

ivan-gushchin-mipt commented 3 years ago

It would be interesting if we could understand from these structures why there is PE and PG in binding sites on the left column and PC on right. Are there some clear lipid-protein interactions which prefer binding of these specific lipids to these sites? I do not think that this is necessary, but it might be a nice addition to the paper to pinpoint some interactions that drive lipid specificity more than lipid conformations.

Sorry for the slow reply!

I've looked at the structures, I do not think that we can make some observations or conclusions that are at the same time general and not obvious for any ligand-protein interactions study, such as "different charges attract" or "there should be no steric clashes". Also, I'm not very familiar with the literature - there have been several long reviews recently on protein-lipid interactions - maybe somebody has already done a more detailed analysis.

In PDB IDs 1PP9, 3B7Q and 4TSQ, there are negatively charged groups nearby, so this might be the reason. However, I do not see clear reasons why PE or PC might be more preferable - both have a positive group at the terminus.

In PDB ID 3AG3, the lipid headgroup is buried and tightly bound with two histidines making hydrogen bonds with glycerol hydroxyls.

Generally, I think, our present approach is not well suited for the analysis of why one head group is preferred to another. If we really wanted to do that, we might've analyzed a different subset of structures focusing on the moiety beyond phosphate and maybe making a more stringent resolution cutoff, and/or conducted some FEP-like or competitive binding simulations. Also, the lipid content in crystal samples may be affected by a number of issues beyond headgroup-protein interactions, such as tail-protein interactions, the rarity of a particular lipid in the particular organism, or inadvertent removal of a particular lipid during sample preparation, and it is difficult to take this into account.

ivan-gushchin-mipt commented 3 years ago

I have now included the results from this issue to the manuscript, and updated the discussion and methods accordingly. It would be good if @pbuslaev and @ivan-gushchin-mipt would check at least that the methods are correctly described. Also any other comments are welcomed (to here or to manuscript through GitHub or Overleaf).

I'll try to check this today, and also ask Pavel to check.

pbuslaev commented 3 years ago

Hi @ohsOllila ,

in the text you are referencing my code for the analysis https://github.com/pbuslaev/scr/blob/master/PDB%20analysis.ipynb. It is probably worth adding it to the nmrlipids repository? I can make a pull request.

ohsOllila commented 3 years ago

Yes, it would be very good to put the code in this repo. Please, put it to scripts folder and make a pull request.

ivan-gushchin-mipt commented 3 years ago

Also, I'm not very familiar with the literature - there have been several long reviews recently on protein-lipid interactions - maybe somebody has already done a more detailed analysis.

I've now skimmed through the reviews. We might want to cite some of them, Matti is a coauthor in the second one: https://pubs.acs.org/doi/10.1021/acs.chemrev.8b00460 https://pubs.acs.org/doi/10.1021/acs.chemrev.8b00538 https://pubs.acs.org/doi/10.1021/acs.chemrev.8b00608 As I see it, there are many different examples of specific and non-specific interactions (from experiments and simulations), but no clear generalizations of the binding rules - each site and each protein has its own peculiarities.

Regarding the specific binding, I'm thinking about what the alternative could be. Could the lipid (very flexible by itself, as is shown in the manuscript) become ordered in the binding site without specific interactions? I think that there will probably always be some interactions (or the lipid will not be ordered). And all lipids that we observe in the PDB structures and analyze in the manuscript are ordered.

ohsOllila commented 3 years ago

Because lipids are very disordered in the bilayer according to our results, they lose entropy when binding in rigid conformations to proteins. I do not see any reason other than lipid-protein interactions that would compensate for the entropy lost upon binding. On the other hand, we do not find differences in accessible conformations between headgroups. Therefore, I think that the specificity of binding certain lipids is most likely driven by intermolecular interactions, not by differences in accessible conformations between different headgroups.

ohsOllila commented 3 years ago

I think that this issue is essentially handled. Thanks a lot to everybody! If needed, for example, during revision, we can reopen.