NMRLipids / Databank

NMRlipids databank
GNU General Public License v3.0

Create mapping file for Gb3 #169

Open MelbourneFL opened 7 months ago

MelbourneFL commented 7 months ago

Hello,

my name is Alexander Vogel and I'm trying to add a new simulation with 3 runs (7 microseconds each) to the databank (DOIs: 10.5281/zenodo.10635871 - 10.5281/zenodo.10635875 - 10.5281/zenodo.8335207). A while back there was a request for simulations with unusual lipids and this one contains the glycolipid Gb3.

It is my first time doing this, but with your help files it has gone smoothly so far. Now, however, I need to add the composition information to the yaml file. First I would like to confirm that the first part, which is POPC and water, is correct. The setup was done with CHARMM-GUI using the CHARMM force field. Is the following correct?

POPC: NAME: POPC MAPPING: mappingPOPCcharmm.yaml SOL: NAME: TIP3 MAPPING: mappingTIP3PCHARMMgui.yaml

In addition, the simulations contain Gb3, which is constructed from separate parts for the ceramide backbone and the three rings. They are called CER2, BGLC, BGAL and AGAL, and I guess new mapping files have to be created for those. Can you help me with this?

Thanks and best regards,

Alexander

ohsOllila commented 7 months ago

Thanks for contributing!

> It is my first time doing this, but with your help files it has gone smoothly so far. Now, however, I need to add the composition information to the yaml file. First I would like to confirm that the first part, which is POPC and water, is correct. The setup was done with CHARMM-GUI using the CHARMM force field. Is the following correct?
>
> POPC: NAME: POPC MAPPING: mappingPOPCcharmm.yaml SOL: NAME: TIP3 MAPPING: mappingTIP3PCHARMMgui.yaml

This looks correct, but note that indentation matters in YAML (just as it does in Python), and the entries belong under a COMPOSITION key, so it should be: COMPOSITION: POPC: NAME: POPC MAPPING: mappingPOPCcharmm.yaml SOL: NAME: TIP3 MAPPING: mappingTIP3PCHARMMgui.yaml

> In addition, the simulations contain Gb3, which is constructed from separate parts for the ceramide backbone and the three rings. They are called CER2, BGLC, BGAL and AGAL, and I guess new mapping files have to be created for those. Can you help me with this?

There is already a mapping file for GM1 available; maybe it can be used as a template: https://github.com/NMRLipids/Databank/blob/main/Scripts/BuildDatabank/mapping_files/mappingGM1charmm.yaml

Also, @markussmiettinen may have thought about this as well.

MelbourneFL commented 7 months ago

I'm confused. Your indents look just like mine (there are none). I guess the indents are removed here. I created them just as in the example yaml file.

About the mapping: I looked at the GM1 file and partially understand the structure, but I wouldn't know how to create one myself. For example, looking at the first segment:

M_G1_M: ATOMNAME: C3S FRAGMENT: backbone RESIDUE: CER160

I guess ATOMNAME, FRAGMENT and RESIDUE come from the structure and naming in the simulation...so I could figure that out but how would I come up with M_G1_M?

Alexander

ohsOllila commented 7 months ago

Sorry, the indents were not properly shown in this editor. Check the correct indents for example from here: https://github.com/NMRLipids/Databank/blob/main/Scripts/BuildDatabank/info_files/777/info.yaml
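
Since the indentation was lost above, here is a minimal sketch of how the composition block is presumably nested. The two-space indent and the `render` helper are assumptions made for illustration; the helper is not part of the Databank scripts, and the linked info.yaml remains the authoritative example.

```python
# Sketch: print the composition block with explicit indentation.
# The nesting follows the reply above; render() is a hypothetical helper.

composition = {
    "COMPOSITION": {
        "POPC": {"NAME": "POPC", "MAPPING": "mappingPOPCcharmm.yaml"},
        "SOL": {"NAME": "TIP3", "MAPPING": "mappingTIP3PCHARMMgui.yaml"},
    }
}


def render(node, depth=0):
    """Return YAML-style lines for a nested dict whose leaves are strings."""
    lines = []
    pad = "  " * depth  # two spaces per nesting level (assumed convention)
    for key, value in node.items():
        if isinstance(value, dict):
            lines.append(f"{pad}{key}:")
            lines.extend(render(value, depth + 1))
        else:
            lines.append(f"{pad}{key}: {value}")
    return lines


text = "\n".join(render(composition))
print(text)
```

The printed text shows each molecule indented one level under COMPOSITION, and its NAME and MAPPING entries indented one level further.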

> I guess ATOMNAME, FRAGMENT and RESIDUE come from the structure and naming in the simulation...so I could figure that out

Yes.

> but how would I come up with M_G1_M?

These are universal atom names. There is some more explanation here: https://nmrlipids.github.io/moleculesAndMapping.html#universal-atom-names-in-mapping-files. However, this may not be entirely clear for sugars. You can check how it is implemented by comparing the GM1 mapping file to the structure here: https://zenodo.org/doi/10.5281/zenodo.8331804. In principle, it does not matter how the atoms are named as long as each atom has a unique name, but the more logical the names are, the easier it is for a human to understand them. Or maybe @markussmiettinen has some insight on this?

markussmiettinen commented 7 months ago

Hi Alexander, thank you for contributing! I am just about to go offline for a week, but maybe @comcon1 would like to comment on the naming of glycolipids in the meantime?

MelbourneFL commented 7 months ago

Hello,

I went ahead and created a mapping file based on GM1. I tried to come up with a numbering scheme for the sugars that makes sense. Unfortunately, I can't upload it here (https://github.com/NMRLipids/Databank/tree/main/Scripts/BuildDatabank/mapping_files) because it says uploads are disabled. Also I can't attach it to this post since the file type is not supported. So I uploaded it there (where it will be available for 17 days): https://daten-transport.de/?id=J8FEt59PLbBm

Since I'll be going on vacation tomorrow (I didn't expect this to take so long), could somebody please add it to the databank and also add my simulations? The info.yaml files can be downloaded here (where they will be available for 17 days): https://daten-transport.de/?id=35JFSuY3TYKM

Thanks a lot!

Alexander

PS: When using the GM1 mapping as a template I found two atoms which in my opinion don't belong there:

This O1 seems to be in the file twice:
M_G4C1O1_M: ATOMNAME: O1 FRAGMENT: headgroup RESIDUE: BGLC

This HO4 is not present in the BGAL for Gb3:
M_G5O4H4_M: ATOMNAME: HO4 FRAGMENT: headgroup RESIDUE: BGAL

ohsOllila commented 6 months ago

Thanks for the files.

You cannot upload directly to the main Databank branch. You first need to make your own fork, upload there, and then make a pull request to the main branch. Anyway, I have now added the mapping file here: https://github.com/NMRLipids/Databank/blob/main/Scripts/BuildDatabank/mapping_files/mappingGB3charmm.yaml. I also added one of the info files here: https://github.com/NMRLipids/Databank/blob/main/Scripts/BuildDatabank/info_files/790/info.yaml. However, when I tried to add the simulation to the databank (python AddData.py -f info_files/790/info.yaml), the number of atoms did not match: "Number of atoms in trajectory 74381 and README.yaml 74371 do no match." The difference is ten atoms, which equals the number of GB3 molecules in the system. This suggests that one GB3 atom may be missing from the mapping file. It would also be good to check and fix the issues you mentioned in the GM1 mapping file.
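
The reasoning behind that guess can be written out as a short calculation. This is only a sketch of the arithmetic using the numbers quoted in this comment; the variable names are illustrative and nothing here calls the Databank code.

```python
# Numbers quoted in the comment above.
n_trajectory = 74381  # atoms reported for the trajectory
n_readme = 74371      # atoms reported for README.yaml
n_gb3 = 10            # GB3 molecules in the system

missing_total = n_trajectory - n_readme
# If the shortfall divides evenly by the number of GB3 molecules,
# the same number of atoms is unaccounted for in every GB3 molecule.
missing_per_molecule, remainder = divmod(missing_total, n_gb3)

print(missing_total, missing_per_molecule, remainder)  # prints: 10 1 0
```

A remainder of zero with exactly one atom per molecule points at a single mapping entry rather than at the trajectory itself.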

MelbourneFL commented 6 months ago

Thanks for your help and feedback. Actually, all atoms are accounted for, but I accidentally used one M_XXX_M label twice. I fixed it and created a new fork and pull request. Could you please check again?

Alexander
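
A reused label is easy to miss by eye, so a quick text scan can help before opening a pull request. The sketch below is an assumption-laden illustration: the excerpt and its labels are made up, the entry layout is inferred from the GM1 example quoted earlier, and the script is not part of the Databank tooling.

```python
import re
from collections import Counter

# Hypothetical excerpt of a mapping file; the second entry reuses a label by mistake.
mapping_text = """\
M_G4C1_M:
  ATOMNAME: C1
  FRAGMENT: headgroup
  RESIDUE: BGLC
M_G4C1_M:
  ATOMNAME: O1
  FRAGMENT: headgroup
  RESIDUE: BGLC
M_G4C2_M:
  ATOMNAME: C2
  FRAGMENT: headgroup
  RESIDUE: BGLC
"""

# Universal atom names are top-level keys: they start in column 0
# and follow the M_..._M pattern.
label_pattern = re.compile(r"^(M_\w+_M):", re.MULTILINE)
labels = label_pattern.findall(mapping_text)

# Any label seen more than once would collapse into a single entry
# if the file were read into a dictionary keyed by label.
duplicates = sorted(name for name, count in Counter(labels).items() if count > 1)

print(len(labels), "entries,", len(set(labels)), "unique labels")
print("duplicated:", duplicates)
```

Scanning the raw text rather than the loaded dictionary is deliberate here, because a dictionary keyed by label can hold only one entry per label and would therefore hide the duplicate.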

ohsOllila commented 6 months ago

Thanks. I added one of the systems: https://github.com/NMRLipids/Databank/blob/main/Scripts/BuildDatabank/info_files/790/info.yaml, and everything seems to work now (except PCA, which does not work for sugars). The results are here: https://github.com/NMRLipids/Databank/tree/main/Data/Simulations/82e/424/82e42412f1383f19441f90baf20721ab773ce1fa/6fd7d3a864973fb14dee318fc08f64f33a33e661

I had to change CER24 to CER240 in the mapping file, because CER240 is the real residue name even though the last character is left out in gro files.
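
The missing character comes from the fixed-width layout of gro files, where the residue name field is five characters wide. The sketch below illustrates this; the helper functions, the atom name and the coordinates are made up for the example, and only the column widths are taken from the gro format.

```python
def gro_atom_line(resnum, resname, atomname, atomnum, x, y, z):
    """Format one atom record using the fixed-width .gro columns."""
    # Residue and atom names get exactly five characters each,
    # so anything longer is cut off here.
    return (
        f"{resnum:5d}{resname[:5]:<5}{atomname[:5]:>5}{atomnum:5d}"
        f"{x:8.3f}{y:8.3f}{z:8.3f}"
    )


def parse_resname(line):
    """Read the residue name back from columns 6-10 of a .gro atom record."""
    return line[5:10].strip()


# Illustrative record: the atom name and coordinates are placeholders.
line = gro_atom_line(1, "CER240", "C3S", 1, 1.234, 2.345, 3.456)
print(repr(parse_resname(line)))  # prints: 'CER24'
```

A mapping file written from the names visible in a gro file can therefore carry the shortened name, which is why the full six-character name had to be restored.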

I would also have added the other systems, but I could no longer access the info files at the link. Could you commit them to the repository, or send a new link?

ohsOllila commented 4 months ago

@MelbourneFL I think there are still a couple of already prepared info files that could be added but have not been yet, because I was too slow to download them. Would it be possible to send them again?