Compatibility with Martini Coarse Grain models

Hi Brady!

Here's the feature request for compatibility with Martini coarse-grained models (for tracking). I'm used to those files, so I can work on that.

[X] Backbone detection (for "tube" representation only)
For proteins, the backbone bead is called BB and the side-chain beads are called SC (SC1, SC2, SC3, SC4).
[ ] Residue detection
- Detection of the residue names present, which can be very useful for membranes with different lipid compositions
[ ] Membrane atom types (the MAD database can be helpful to see what a coarse-grained lipid looks like: https://mad.ibcp.fr/explore)
- Hydrophobic chain beads are usually called C1A, C2A, C1B, C2B, ... or D2A for the unsaturated part.
- Phosphate group: PO4
- Glycerol group: GL1/GL2
- Beads above the phosphate group can have specific names (for example NC3 for the choline group of POPC, or C1, C2, C3 for the inositol group of POPI)
[ ] Bonds
- Martini models have specific bonding definitions. It would be nice to support them.
- NOTE: For bonds, the best way, for now, is to load the system with MDAnalysis and use a TPR input.

I'll drop this here: these are the atom connectivities for amino acids.

martini_aa_connectivity = {
    'ALA': [('BB', 'SC1')],
    'ARG': [('BB', 'SC1'), ('SC1', 'SC2')],
    'ASN': [('BB', 'SC1'), ('SC1', 'SC2')],
    'ASP': [('BB', 'SC1'), ('SC1', 'SC2')],
    'CYS': [('BB', 'SC1'), ('SC1', 'SC2')],
    'GLN': [('BB', 'SC1'), ('SC1', 'SC2'), ('SC2', 'SC3')],
    'GLU': [('BB', 'SC1'), ('SC1', 'SC2'), ('SC2', 'SC3')],
    'GLY': [('BB', 'SC1')],
    'HIS': [('BB', 'SC1'), ('SC1', 'SC2')],
    'ILE': [('BB', 'SC1'), ('SC1', 'SC2')],
    'LEU': [('BB', 'SC1'), ('SC1', 'SC2')],
    'LYS': [('BB', 'SC1'), ('SC1', 'SC2'), ('SC2', 'SC3')],
    'MET': [('BB', 'SC1'), ('SC1', 'SC2'), ('SC2', 'SC3')],
    'PHE': [('BB', 'SC1'), ('SC1', 'SC2')],
    'PRO': [('BB', 'SC1'), ('SC1', 'SC2')],
    'SER': [('BB', 'SC1'), ('SC1', 'SC2')],
    'THR': [('BB', 'SC1'), ('SC1', 'SC2')],
    'TRP': [('BB', 'SC1'), ('SC1', 'SC2')],
    'TYR': [('BB', 'SC1'), ('SC1', 'SC2'), ('SC2', 'SC3')],
    'VAL': [('BB', 'SC1'), ('SC1', 'SC2')],
}
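As a rough sketch of how a table like this might be turned into bond index pairs: the helper below, `bonds_from_connectivity`, is a hypothetical name and not part of MolecularNodes or MDAnalysis. It assumes the atoms arrive as (resid, resname, bead name) tuples, that resid plus resname identifies a residue uniquely, and it only covers bonds inside a residue; BB-BB bonds between consecutive residues would need separate handling.

```python
from collections import defaultdict

# Small excerpt of the table above, repeated so the sketch runs on its own.
connectivity = {
    'ALA': [('BB', 'SC1')],
    'LYS': [('BB', 'SC1'), ('SC1', 'SC2'), ('SC2', 'SC3')],
}

def bonds_from_connectivity(atoms, table):
    """Return intra-residue bonds as (i, j) pairs of atom indices.

    `atoms` is a sequence of (resid, resname, bead_name) tuples.
    Bead pairs that are not both present in a residue are skipped.
    """
    # Map each residue to {bead name: atom index}.
    residues = defaultdict(dict)
    for index, (resid, resname, name) in enumerate(atoms):
        residues[(resid, resname)][name] = index

    bonds = []
    for (resid, resname), beads in residues.items():
        for a, b in table.get(resname, []):
            if a in beads and b in beads:
                bonds.append((beads[a], beads[b]))
    return bonds

atoms = [
    (1, 'ALA', 'BB'), (1, 'ALA', 'SC1'),
    (2, 'LYS', 'BB'), (2, 'LYS', 'SC1'), (2, 'LYS', 'SC2'),
]
print(bonds_from_connectivity(atoms, connectivity))
# [(0, 1), (2, 3), (3, 4)]
```

The ('SC2', 'SC3') pair for LYS is skipped here only because the example residue has no SC3 bead.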

Looks like it should be simple enough to implement. Am I right that MDAnalysis can read these files, so it will just be a matter of writing the right MDAnalysis code to get the attributes we are after?

Backbone Detection

This just needs to be a 1D boolean numpy array with one value per atom. See att_is_alpha_carbon(): https://github.com/BradyAJohnston/MolecularNodes/blob/4e8fc24c2f279f14937466b76277744b845b1abf/MolecularNodes/md.py#L200-L201
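A minimal sketch of what such a mask could look like for Martini, assuming the backbone bead is always named BB as described earlier in the thread. The function name `is_backbone` is made up for illustration; with MDAnalysis the names would presumably come from `u.atoms.names`.

```python
import numpy as np

def is_backbone(atom_names):
    """Return a 1D boolean array that is True for Martini backbone beads."""
    names = np.asarray(atom_names)
    # Element-wise comparison gives one boolean per atom.
    return names == "BB"

names = ["BB", "SC1", "SC2", "BB", "PO4"]
mask = is_backbone(names)
print(mask)  # [ True False False  True False]
```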

Residue Detection

You can store the unique names of the different residues as a custom Python property on the object after creation. create_object() returns a Blender object. You can then set arbitrary values on it, as on this line, which can then be accessed when creating new custom nodes inside of Geometry Nodes: https://github.com/BradyAJohnston/MolecularNodes/blob/4e8fc24c2f279f14937466b76277744b845b1abf/MolecularNodes/md.py#L190
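One possible way to collect those unique names, sketched in plain Python. The property key `residue_names` and the choice to pack the names into a single comma-separated string are assumptions for illustration, not what MolecularNodes does; the Blender line is left as a comment because it only runs inside Blender.

```python
def unique_residue_names(resnames):
    """Return residue names in order of first appearance, without repeats."""
    return list(dict.fromkeys(resnames))

# Per-atom (or per-residue) names, e.g. from a membrane with mixed lipids.
resnames = ["POPC", "POPC", "POPE", "CHOL", "POPE"]
names = unique_residue_names(resnames)
packed = ",".join(names)
print(packed)  # POPC,POPE,CHOL

# Inside Blender, the string could then be attached to the object, e.g.:
# obj["residue_names"] = packed   # hypothetical key name
```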

Bonds & Atom Types

These can both be stored easily enough; they just require a custom dictionary encoding them as integer values, which can then be used for lookup when creating nodes and when storing values as integers on the geometry itself.
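A small sketch of that kind of dictionary encoding for bead names. The table `bead_codes`, the integer values, and the `-1` fallback for unknown names are all arbitrary choices made for this example.

```python
# Hypothetical lookup table: bead name -> integer code (values are arbitrary).
bead_codes = {"BB": 0, "SC1": 1, "SC2": 2, "PO4": 3, "GL1": 4, "GL2": 5, "NC3": 6}

def encode(names, codes, unknown=-1):
    """Translate bead names to integers; names not in the table get `unknown`."""
    return [codes.get(name, unknown) for name in names]

def decode(values, codes):
    """Translate integers back to bead names; unmatched codes become None."""
    reverse = {value: name for name, value in codes.items()}
    return [reverse.get(value) for value in values]

encoded = encode(["BB", "PO4", "NC3", "W"], bead_codes)
print(encoded)                      # [0, 3, 6, -1]
print(decode(encoded, bead_codes))  # ['BB', 'PO4', 'NC3', None]
```

Keeping the same table around for decoding is what makes the integers on the geometry usable for lookup later.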

OK, thank you for all your tips =D

Indeed, we can use MDAnalysis to set up the attributes!

I have another question... Does this data take a lot of memory? If you don't think it's useless, we could add an "is_lipids" attribute! However, the MDAnalysis selection string is... long XD I have compiled all the lipid names from the CHARMM and Martini force fields and removed duplicates (resulting in 564 lipid names).

Here's a selection string in Martini for example.

lipids = u.select_atoms("resname 23SM ABLIPA ABLIPB ADR ADRP ALIN ALINP APC APPC ARA ARAN ARANP ARAP ASM BCLIPA BCLIPB BCLIPC BEH BEHP BNSM BSM C6DHPC C7DHPC CER160 CER180 CER181 CER2 CER200 CER220 CER240 CER241 CER3E CHAPS CHAPSO CHL1 CHM1 CHNS CHOA CHSD CHSP CJLIPA CPC CTLIPA CYFOS3 CYFOS4 CYFOS5 CYFOS6 CYFOS7 CYSF CYSG CYSL CYSP DAPA DAPA DAPC DAPC DAPE DAPE DAPG DAPG DAPS DAPS DBPA DBPC DBPE DBPG DBPS DBSM DCPC DDA DDAO DDAOP DDAP DDMG DDOPC DDOPE DDOPS DDPC DEPA DEPC DEPE DEPG DEPS DFPA DFPC DFPE DFPG DFPS DGLA DGLAP DGPA DGPA DGPC DGPC DGPE DGPE DGPG DGPG DGPS DGPS DHA DHAP DHPC DHPCE DIPA DIPA DIPC DIPE DIPG DIPS DLIPC DLIPE DLIPI DLPA DLPA DLPC DLPC DLPE DLPE DLPG DLPG DLPS DLPS DMPA DMPC DMPCE DMPE DMPEE DMPG DMPI DMPI13 DMPI14 DMPI15 DMPI24 DMPI25 DMPI2A DMPI2B DMPI2C DMPI2D DMPI33 DMPI34 DMPI35 DMPS DNPA DNPA DNPC DNPC DNPE DNPE DNPG DNPG DNPS DNPS DOMG DOPA DOPA DOPC DOPC DOPCE DOPE DOPE DOPEE DOPG DOPG DOPP1 DOPP2 DOPP3 DOPS DOPS DPA DPAP DPC DPCE DPP1 DPP2 DPPA DPPA DPPC DPPC DPPE DPPE DPPEE DPPG DPPG DPPGK DPPI DPPS DPPS DPSM DPT DPTP DRPA DRPC DRPE DRPG DRPS DSPA DSPC DSPE DSPG DSPS DTPA DTPA DTPC DTPE DTPG DTPS DUPC DUPE DUPS DVPA DVPC DVPE DVPG DVPS DXCE DXPA DXPA DXPC DXPC DXPE DXPE DXPG DXPG DXPS DXPS DXSM DYPA DYPA DYPC DYPC DYPE DYPE DYPG DYPG DYPS DYPS ECLIPA ECLIPB ECLIPC EDA EDAP EICO EICOP EPA EPAP ERG ERU ERUP ETA ETAP ETE ETEP FOIS11 FOIS9 FOS10 FOS12 FOS13 FOS14 FOS15 FOS16 GLA GLAP GLYM HPA HPAP HPLIPA HPLIPB HTA HTAP IPC IPPC KPLIPA KPLIPB KPLIPC LAPAO LAPAOP LAU LAUP LDAO LDAOP LIGN LIGNP LILIPA LIN LINP LLPA LLPC LLPE LLPS LMPG LNACL1 LNACL2 LNBCL1 LNBCL2 LNCCL1 LNCCL2 LNDCL1 LNDCL2 LOACL1 LOACL2 LOCCL1 LOCCL2 LPC LPC12 LPC14 LPPA LPPC LPPC LPPE LPPG LPPG LPPS LSM LYSM MCLIPA MEA MEAP MYR MYRO MYROP MYRP NER NERP NGLIPA NGLIPB NGLIPC NSM OLE OLEP OPC OSM OSPE OYPE PADG PAL PALIPA PALIPB PALIPC PALIPD PALIPE PALO PALOP PALP PAPA PAPC PAPE PAPG PAPI PAPS PDOPC PDOPE PEPC PGPA PGPC PGPE PGPG PGPS PGSM PIDG PIM1 PIM2 PIPA PIPC PIPE PIPG 
PIPI PIPS PLPA PLPC PLPE PLPG PLPI PLPI13 PLPI14 PLPI15 PLPI24 PLPI25 PLPI2A PLPI2B PLPI2C PLPI2D PLPI33 PLPI34 PLPI35 PLPS PMCL1 PMCL2 PMPE PMPG PNCE PNPI PNPI13 PNPI14 PNPI15 PNPI24 PNPI25 PNPI2A PNPI2B PNPI2C PNPI2D PNPI33 PNPI34 PNPI35 PNSM PODG POP1 POP2 POP3 POPA POPA POPC POPC POPCE POPE POPE POPEE POPG POPG POPI POPI POPI13 POPI14 POPI15 POPI24 POPI25 POPI2A POPI2B POPI2C POPI2D POPI33 POPI34 POPI35 POPP1 POPP2 POPP3 POPS POPS POSM PPC PPPE PQPE PQPS PRPA PRPC PRPE PRPG PRPS PSM PSPG PUDG PUPA PUPC PUPE PUPI PUPS PVCL2 PVDG PVP1 PVP2 PVP3 PVPE PVPG PVPI PVSM PYPE PYPG PYPI PhPC QMPE SAPA SAPC SAPE SAPG SAPI SAPI13 SAPI14 SAPI15 SAPI24 SAPI25 SAPI2A SAPI2B SAPI2C SAPI2D SAPI33 SAPI34 SAPI35 SAPS SB3-10 SB3-12 SB3-14 SDA SDAP SDPA SDPC SDPE SDPG SDPS SDS SELIPA SELIPB SELIPC SFLIPA SITO SLPA SLPC SLPE SLPG SLPS SOPA SOPC SOPE SOPG SOPS SSM STE STEP STIG THA THAP THCHL THDPPC TIPA TLCL1 TLCL2 TMCL1 TMCL2 TOCL1 TOCL2 TPA TPAP TPC TPC TPT TPTP TRI TRIP TRIPAO TRPAOP TSPC TTA TTAP TXCL1 TXCL2 TYCL1 TYCL2 UDAO UDAOP UFOS10 UPC VCLIPA VCLIPB VCLIPC VCLIPD VCLIPE VPC XNCE XNSM YOPA YOPC YOPE YOPS YPLIPA YPLIPB")
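Rather than maintaining one very long literal, the selection string could be assembled from a collection of names, which also removes any repeated entries. This is only a sketch: `resname_selection` is a hypothetical helper and the four names below are a tiny stand-in for the full list.

```python
def resname_selection(names):
    """Build a 'resname A B C' selection string from an iterable of names.

    Duplicates are dropped and the names are sorted for a stable result.
    """
    unique = sorted(set(names))
    if not unique:
        raise ValueError("at least one residue name is required")
    return "resname " + " ".join(unique)

lipid_names = ["POPC", "DAPA", "DAPA", "DOPC"]
selection = resname_selection(lipid_names)
print(selection)  # resname DAPA DOPC POPC

# With an MDAnalysis Universe `u`, this would be used as:
# lipids = u.select_atoms(selection)
```

On the memory question: a per-atom boolean such as "is_lipids" is small, since NumPy stores one byte per boolean element.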
