Sulstice closed this issue 1 year ago.
Arginine, Cysteine, Glycine, Isoleucine, Leucine, Lysine, Methionine, Phenylalanine, Proline, Serine, Threonine, Tryptophan, Tyrosine, Valine, Glutamic acid, Aspartic acid, Alanine, Arginine, Histidine, Palmitic acid, Stearic acid, Arachidic acid, Lignoceric acid, Oleic acid, Linoleic acid, α-Linoleic acid, Myristic acid, Palmitic acid, Stearic acid, Arachidic acid, Behenic acid, Lignoceric acid, Palmitoleic acid, 11-Hexadecenoic acid, 10-Heptadecenoic acid, Oleic acid, 11-Octadecenoic acid, 11-Eicosenoic acid, 9,12-Hexadecadienoic acid, Linoleic acid, 9,15-Octadecadienoic acid, Hepta-2,4(E,E)-dienoic acid, Linolenic acid, Ascorbic acid, Thiamine, Riboflavin, Niacin, Pantothenic acid, Pyridoxine, Folic acid, Vitamin A, Vitamin E, Vitamin K, Gallic acid, Vanillic acid, Syringic acid, Protocatechuic acid, p-hydroxybenzoic acid, p-coumaric acid, Chlorogenic acid, ferulic acid, caffeic acid, theogallin, quercetin-3-O-galactoside, quercetin-3-O-glucoside, quercetin-3-O-xyloside, Mangiferin, cyanidin, delphinidin, pelargonidin, (+)-catechin, apigenin, luteolin, kaempferol, myricetin, 9-cis-violaxanthin, limonene, Alpha-terpinolene, d-carvone, α-phellandrene, α-humulene, γ-terpinene, α-pinene, (−)-trans-caryophyllene, sabinene, (+)-3-carene, cis-caryophyllene, α-humulene, germacrene D, aromadendrene, β-cubebene, α-cubebene, α-bourbonene, β-elemene
Cool, let's start here, where we are creating an object in Python. This is called a class, where we define the object as a Mango. A mango would have attributes like color or taste, which would be functions. For the purpose of GlobalChem we are interested in the chemical composition, so we have a function called get_smiles:
```python
class Mango(object):

    @staticmethod
    def get_smiles():
        smiles = {
            'arginine': 'C(CC(C(=O)O)N)CN=C(N)N',
            'cysteine': '',
            'glycine': '',
            'isoleucine': '',
            'leucine': '',
            'lysine': '',
            'methionine': '',
            'phenylalanine': '',
            'proline': '',
            'serine': '',
            'threonine': '',
            'tryptophan': '',
            'tyrosine': '',
            'valine': '',
            'glutamic acid': '',
            'aspartic acid': '',
            'alanine': '',
            'arginine': '',
            'histidine': '',
            'palmitic acid': '',
            'stearic acid': '',
            'arachidic acid': '',
            'lignoceric acid': '',
            'oleic acid': '',
            'linoleic acid': '',
            'alpha-Linoleic acid': '',
            'myristic acid': '',
            'palmitic acid': '',
            'stearic acid': '',
            'arachidic acid': '',
            'behenic acid': '',
            'lignoceric acid': '',
            'palmitoleic acid': '',
            'hexadecenoic acid': '',
            'heptadecenoic acid': '',
            'oleic acid': '',
            'octadecenoic acid': '',
            'eicosenoic acid': '',
            '9,12-Hexadecadienoic acid': '',
            'linoleic acid': '',
            '9,15-Octadecadienoic acid': '',
            'hepta-2,4(E,E)-dienoic acid': '',
            'linolenic acid': '',
            'ascorbic acid': '',
            'thiamine': '',
            'riboflavin': '',
            'niacin': '',
            'pantothenic acid': '',
            'pyridoxine': '',
            'folic acid': '',
            'vitamin A': '',
            'vitamin E': '',
            'vitamin K': '',
            'gallic Acid': '',
            'vanillic acid': '',
            'syringic acid': '',
            'protocatechuic acid': '',
            'para hydroxybenzoic acid': '',
            'paracoumaric acid': '',
            'chlorogenic acid': '',
            'ferulic acid': '',
            'caffeic acid': '',
            'theogallin': '',
            'quercetin-3-O-galactoside': '',
            'quercetin-3-O-glucoside': '',
            'quercetin-3-O-xyloside': '',
            'mangiferin': '',
            'cyanidin': '',
            'delphinidin': '',
            'pelargonidin': '',
            'catechin': '',
            'apigenin': '',
            'luteolin': '',
            'kaempferol': '',
            'myricetin': '',
            '9-cis-violaxanthin': '',
            'limonene': '',
            'alpha-terpinolene': '',
            'd-carvone': '',
            'alpha phellandrene': '',
            'alpha humulene': '',
            'gamma terpinene': '',
            'alpha pinene': '',
            'trans caryophyllene': '',
            'sabinene': '',
            'carene': '',
            'cis-caryophyllene': '',
            'alpha humulene': '',
            'germacrene d': '',
            'aromadendrene': '',
            'beta cubebene': '',
            'alpha cubebene': '',
            'alpha bourbonene': '',
            'beta elemene': '',
        }
        return smiles
```
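Because get_smiles is a @staticmethod, it can be called directly on the class without first creating a Mango instance. The sketch below uses a trimmed stand-in for the class (three entries only) and a hypothetical helper, `missing_smiles`, that lists the names whose SMILES are still empty, which is a quick way to track what is left to fill in.

```python
class Mango(object):
    """Trimmed stand-in for the Mango class above (three entries only)."""

    @staticmethod
    def get_smiles():
        return {
            'arginine': 'C(CC(C(=O)O)N)CN=C(N)N',
            'cysteine': '',
            'glycine': '',
        }


def missing_smiles(smiles):
    """Return the names whose SMILES string is empty or whitespace, in insertion order."""
    return [name for name, value in smiles.items() if not value.strip()]


# A static method is called on the class itself; no Mango() instance is required.
smiles = Mango.get_smiles()
print(missing_smiles(smiles))  # ['cysteine', 'glycine']
```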
The next task for you, @Nickspizza001, is to fill in the SMILES. To make it easier, you can first search to see whether the name already exists. If you do find it, add its SMILES here as an entry.
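The lookup step can be sketched as follows, assuming the existing lists are available as plain name-to-SMILES dictionaries. The `catalogs` contents and both helper functions are hypothetical illustrations, not part of the GlobalChem API. Names are normalized (lowercased, whitespace collapsed) before comparison so that entries such as 'Glycine ' and 'glycine' are treated as the same compound.

```python
def normalize(name):
    """Lowercase, trim, and collapse internal whitespace so near-identical names match."""
    return ' '.join(name.lower().split())


def find_existing(name, catalogs):
    """Return (catalog_name, smiles) for the first non-empty match of `name`, else None."""
    target = normalize(name)
    for catalog_name, entries in catalogs.items():
        for key, smiles in entries.items():
            if normalize(key) == target and smiles.strip():
                return catalog_name, smiles.strip()
    return None


# Hypothetical catalogs standing in for dictionaries that already exist elsewhere.
catalogs = {
    'amino_acids': {'Glycine ': 'C(C(=O)O)N'},
    'vitamins': {'niacin': 'C1=CC(=CN=C1)C(=O)O'},
}
print(find_existing('glycine', catalogs))   # ('amino_acids', 'C(C(=O)O)N')
print(find_existing('thiamine', catalogs))  # None
```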
There might be bugs where multiple entries are mapped to the same name; we should pick one and change the rest.
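Python does not complain about a repeated key in a dict literal: the last value silently wins, so by the time get_smiles() returns, the duplicate is already gone. One way to catch them is to inspect the source code instead. The sketch below uses the standard `ast` module; `duplicate_keys` is a hypothetical helper, and it treats keys that differ only in case or surrounding whitespace (such as 'stearic acid ' and 'stearic acid') as duplicates too.

```python
import ast
from collections import Counter


def duplicate_keys(source):
    """Return string keys that occur more than once in any dict literal in `source`."""
    duplicates = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Dict):
            # node.keys holds None for '**other' entries; those are skipped.
            names = [
                ' '.join(key.value.lower().split())
                for key in node.keys
                if isinstance(key, ast.Constant) and isinstance(key.value, str)
            ]
            duplicates.extend(name for name, count in Counter(names).items() if count > 1)
    return duplicates


source = """
smiles = {
    'arginine': 'C(CC(C(=O)O)N)CN=C(N)N',
    'stearic acid ': 'CCCCCCCCCCCCCCCCCC(=O)O',
    'arginine': 'C(C[C@@H](C(=O)O)N)CN=C(N)N',
    'stearic acid': 'CCCCCCCCCCCCCCCCCC(=O)O',
}
"""
print(duplicate_keys(source))  # ['arginine', 'stearic acid']
```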
Note some rules I have applied when curating your list:
Do this for all the entries. You have a lot of entries, so I think we could split the Mango into separate classes by category.
```python
class Mango(object):

    @staticmethod
    def get_smiles():
        # Raw strings (r'...') keep the '\' cis/trans bond markers in SMILES
        # from being read as Python escape sequences.
        smiles = {
            'arginine': 'C(CC(C(=O)O)N)CN=C(N)N',
            'cysteine': 'C([C@@H](C(=O)O)N)S',
            'glycine': 'C(C(=O)O)N',
            'isoleucine': 'CC[C@H](C)[C@@H](C(=O)O)N',
            'leucine': 'CC(C)C[C@@H](C(=O)O)N',
            'lysine': 'C(CCN)C[C@@H](C(=O)O)N',
            'methionine': 'CSCC[C@@H](C(=O)O)N',
            'phenylalanine': 'C1=CC=C(C=C1)C[C@@H](C(=O)O)N',
            'proline': 'C1C[C@H](NC1)C(=O)O',
            'serine': 'C([C@@H](C(=O)O)N)O',
            'threonine': 'C[C@H]([C@@H](C(=O)O)N)O',
            'tryptophan': 'C1=CC=C2C(=C1)C(=CN2)C[C@@H](C(=O)O)N',
            'tyrosine': 'C1=CC(=CC=C1C[C@@H](C(=O)O)N)O',
            'valine': 'CC(C)[C@@H](C(=O)O)N',
            'glutamic acid': 'C(CC(=O)O)[C@@H](C(=O)O)N',
            'aspartic acid': 'C([C@@H](C(=O)O)N)C(=O)O',
            'alanine': 'C[C@@H](C(=O)O)N',
            'arginine': 'C(C[C@@H](C(=O)O)N)CN=C(N)N',
            'histidine': 'C1=C(NC=N1)C[C@@H](C(=O)O)N',
            'palmitic acid': 'CCCCCCCCCCCCCCCC(=O)O',
            'stearic acid': 'CCCCCCCCCCCCCCCCCC(=O)O',
            'arachidic acid': 'CCCCCCCCCCCCCCCCCCCC(=O)O',
            'lignoceric acid': 'CCCCCCCCCCCCCCCCCCCCCCCC(=O)O',
            'oleic acid': r'CCCCCCCC/C=C\CCCCCCCC(=O)O',
            'linoleic acid': r'CCCCC/C=C\C/C=C\CCCCCCCC(=O)O',
            'alpha-Linoleic acid': '',
            'myristic acid': 'CCCCCCCCCCCCCC(=O)O',
            'palmitic acid': 'CCCCCCCCCCCCCCCC(=O)O',
            'stearic acid': 'CCCCCCCCCCCCCCCCCC(=O)O',
            'arachidic acid': 'CCCCCCCCCCCCCCCCCCCC(=O)O',
            'behenic acid': 'CCCCCCCCCCCCCCCCCCCCCC(=O)O',
            'lignoceric acid': 'CCCCCCCCCCCCCCCCCCCCCCCC(=O)O',
            'palmitoleic acid': r'CCCCCC/C=C\CCCCCCCC(=O)O',
            'hexadecenoic acid': 'CCCC/C=C/CCCCCCCCCC(=O)O',
            'heptadecenoic acid': 'CCCCCC/C=C/CCCCCCCCC(=O)O',
            'oleic acid': r'CCCCCCCC/C=C\CCCCCCCC(=O)O',
            'octadecenoic acid': 'CCCCCC/C=C/CCCCCCCCCC(=O)O',
            'eicosenoic acid': r'CCCCCCCC/C=C\CCCCCCCCCC(=O)O',
            '9,12-Hexadecadienoic acid': 'CCC/C=C/C/C=C/CCCCCCCC(=O)O',
            'linoleic acid': r'CCCCC/C=C\C/C=C\CCCCCCCC(=O)O',
            '9,15-Octadecadienoic acid': 'CC/C=C/CCCC/C=C/CCCCCCCC(=O)O',
            'hepta-2,4(E,E)-dienoic acid': '',
            'linolenic acid': r'CC/C=C\C/C=C\C/C=C\CCCCCCCC(=O)O',
            'ascorbic acid': 'C([C@@H]([C@@H]1C(=C(C(=O)O1)O)O)O)O',
            'thiamine': 'CC1=C(SC=[N+]1CC2=CN=C(N=C2N)C)CCO',
            'riboflavin': 'CC1=CC2=C(C=C1C)N(C3=NC(=O)NC(=O)C3=N2)C[C@@H]([C@@H]([C@@H](CO)O)O)O',
            'niacin': 'C1=CC(=CN=C1)C(=O)O',
            'pantothenic acid': 'CC(C)(CO)[C@H](C(=O)NCCC(=O)O)O',
            'pyridoxine': 'CC1=NC=C(C(=C1O)CO)CO',
            'folic acid': 'C1=CC(=CC=C1C(=O)N[C@@H](CCC(=O)O)C(=O)O)NCC2=CN=C3C(=N2)C(=O)NC(=N3)N',
            'vitamin A': 'CC1=C(C(CCC1)(C)C)/C=C/C(=C/C=C/C(=C/CO)/C)/C',
            'vitamin E': 'CC1=C(C2=C(CC[C@@](O2)(C)CCC[C@H](C)CCC[C@H](C)CCCC(C)C)C(=C1O)C)C',
            'vitamin K': r'CC1=C(C(=O)C2=CC=CC=C2C1=O)C/C=C(\C)/CCCC(C)CCCC(C)CCCC(C)C',
            'gallic Acid': 'C1=C(C=C(C(=C1O)O)O)C(=O)O',
            'vanillic acid': 'COC1=C(C=CC(=C1)C(=O)O)O',
            'syringic acid': 'COC1=CC(=CC(=C1O)OC)C(=O)O',
            'protocatechuic acid': 'C1=CC(=C(C=C1C(=O)O)O)O',
            'para hydroxybenzoic acid': 'C1=CC(=CC=C1C(=O)O)O',
            'paracoumaric acid': 'C1=CC(=CC=C1/C=C/C(=O)O)O',
            'chlorogenic acid': 'C1[C@H]([C@H]([C@@H](C[C@@]1(C(=O)O)O)OC(=O)/C=C/C2=CC(=C(C=C2)O)O)O)O',
            'ferulic acid': 'COC1=C(C=CC(=C1)/C=C/C(=O)O)O',
            'caffeic acid': 'C1=CC(=C(C=C1/C=C/C(=O)O)O)O',
            'theogallin': 'C1[C@H]([C@H]([C@@H](C[C@@]1(C(=O)O)O)OC(=O)C2=CC(=C(C(=C2)O)O)O)O)O',
            'quercetin-3-O-galactoside': 'C1=CC(=C(C=C1C2=C(C(=O)C3=C(C=C(C=C3O2)O)O)O[C@H]4[C@@H]([C@H]([C@H]([C@H](O4)CO)O)O)O)O)O',
            'quercetin-3-O-glucoside': 'C1=CC(=C(C=C1C2=C(C(=O)C3=C(C=C(C=C3O2)O)O)O[C@H]4[C@@H]([C@H]([C@@H]([C@H](O4)CO)O)O)O)O)O',
            'quercetin-3-O-xyloside': 'C1C(C(C(C(O1)OC2=C(OC3=CC(=CC(=C3C2=O)O)O)C4=CC(=C(C=C4)O)O)O)O)O',
            'mangiferin': 'C1=C2C(=CC(=C1O)O)OC3=C(C2=O)C(=C(C(=C3)O)[C@H]4[C@@H]([C@H]([C@@H]([C@H](O4)CO)O)O)O)O',
            'cyanidin': 'C1=CC(=C(C=C1C2=[O+]C3=CC(=CC(=C3C=C2O)O)O)O)O',
            'delphinidin': 'C1=C(C=C(C(=C1O)O)O)C2=[O+]C3=CC(=CC(=C3C=C2O)O)O.[Cl-]',
            'pelargonidin': 'C1=CC(=CC=C1C2=[O+]C3=CC(=CC(=C3C=C2O)O)O)O',
            'catechin': 'C1[C@@H]([C@H](OC2=CC(=CC(=C21)O)O)C3=CC(=C(C=C3)O)O)O',
            'apigenin': 'C1=CC(=CC=C1C2=CC(=O)C3=C(C=C(C=C3O2)O)O)O',
            'luteolin': 'C1=CC(=C(C=C1C2=CC(=O)C3=C(C=C(C=C3O2)O)O)O)O',
            'kaempferol': 'C1=CC(=CC=C1C2=C(C(=O)C3=C(C=C(C=C3O2)O)O)O)O',
            'myricetin': 'C1=C(C=C(C(=C1O)O)O)C2=C(C(=O)C3=C(C=C(C=C3O2)O)O)O',
            '9-cis-violaxanthin': r'C/C(=C\C=C\C=C(/C)\C=C\C=C(\C)/C=C/[C@]12[C@](O1)(C[C@H](CC2(C)C)O)C)/C=C/C=C(\C)/C=C/[C@]34[C@](O3)(C[C@H](CC4(C)C)O)C',
            'limonene': 'CC1=CCC(CC1)C(=C)C',
            'alpha-terpinolene': r'C/C(C)=C1CCC(C)C=C\1',
            'd-carvone': 'CC1=CC[C@@H](CC1=O)C(=C)C',
            'alpha phellandrene': '',
            'alpha humulene': '',
            'gamma terpinene': '',
            'alpha pinene': '',
            'trans caryophyllene': '',
            'sabinene': '',
            'carene': '',
            'cis-caryophyllene': '',
            'alpha humulene': '',
            'germacrene d': '',
            'aromadendrene': '',
            'beta cubebene': '',
            'alpha cubebene': '',
            'alpha bourbonene': '',
            'beta elemene': '',
        }
        return smiles
```
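Before committing a batch of SMILES it can help to run a quick sanity pass over the values. The proper check is to parse each string with a cheminformatics toolkit (for example, RDKit's `Chem.MolFromSmiles` returns `None` for strings it cannot parse). The stdlib-only sketch below is a much cruder, hypothetical `lexical_smiles_problems` helper: it is not a SMILES parser and only catches common copy-paste slips such as stray whitespace, unbalanced parentheses, or an unclosed ring.

```python
import re


def lexical_smiles_problems(smiles):
    """Return simple lexical problems found in a SMILES string (not a full parser)."""
    problems = []
    if smiles != smiles.strip():
        problems.append('leading/trailing whitespace')
    if not smiles.strip():
        problems.append('empty')
        return problems
    # Remove bracket atoms such as [C@@H] or [N+] so digits inside them
    # (isotopes, charges, H counts) are not mistaken for ring closures.
    body = re.sub(r'\[[^\]]*\]', '', smiles.strip())
    depth = 0
    for char in body:
        depth += {'(': 1, ')': -1}.get(char, 0)
        if depth < 0:  # a ')' appeared before its matching '('
            break
    if depth != 0:
        problems.append('unbalanced parentheses')
    # Ring-closure labels are single digits or %nn; each use toggles the ring open/closed.
    open_labels = set()
    for label in re.findall(r'%\d{2}|\d', body):
        open_labels ^= {label}
    if open_labels:
        problems.append('unclosed ring(s): ' + ', '.join(sorted(open_labels)))
    return problems


entries = {
    'isoleucine': 'CC[C@H](C)[C@@H](C(=O)O)N ',
    'glycine': 'C(C(=O)O)N',
}
for name, value in entries.items():
    issues = lexical_smiles_problems(value)
    if issues:
        print(name, issues)  # isoleucine ['leading/trailing whitespace']
```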
```python
class MangoAminoAcids(object):

    def __init__(self):
        self.name = 'mango_amino_acids'

    @staticmethod
    def get_smiles():
        smiles = {
            'arginine': 'C(C[C@@H](C(=O)O)N)CN=C(N)N',
            'cysteine': 'C([C@@H](C(=O)O)N)S',
            'glycine': 'C(C(=O)O)N',
            'isoleucine': 'CC[C@H](C)[C@@H](C(=O)O)N',
            'leucine': 'CC(C)C[C@@H](C(=O)O)N',
            'lysine': 'C(CCN)C[C@@H](C(=O)O)N',
            'methionine': 'CSCC[C@@H](C(=O)O)N',
            'phenylalanine': 'C1=CC=C(C=C1)C[C@@H](C(=O)O)N',
            'proline': 'C1C[C@H](NC1)C(=O)O',
            'serine': 'C([C@@H](C(=O)O)N)O',
            'threonine': 'C[C@H]([C@@H](C(=O)O)N)O',
            'tryptophan': 'C1=CC=C2C(=C1)C(=CN2)C[C@@H](C(=O)O)N',
            'tyrosine': 'C1=CC(=CC=C1C[C@@H](C(=O)O)N)O',
            'valine': 'CC(C)[C@@H](C(=O)O)N',
            'glutamic acid': 'C(CC(=O)O)[C@@H](C(=O)O)N',
            'aspartic acid': 'C([C@@H](C(=O)O)N)C(=O)O',
            'alanine': 'C[C@@H](C(=O)O)N',
            'histidine': 'C1=C(NC=N1)C[C@@H](C(=O)O)N',
        }
        return smiles


class MangoFattyAcids(object):

    def __init__(self):
        self.name = 'mango_fatty_acids'

    @staticmethod
    def get_smiles():
        smiles = {
            'palmitic acid': 'CCCCCCCCCCCCCCCC(=O)O',
            'stearic acid': 'CCCCCCCCCCCCCCCCCC(=O)O',
            'arachidic acid': 'CCCCCCCCCCCCCCCCCCCC(=O)O',
            'lignoceric acid': 'CCCCCCCCCCCCCCCCCCCCCCCC(=O)O',
            'oleic acid': r'CCCCCCCC/C=C\CCCCCCCC(=O)O',
            'linoleic acid': r'CCCCC/C=C\C/C=C\CCCCCCCC(=O)O',
            'alpha-Linoleic acid': '',
            'myristic acid': 'CCCCCCCCCCCCCC(=O)O',
            'behenic acid': 'CCCCCCCCCCCCCCCCCCCCCC(=O)O',
            'palmitoleic acid': r'CCCCCC/C=C\CCCCCCCC(=O)O',
            'hexadecenoic acid': 'CCCC/C=C/CCCCCCCCCC(=O)O',
            'heptadecenoic acid': 'CCCCCC/C=C/CCCCCCCCC(=O)O',
            'octadecenoic acid': 'CCCCCC/C=C/CCCCCCCCCC(=O)O',
            'eicosenoic acid': r'CCCCCCCC/C=C\CCCCCCCCCC(=O)O',
            '9,12-Hexadecadienoic acid': 'CCC/C=C/C/C=C/CCCCCCCC(=O)O',
            '9,15-Octadecadienoic acid': 'CC/C=C/CCCC/C=C/CCCCCCCC(=O)O',
            'hepta-2,4(E,E)-dienoic acid': '',
            'linolenic acid': r'CC/C=C\C/C=C\C/C=C\CCCCCCCC(=O)O',
        }
        return smiles


class MangoFlavonoids(object):

    def __init__(self):
        self.name = 'mango_flavonoids'

    @staticmethod
    def get_smiles():
        smiles = {
            'quercetin-3-O-galactoside': 'C1=CC(=C(C=C1C2=C(C(=O)C3=C(C=C(C=C3O2)O)O)O[C@H]4[C@@H]([C@H]([C@H]([C@H](O4)CO)O)O)O)O)O',
            'quercetin-3-O-glucoside': 'C1=CC(=C(C=C1C2=C(C(=O)C3=C(C=C(C=C3O2)O)O)O[C@H]4[C@@H]([C@H]([C@@H]([C@H](O4)CO)O)O)O)O)O',
            'quercetin-3-O-xyloside': 'C1C(C(C(C(O1)OC2=C(OC3=CC(=CC(=C3C2=O)O)O)C4=CC(=C(C=C4)O)O)O)O)O',
            'mangiferin': 'C1=C2C(=CC(=C1O)O)OC3=C(C2=O)C(=C(C(=C3)O)[C@H]4[C@@H]([C@H]([C@@H]([C@H](O4)CO)O)O)O)O',
            'cyanidin': 'C1=CC(=C(C=C1C2=[O+]C3=CC(=CC(=C3C=C2O)O)O)O)O',
            'delphinidin': 'C1=C(C=C(C(=C1O)O)O)C2=[O+]C3=CC(=CC(=C3C=C2O)O)O.[Cl-]',
            'pelargonidin': 'C1=CC(=CC=C1C2=[O+]C3=CC(=CC(=C3C=C2O)O)O)O',
            'catechin': 'C1[C@@H]([C@H](OC2=CC(=CC(=C21)O)O)C3=CC(=C(C=C3)O)O)O',
            'apigenin': 'C1=CC(=CC=C1C2=CC(=O)C3=C(C=C(C=C3O2)O)O)O',
            'luteolin': 'C1=CC(=C(C=C1C2=CC(=O)C3=C(C=C(C=C3O2)O)O)O)O',
            'kaempferol': 'C1=CC(=CC=C1C2=C(C(=O)C3=C(C=C(C=C3O2)O)O)O)O',
        }
        return smiles


class MangoPhenolicAcids(object):

    def __init__(self):
        self.name = 'mango_phenolic_acids'

    @staticmethod
    def get_smiles():
        smiles = {
            'gallic Acid': 'C1=C(C=C(C(=C1O)O)O)C(=O)O',
            'vanillic acid': 'COC1=C(C=CC(=C1)C(=O)O)O',
            'syringic acid': 'COC1=CC(=CC(=C1O)OC)C(=O)O',
            'protocatechuic acid': 'C1=CC(=C(C=C1C(=O)O)O)O',
            'para hydroxybenzoic acid': 'C1=CC(=CC=C1C(=O)O)O',
            'paracoumaric acid': 'C1=CC(=CC=C1/C=C/C(=O)O)O',
            'chlorogenic acid': 'C1[C@H]([C@H]([C@@H](C[C@@]1(C(=O)O)O)OC(=O)/C=C/C2=CC(=C(C=C2)O)O)O)O',
            'ferulic acid': 'COC1=C(C=CC(=C1)/C=C/C(=O)O)O',
            'caffeic acid': 'C1=CC(=C(C=C1/C=C/C(=O)O)O)O',
            'theogallin': 'C1[C@H]([C@H]([C@@H](C[C@@]1(C(=O)O)O)OC(=O)C2=CC(=C(C(=C2)O)O)O)O)O',
        }
        return smiles


class MangoVitamins(object):

    def __init__(self):
        self.name = 'mango_vitamins'

    @staticmethod
    def get_smiles():
        smiles = {
            'ascorbic acid': 'C([C@@H]([C@@H]1C(=C(C(=O)O1)O)O)O)O',
            'thiamine': 'CC1=C(SC=[N+]1CC2=CN=C(N=C2N)C)CCO',
            'riboflavin': 'CC1=CC2=C(C=C1C)N(C3=NC(=O)NC(=O)C3=N2)C[C@@H]([C@@H]([C@@H](CO)O)O)O',
            'niacin': 'C1=CC(=CN=C1)C(=O)O',
            'pantothenic acid': 'CC(C)(CO)[C@H](C(=O)NCCC(=O)O)O',
            'pyridoxine': 'CC1=NC=C(C(=C1O)CO)CO',
            'folic acid': 'C1=CC(=CC=C1C(=O)N[C@@H](CCC(=O)O)C(=O)O)NCC2=CN=C3C(=N2)C(=O)NC(=N3)N',
            'vitamin A': 'CC1=C(C(CCC1)(C)C)/C=C/C(=C/C=C/C(=C/CO)/C)/C',
            'vitamin E': 'CC1=C(C2=C(CC[C@@](O2)(C)CCC[C@H](C)CCC[C@H](C)CCCC(C)C)C(=C1O)C)C',
            'vitamin K': r'CC1=C(C(=O)C2=CC=CC=C2C1=O)C/C=C(\C)/CCCC(C)CCCC(C)CCCC(C)C',
        }
        return smiles
```
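If a single flat view of the mango is still useful after splitting it into category classes, the categories can be merged back together. This is a sketch assuming each category class keeps the static get_smiles() used above; `combine` is a hypothetical helper and the two classes are trimmed stand-ins. It raises an error when the same name shows up in two categories, which surfaces any cross-category duplicates that the split might hide.

```python
class MangoAminoAcids(object):
    @staticmethod
    def get_smiles():
        return {'glycine': 'C(C(=O)O)N', 'alanine': 'C[C@@H](C(=O)O)N'}


class MangoVitamins(object):
    @staticmethod
    def get_smiles():
        return {'niacin': 'C1=CC(=CN=C1)C(=O)O'}


def combine(categories):
    """Merge per-category SMILES dicts into one, refusing a name that appears twice."""
    combined = {}
    for category in categories:
        for name, smiles in category.get_smiles().items():
            if name in combined:
                raise ValueError(f'{name!r} appears in more than one category')
            combined[name] = smiles
    return combined


mango = combine([MangoAminoAcids, MangoVitamins])
print(len(mango))  # 3
```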
@Nickspizza001 Awesome! That is where I was headed in terms of distributing the chemicals and categorizing them. So next we are going to do the distribution.
Now we need to add your nodes to the knowledge graph:
You will see I created this file. Now we need to determine where we are going to add your node in the knowledge graph. Do you think we should have a directory for fruit, and then another directory for mango?
food/mango/mango_amino_acids
Create a Python file for each class object, and name each file in all lowercase with words separated by an underscore (_). Then add the path to your node here, in this file:
You can copy some of the lines I added there, and also make the corresponding changes here:
https://github.com/Global-Chem/global-chem/blob/development/global_chem/global_chem/global_chem.py
https://github.com/Global-Chem/global-chem/blob/development/global_chem/global_chem/__init__.py
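The file-naming rule (all lowercase, words separated by underscores) can be derived mechanically from the class names. A small sketch, where `module_filename` is a hypothetical helper and the food/mango directory layout follows the path suggested in this thread:

```python
import re
from pathlib import PurePosixPath


def module_filename(class_name):
    """Convert a CamelCase class name to a lowercase, underscore-separated .py file name."""
    # Insert '_' at every point where a capital letter follows a lowercase letter or digit.
    snake = re.sub(r'(?<=[a-z0-9])(?=[A-Z])', '_', class_name)
    return snake.lower() + '.py'


for name in ['MangoAminoAcids', 'MangoFattyAcids', 'MangoFlavonoids',
             'MangoPhenolicAcids', 'MangoVitamins']:
    print(PurePosixPath('food', 'mango', module_filename(name)))
# food/mango/mango_amino_acids.py
# food/mango/mango_fatty_acids.py
# food/mango/mango_flavonoids.py
# food/mango/mango_phenolic_acids.py
# food/mango/mango_vitamins.py
```

The generated names also line up with the `self.name` values in each class ('mango_amino_acids', 'mango_vitamins', and so on), which keeps the file, the class, and the node name easy to match up.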
What you are doing is adding a node to the network. There is an algorithm that iterates through the directory structure and builds each node in relation to all the other nodes. We then want to add your objects to the list:
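The idea of turning a directory tree into linked nodes can be illustrated with a short sketch. This is not GlobalChem's actual algorithm; it is a hypothetical `build_edges` function showing one way to walk the directories with `os.walk` and emit (parent, child) pairs, where every sub-directory and every module other than `__init__.py` becomes a node. The `root_name` label and the temporary food/mango layout are assumptions made for the example.

```python
import os
import tempfile


def build_edges(root, root_name='root'):
    """Walk `root` top-down and return (parent, child) node pairs.

    Nodes are named by their path relative to `root`, using '/' as separator.
    """

    def label(rel_path):
        return root_name if rel_path == '.' else rel_path.replace(os.sep, '/')

    edges = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # sorting in place also fixes the order os.walk descends in
        parent = os.path.relpath(dirpath, root)
        modules = sorted(f[:-3] for f in filenames if f.endswith('.py') and f != '__init__.py')
        for child in dirnames + modules:
            child_rel = child if parent == '.' else os.path.join(parent, child)
            edges.append((label(parent), label(child_rel)))
    return edges


# Build a throwaway food/mango tree and print the resulting node links.
with tempfile.TemporaryDirectory() as tmp:
    mango_dir = os.path.join(tmp, 'food', 'mango')
    os.makedirs(mango_dir)
    for filename in ['__init__.py', 'mango_amino_acids.py', 'mango_vitamins.py']:
        open(os.path.join(mango_dir, filename), 'w').close()
    edges = build_edges(tmp)

for parent, child in edges:
    print(parent, '->', child)
```

Walking top-down means a parent directory's node is always emitted before its children, so each new file dropped into the tree is linked into the graph automatically.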
Please read this article after you add and modify the files. We will do this together: we'll release a new version of the software with your Mango component. I did the same for cannabis, with subdirectories, so it will go in the same release.
I think this issue is now resolved in the new release!
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6807195/#:~:text=The%20major%20amino%20acids%20include,a%20and%20b)%20and%20carotenoids.
It would be worth adding foods directly as SMILES for their chemical composition.