cheminfo / wikipedia

Wikipedia chemical structure explorer
https://wikipedia.cheminfo.org

Using MongoDB to store and retrieve compounds? #34

Closed def-fun closed 5 years ago

def-fun commented 5 years ago

Hi, I'm trying to build a server to store and retrieve the compounds in my lab, using MongoDB and Python.

MongoDB can store a lot of data and behaves well if used correctly.

Now I can search compounds by name, CAS number, boiling point, SMILES string, etc. But it seems impossible to search for substructures in MongoDB.

This may not be relevant to this repository, but do you have any suggestions for running substructure searches against MongoDB or another database?

Thanks for your help :-)

peter-ertl commented 5 years ago

Hello Goojayfan, to perform substructure or similarity searches you need specialised cheminformatics software. Wikipedia Chemical Structure Explorer uses openchemlib, which is written in Java; many people also use the open-source RDKit, usually from Python. To be incorporated directly into the database engine, such software must be integrated in the form of a so-called "chemistry cartridge". An open-source cartridge of this kind, based on RDKit, has been developed for the Postgres database: https://www.rdkit.org/docs/Cartridge.html. I am not aware of any chemistry cartridge for MongoDB, although there is some ongoing effort to implement similarity search: http://blog.matt-swain.com/post/87093745652/chemical-similarity-search-in-mongodb. A chemistry cartridge makes sense for big corporate databases with millions of molecules. In your case, where I suppose you are dealing with thousands of structures, the best approach would probably be to simply get the SMILES out of MongoDB and run the substructure search outside the database.
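A minimal sketch of that last suggestion, assuming RDKit is installed and that each stored document keeps its structure in a `smiles` field (the field name, the helper `substructure_filter`, and the sample compounds are all hypothetical, not taken from this thread). The helper accepts any iterable of dict-like documents, so plain dictionaries stand in here for what a MongoDB cursor would return.

```python
from rdkit import Chem


def substructure_filter(documents, smarts, smiles_field="smiles"):
    """Yield documents whose SMILES contains the SMARTS query as a substructure.

    `documents` is any iterable of dict-like records (hypothetical schema:
    each record stores its structure under `smiles_field`).
    """
    query = Chem.MolFromSmarts(smarts)
    if query is None:
        raise ValueError(f"invalid SMARTS pattern: {smarts!r}")
    for doc in documents:
        smiles = doc.get(smiles_field)
        if not smiles:
            continue  # no structure stored for this record
        mol = Chem.MolFromSmiles(smiles)
        if mol is None:
            continue  # skip SMILES that RDKit cannot parse
        if mol.HasSubstructMatch(query):
            yield doc


# In-memory stand-ins for documents fetched from the database.
compounds = [
    {"name": "ethanol", "smiles": "CCO"},
    {"name": "benzene", "smiles": "c1ccccc1"},
    {"name": "phenol", "smiles": "Oc1ccccc1"},
    {"name": "toluene", "smiles": "Cc1ccccc1"},
]

# Compounds containing a benzene ring.
aromatic_hits = [d["name"] for d in substructure_filter(compounds, "c1ccccc1")]
print(aromatic_hits)
```

With pymongo, the same helper could be fed a cursor instead of the list, for example `collection.find({}, {"name": 1, "smiles": 1})`, where `collection` is whatever collection holds the compounds. Every molecule is parsed on each query, which should be acceptable for thousands of structures but is the cost a cartridge avoids at larger scale.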

def-fun commented 5 years ago

Thanks, your reply is a great help to me. :-D