cod-developers / cod-tools

Tools for handling CIF files, and CIF parsers used for the Crystallography Open Database (COD)
GNU Lesser General Public License v3.0

Querying by formula #3

Closed. shyuep closed this issue 7 years ago.

shyuep commented 7 years ago

I am trying to query the COD by formula. While I can execute the example in the wiki, I cannot run the following query.

select file from data where formula="Li"

Replacing the example query with this one does not work, although based on the published schema it seems that it should. Is there some trick to doing this?

Also, I would strongly recommend developing a REST API for the COD so that queries can be made over plain HTTP. Right now, one has to know the COD ID to download a CIF file via HTTP, and no other functionality is available.

shyuep commented 7 years ago

I managed to figure out how to query by formula. It seems that the formula has to be written as "- Li2 O -", which is somewhat strange. However, I still recommend implementing some form of REST API, e.g. http://www.crystallography.net/cod/cod_ids?formula="Li2 O" to get all the COD IDs. Otherwise, querying for COD IDs requires installing MySQL, which is rather unnecessary if all someone wants is the COD IDs.

I have implemented a basic interface to the COD in pymatgen at https://github.com/materialsproject/pymatgen/blob/master/pymatgen/ext/cod.py . It allows querying the COD to obtain pymatgen Structure objects.

vaitkus commented 7 years ago

Dear @shyuep,

Thank you for your interest in the COD. You correctly noticed that all formulae in the COD MySQL database follow the "- formula in Hill notation -" format. This format is kept for historical reasons, but we plan to replace it with a simpler one once we settle on a new database schema.
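A minimal Python sketch of producing a formula string in this stored format, assuming integer atom counts and the usual Hill convention (C first, then H, then the remaining elements alphabetically; strictly alphabetical when there is no carbon), with counts of 1 omitted as in "- Li2 O -". The names `hill_formula`, `cod_formula` and `QUERY` are hypothetical; the table and column names come from the query in the original question, and the `%s` placeholder assumes a DB-API MySQL driver such as PyMySQL or mysqlclient.

```python
from collections.abc import Mapping


def hill_formula(counts: Mapping[str, int]) -> str:
    """Return a space-separated formula in Hill order, e.g. {'O': 1, 'Li': 2} -> 'Li2 O'."""
    def term(element: str) -> str:
        n = counts[element]
        return element if n == 1 else f"{element}{n}"

    elements = [el for el, n in counts.items() if n > 0]
    if "C" in elements:
        # With carbon present: C first, then H (if any), then the rest alphabetically.
        head = ["C"] + (["H"] if "H" in elements else [])
        rest = sorted(el for el in elements if el not in ("C", "H"))
        order = head + rest
    else:
        # Without carbon, every element (including H) is sorted alphabetically.
        order = sorted(elements)
    return " ".join(term(el) for el in order)


def cod_formula(counts: Mapping[str, int]) -> str:
    """Wrap a Hill formula in the '- ... -' delimiters used in the COD MySQL database."""
    return f"- {hill_formula(counts)} -"


# Hypothetical parameterized query; with a DB-API cursor one would run e.g.
# cursor.execute(QUERY, (cod_formula({"Li": 2, "O": 1}),))
QUERY = "SELECT file FROM data WHERE formula = %s"
```

Passing the formula as a query parameter rather than splicing it into the SQL string avoids quoting problems with the leading and trailing dashes and spaces.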

The COD actually does offer a simple REST API, although it is not well documented yet because it is still under development and might change before the official release. It currently provides the same functionality as the web search form (http://www.crystallography.net/cod/search.html); it even uses the same parameter names as the POST form on that page. We will post in this issue once the REST API has been finalized and proper documentation has been written.

For now, here is a simple example of getting the COD IDs of all structures that contain Li and O atoms and were published in 2017:

http://www.crystallography.net/cod/result.php?el1=Li&el2=O&year=2017&format=lst

where:

- el1, el2, ..., el8: symbols of chemical elements that must be present in the chemical formula of the structure;
- year: year of publication;
- format: the format in which the results matching the search criteria are returned. Currently supported options are:
  - html: an HTML web page displaying the list of COD entries;
  - csv: a CSV file containing information about the COD entries (the same information as in the COD MySQL database);
  - json: a JSON file containing information about the COD entries (the same information as in the COD MySQL database);
  - lst: a text file containing a list of COD IDs;
  - urls: a text file containing a list of URLs pointing to the COD CIF files;
  - zip: a ZIP archive with CIF files. Note that there is a limit on how many entries can be returned in this format; if the limit is reached, an HTML page with an error message is returned instead;
  - count: the number of COD entries that match the search criteria.
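The parameters above can be assembled client-side with only the Python standard library. This is a sketch: the helper names are hypothetical, the endpoint and parameter names are the ones given above, and `parse_lst` assumes the `lst` output holds one COD ID per line, which the description above does not spell out. Since the API is still described as subject to change, treat this as illustrative.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint and parameter names as described in the comment above.
COD_RESULT_URL = "http://www.crystallography.net/cod/result.php"
FORMATS = {"html", "csv", "json", "lst", "urls", "zip", "count"}


def cod_search_url(elements, year=None, fmt="lst"):
    """Build a search URL from up to eight element symbols (el1..el8)."""
    if len(elements) > 8:
        raise ValueError("at most eight elements (el1..el8) are supported")
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    params = {f"el{i}": el for i, el in enumerate(elements, start=1)}
    if year is not None:
        params["year"] = year
    params["format"] = fmt
    return f"{COD_RESULT_URL}?{urlencode(params)}"


def parse_lst(text):
    """Parse an 'lst' response, assumed to hold one COD ID per line."""
    return [int(token) for token in text.split()]


def fetch_cod_ids(elements, year=None):
    """Download and parse the list of matching COD IDs (needs network access)."""
    with urlopen(cod_search_url(elements, year, "lst"), timeout=60) as response:
        return parse_lst(response.read().decode("utf-8"))
```

For example, `cod_search_url(["Li", "O"], year=2017)` reproduces the example URL above, and `fetch_cod_ids(["Li", "O"], year=2017)` would request it and return the IDs as integers.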

The current API does not provide a convenient way to search directly by chemical formula, but it does seem like a parameter worth implementing. Could you please provide a few more examples of how you would like the API to behave?

shyuep commented 7 years ago

Great, thanks. I would suggest that the search form and the REST API also support querying by formula, with some automatic sanitization of the formula on the backend.

merkys commented 7 years ago

Hi @shyuep. I have implemented search by Hill formula in our REST API. It now accepts the search parameter formula, e.g. http://www.crystallography.net/cod/result?formula=Co+Li+O2 (no quotes are required). Documentation is still to be written.
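A small sketch of building such a formula query in Python. `cod_formula_search_url` is a hypothetical helper; it does not reorder elements, so the caller is assumed to pass the formula already in Hill order, and combining `formula` with the `format` parameter is an assumption carried over from the element search.

```python
from urllib.parse import urlencode

COD_FORMULA_URL = "http://www.crystallography.net/cod/result"


def cod_formula_search_url(formula, fmt=None):
    """Build a formula search URL; urlencode turns spaces into '+'."""
    # Collapse runs of whitespace so "Co  Li O2" and "Co Li O2" give the same query.
    params = {"formula": " ".join(formula.split())}
    if fmt is not None:
        # Assumption: 'format' combines with 'formula' as it does with el1..el8.
        params["format"] = fmt
    return f"{COD_FORMULA_URL}?{urlencode(params)}"
```

`cod_formula_search_url("Co Li O2")` yields the example URL above; no quotes are added around the formula.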

merkys commented 7 years ago

In addition, we have implemented the OPTiMaDe API (following the currently frozen specification): http://www.crystallography.net/tcod/optimade/info. To search for, e.g., Co Li O2, one can use http://www.crystallography.net/cod/optimade/structures?filter=chemical_formula=Co+Li+O2.
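The OPTiMaDe query URL above can be built the same way. This sketch (the helper name is hypothetical) simply mirrors the `chemical_formula=...` filter form shown in the comment, which reflects the specification frozen at the time; later versions of the OPTiMaDe specification may use different property names and filter syntax.

```python
from urllib.parse import urlencode

OPTIMADE_STRUCTURES_URL = "http://www.crystallography.net/cod/optimade/structures"


def optimade_formula_url(formula):
    """Build an OPTiMaDe structures query filtering on chemical_formula."""
    filter_expr = "chemical_formula=" + " ".join(formula.split())
    # safe="=" keeps the '=' inside the filter unescaped, matching the URL above;
    # spaces are still encoded as '+'.
    return f"{OPTIMADE_STRUCTURES_URL}?{urlencode({'filter': filter_expr}, safe='=')}"
```

Without `safe="="`, the inner `=` would be sent as `%3D`, which a server would normally decode to the same filter; keeping it literal just makes the URL match the example.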