Materials-Consortia / OPTIMADE

Specification of a common REST API for access to materials databases
https://optimade.org/specification
Creative Commons Attribution 4.0 International
83 stars 37 forks source link

Include properties in response upon requests #184

Closed merkys closed 4 years ago

merkys commented 5 years ago

Due to some properties being either derived on the fly or very voluminous (for example coordinates), it might be useful to make certain properties excluded from responses by default, but added upon request. This could be implemented by reusing response_fields: if a field is excluded by default, but mentioned in response_fields, it MUST be present in the response.

rartino commented 5 years ago

Thinking a bit about this, is it better to formulate it this way?:

Both "existing" implementations complained about the cost of producing coordinate data for large amounts of entries when the user didn't explicitly need it, so it is my impression from our OPTiMaDe videocall that this needs to be addressed for v1.0. Hence, I'm marking it with the 1.0 milestone.

merkys commented 5 years ago

Thinking a bit about this, is it better to formulate it this way?:

* Some properties are marked in the specification as included by default.

* Other properties are included only when included in `response_fields`.

Agree. We already have attributes Requirements/Conventions for all the properties, thus we just need to agree on the set of properties which should have 'SHOULD NOT be present unless explicitly included'.

Both "existing" implementations complained about the cost of producing coordinate data for large amounts of entries when the user didn't explicitly need it, so it is my impression from our OPTiMaDe videocall that this needs to be addressed for v1.0. Hence, I'm marking it with the 1.0 milestone.

Thanks - I was about to bring this issue up myself.

rartino commented 5 years ago

We already have attributes Requirements/Conventions for all the properties, thus we just need to agree on the set of properties which should have 'SHOULD NOT be present unless explicitly included'.

My point here was that we may want to instead agree on the set of properties that are present "by default" in the response. And then have general text that says that all others "SHOULD NOT" be present unless explicitly included.

merkys commented 5 years ago

My point here was that we may want to instead agree on the set of properties that are present "by default" in the response. And then have general text that says that all others "SHOULD NOT" be present unless explicitly included.

Sounds reasonable.

From the point of view of COD, all current fields could by present by default except:

CasperWA commented 5 years ago

This all seems like a very valuable addition to the specification indeed.

From the point of view of COD, all current fields could by present by default except:

* `cartesian_site_positions`

* `nsites`

In which case will nsites be an issue to communicate? Is it because you're first calculating cartesian_site_positions before calculating its length as the value of nsites. In my opinion, if cartesian_site_positions is not provided by default, at least nsites should be, so one has an inkling of how costly it may be to request cartesian_site_positions? But again, if its unavoidable to calculate also cartesian_site_positions before getting the nsites value, I can see why you put it here.

* `species_at_sites`

* `species`

* `assemblies`

For all others it can make sense :+1:

merkys commented 5 years ago

In which case will nsites be an issue to communicate? Is it because you're first calculating cartesian_site_positions before calculating its length as the value of nsites.

Exactly. In the general case to us there is no better way to get the value of nsites.

rartino commented 5 years ago

@merkys if I understand this correctly, the issue for you is that your database uses non-equivalent atom form, right?

So, doesn't this form require the multiplicity of each non-equivalent site to be given? Or, at least, the Wyckoff name from which the multiplicity follows? So, while I understand that generating the full cartesian_site_positions is expensive, would not generating nsites be considerably cheaper? I.e.:

nsites = 0
for every non-equivalent site:
  nsites += multiplicity*occupancy

Edit: also taking account for sites without full occupancy in the pseudocode.

merkys commented 5 years ago

It's true, though both multiplicities and Wyckoff positions tend to be absent in CIF files. Moreover, not always these pieces of data can be trusted. We have observed that different programs use the same multiplicity fields to store ontologically different data.

rartino commented 5 years ago

Ok, thanks @merkys, thas is a fair point for not including nsites in the default response. I agree that we want the transformation of "usual cif files" into the default set of properties to be cheap.

Who writes up the PR? It needs to update the text about what is included in the response, and then in the relevant property subsections of section 6 indicate for the small set of "default" properties that they are part of that set.

merkys commented 5 years ago

Who writes up the PR? It needs to update the text about what is included in the response, and then in the relevant property subsections of section 6 indicate for the small set of "default" properties that they are part of that set.

I can take it.