flotang-gtt / ThermML

Requirements list #4

Closed bocklund closed 4 months ago

bocklund commented 4 months ago

Another dump of some requirements I’d written up in 2021 for pycalphad-xml. Again, all open for discussion. #2 seemed like a specific issue, so I’m not sure whether this should be a meta issue that we break up into individual issues, or whether we should just discuss the initial set of requirements all in one issue. I’m happy to close this and move it somewhere else if it makes things easier to iterate.

Design goals/requirements

  1. The new format should be an interchange format that's machine readable and human readable

  2. Software implementations can use any format they like internally (for performance, etc.), as long as they can read/write to the interchange format

  3. Should be able to describe metadata like version history and give better references to related publications or other databases containing parameters, reference state data, general metadata, etc.

  4. Should still be relatively pleasant to edit manually, and formatted so that small edits and modifications are convenient to make by hand

  5. Should make it easier to compare and combine databases

  6. Should have clear hooks for extensibility, e.g. new types of parameters or models

  7. Should give clear guidance on what data can be added in a parameter. The TDB may be too rigid, but we also don't want a wild west of complicated parameters/types

  8. Machine-readability should lend itself to automated data collection, indexing and retrieval. For example, it should be easy to expose the databases and metadata by an API or to aggregate statistics about a collection of databases.

  9. Support global phase identifiers and aliases (SIGMA_D8B, SIGMA, D8B all are valid local names that refer to the same phase)

  10. Should contain a mechanism that facilitates optimizing variables

  11. Mechanism for including uncertainties or sensitivities? Like TDBX? Consider different representations, like the uncertainty being described by a closed-form distribution or MCMC-like samples from a posterior.

  12. Consider the possibility of interphase properties (e.g., interfacial energy). This requires the ability to describe linkages between one or more phases. This is already required for models like the two-phase order-disorder model, but it could potentially be implemented in a more robust way versus the "type definition" approach of the TDB format.

  13. Nice-to-have: Trivial merging/sub-setting of databases. This would require that every node element can be either copied verbatim or excluded in such an operation. Attributes could not contain information which would need to be modified for a combined system or sub-system. So, <Phase constituents="AL,ZN"> would not be allowed, but instead written as something like <Phase><Constituent subl="0"><Element ref="AL" /></Constituent> <Constituent subl="0"><Element ref="ZN" /></Constituent> </Phase>. This probably trades heavily against concise syntax.

  14. Chemical element reference states need first-class support. This also supports database merging by helping to surface reference state incompatibilities.
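
To make item 13 a bit more concrete, here is a rough Python sketch of what "copy verbatim or exclude" sub-setting could look like. Everything about the layout is an assumption for illustration only: the `<Database>` root, the `id` attribute on `<Phase>`, and the `<Parameter>` node are hypothetical, and only the `<Phase>`/`<Constituent>`/`<Element ref="...">` nesting follows the example in item 13. The rule used here (drop any node that has a direct `<Element>` child outside the requested set) is one possible choice, not a proposal for the spec.

```python
import xml.etree.ElementTree as ET

# Hypothetical database fragment; only the Phase/Constituent/Element
# nesting comes from item 13, the rest is made up for this sketch.
SOURCE = """
<Database>
  <Phase id="FCC_A1">
    <Constituent subl="0"><Element ref="AL" /></Constituent>
    <Constituent subl="0"><Element ref="ZN" /></Constituent>
    <Constituent subl="0"><Element ref="CU" /></Constituent>
  </Phase>
  <Parameter type="L"><Element ref="AL" /><Element ref="ZN" /></Parameter>
  <Parameter type="L"><Element ref="AL" /><Element ref="CU" /></Parameter>
</Database>
"""


def subset(node, keep):
    """Copy `node` verbatim, or return None if it must be excluded.

    A node is excluded when any of its direct <Element> children
    references an element that is not in `keep`. No attribute or text
    is ever rewritten; nodes are either copied as-is or dropped.
    """
    for child in node:
        if child.tag == "Element" and child.get("ref") not in keep:
            return None

    # Verbatim copy of tag, attributes, and text.
    clone = ET.Element(node.tag, dict(node.attrib))
    clone.text, clone.tail = node.text, node.tail

    # Recurse: children are themselves either copied or dropped.
    for child in node:
        kept = subset(child, keep)
        if kept is not None:
            clone.append(kept)
    return clone


result = subset(ET.fromstring(SOURCE), {"AL", "ZN"})
print(ET.tostring(result, encoding="unicode"))
```

With `<Phase constituents="AL,ZN,CU">` the same operation would have to parse and rewrite the attribute value, which is exactly what the requirement tries to avoid. The sketch also shows the cost: the nested form is noticeably more verbose.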

johanzietsman-em commented 4 months ago

I transferred all of Brandon's information into the requirements markdown files, so I am closing this issue. We can open further issues to address more specific requirement-related matters.