NSLS-II / sciprovenance

Files relating to the Provenance Driven LDRD project between NSLS-II and CSI
BSD 2-Clause "Simplified" License

schema repo validation #3

Closed: linepouchard closed this issue 6 years ago

linepouchard commented 6 years ago

Here is the text of emails exchanged in June about the schema repo and validation:

Yes, I think that is right, though bluesky is so flexible that it may also be doing "experiments" in a single ScanPlan (e.g., scanning over dots on a chip). So bluesky extends upwards (finally a reason why it is called bluesky), but a minimal bluesky scan such as ct() or a step scan is a "measurement", I would say. This means, as you say (and if we agree on the structure I have proposed), recursing down through samples to validate.

btw, the secret "vision" is actually to make all scientific experiments in the whole world on the databroker/bluesky model, but let's not get ahead of ourselves.....

On Mon, Jun 25, 2018 at 10:11 AM Allan, Daniel dallan@bnl.gov wrote: OK, follow-up question on (1): is there a vision for how this makes contact with the bluesky/databroker document model? I guess a "Measurement" maps to what bluesky calls a "run"? So, if bluesky validates that metadata is correct before initiating data acquisition, it is looking to validate at the Measurement scale, recursing down the tree to check that a measurement's Samples, Materials, Phases, and Elements are all specified according to the schema. Are we on the same page?
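The tree Dan describes can be sketched as a nested JSON Schema and checked with the `jsonschema` library the project uses. This is only an illustrative sketch: the key names (`samples`, `materials`, `phases`, `elements`, `symbol`) are hypothetical, not the project's actual schema.

```python
import jsonschema

# Hypothetical sketch of the nested structure: a Measurement contains
# Samples, which contain Materials, which contain Phases, which list Elements.
# Key names are illustrative, not the project's actual schema.
element = {"type": "object",
           "properties": {"symbol": {"type": "string"}},
           "required": ["symbol"]}
phase = {"type": "object",
         "properties": {"name": {"type": "string"},
                        "elements": {"type": "array", "items": element}},
         "required": ["elements"]}
material = {"type": "object",
            "properties": {"phases": {"type": "array", "items": phase}},
            "required": ["phases"]}
sample = {"type": "object",
          "properties": {"materials": {"type": "array", "items": material}},
          "required": ["materials"]}
measurement = {"type": "object",
               "properties": {"samples": {"type": "array", "items": sample}},
               "required": ["samples"]}

doc = {"samples": [{"materials": [{"phases": [{"elements": [{"symbol": "Ni"}]}]}]}]}
jsonschema.validate(doc, measurement)  # raises ValidationError if the tree is malformed
```

Because the schema nests `items` all the way down, a single `validate` call at the Measurement level recurses through the whole tree, which matches the "validate at the Measurement scale" idea above.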

Dan

Daniel B. Allan, Ph.D., Associate Computational Scientist, Brookhaven National Lab, (631) 344-3281 (no voicemail set up)

From: Simon Billinge [sb2896@columbia.edu]
Sent: Monday, June 25, 2018 10:02 AM
To: Allan, Daniel
Cc: Christopher Wright; Campbell, Stuart; Pouchard, Line; Van Dam, Hubertus; Juhas, Pavol; Sabrina Hernandez; Stavitski, Eli; Kleese Van Dam, Kerstin
Subject: Re: Give feedback on metadata schema

Thanks Dan, this is what we need to get started.

Re your question 1, Sabrina is working on everything on the sample side, up to the actual experiment. I proposed a rather highly nested list of collections/documents in my last email, and that could be a starting point. She would work on everything up to sample in that tree. To bring you up to speed (you weren't at the last meeting): users are entering sample info in Excel spreadsheets at XPD, so this would replace that step of the process in a way that is designed for optimal downstream analysis and provenance capture (and even reuse of data).

Re question 2, "are we basing it on a previous standard?", I guess the answer is yes, if there is one that works for us. For sure we should reuse key names that are in the standards. I think NeXus has keys for structural things that are almost certainly also in CIF. I am not sure if NeXus reused CIF or made their own, so Sabrina can check things like that, but if there is a clash I would prefer that we use CIF for what CIF covers, then NeXus for what we need that CIF doesn't cover, then... not sure, but we can discuss. Does anyone know of an open elements DB that is accurate and has an API? We shouldn't need to build that ourselves. We can also use structural DBs; I think MP stores structures using CIF, but we should check, as we want to interface to them.

linepouchard commented 6 years ago

Framework:

1. Stuart/Dan: point us to the infrastructure you are using for schema validation.
2. Chris: clone it into provenance.
3. Sabrina: learn the workflow for cloning and testing changes to the schema locally on her computer (and the git workflow in general).
4. Sabrina: learn the workflow for committing a change to the upstream schema and having the CI run a test validation.
5. Simon, Chris, Eli: work on some UCs. These will determine which keys we want to validate against. Where/how will we store these?
6. For the key names, decide which standard we want to use (Stuart, Simon, Eli).
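The CI test validation in step 4 could be a small Python check that runs every example document against the schema and fails the build on any error. This is a sketch under assumptions: the schema, example documents, and the `validate_examples` helper are all hypothetical, not part of the repo.

```python
import jsonschema

def validate_examples(schema, examples):
    """Return a list of (index, error message) pairs for examples that fail validation.

    An empty list means all examples conform; in CI, a non-empty list
    would fail the build. (Hypothetical helper, not the project's code.)
    """
    failures = []
    for i, doc in enumerate(examples):
        try:
            jsonschema.validate(doc, schema)
        except jsonschema.ValidationError as err:
            failures.append((i, err.message))
    return failures

# Illustrative schema and example documents.
schema = {"type": "object",
          "properties": {"sample_name": {"type": "string"}},
          "required": ["sample_name"]}
examples = [{"sample_name": "Ni_standard"}, {}]

print(validate_examples(schema, examples))  # second example fails: missing required key
```

In a real CI job, the examples would be loaded from files in the repo and the function's result turned into a test assertion, so a schema change that breaks existing documents is caught before merge.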

stuartcampbell commented 6 years ago

I responded a while ago about the validation. I've just checked with @CJ-Wright and he knows the details. We just use jsonschema.
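For anyone picking this up, the basic `jsonschema` call is minimal. The schema and key name below are illustrative only, not the project's actual schema:

```python
import jsonschema

# Illustrative schema: one required string key.
schema = {
    "type": "object",
    "properties": {"sample_name": {"type": "string"}},
    "required": ["sample_name"],
}

jsonschema.validate({"sample_name": "Ni_standard"}, schema)  # passes silently

try:
    jsonschema.validate({}, schema)  # missing required key
except jsonschema.ValidationError as err:
    print(err.message)
```

`validate` returns `None` on success and raises `ValidationError` on failure, so it drops straight into a test suite or a pre-acquisition metadata check.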

sbillinge commented 6 years ago

I think we can close this issue. Any objections?