RFC: cleaning and standardizing JSON-schemas naming

pamfilos commented 8 years ago

Creating/moving issue here for a comment by @tiborsimko in PR #188:

Shall we take this occasion and also clean names so that the full URI would be stable?

E.g. this PR introduces:

jsonschemas/records/CMSAnalysis-v0.0.1.json

which seems nicely "namespaced" to CMS, but also:

/jsonschemas/definitions/workflow_schemas/yadage/scheduler/parameterselection-v0.0.1.json

where the namespace is less clear from the name only (parameterselection-v0.0.1.json) and one has to rely on the preceding path. (The namespace could be "yadage" here, but it's kind of "less visible" in the middle.)

Option 1: we could use flat directory structure and put namespaces in file names only:

cap-basic-metadata-v1.0.0.json
cap-keywords-v1.0.0.json
cms-main-measurement-v1.0.0.json
lhcb-main-measurement-v1.0.0.json
yadage-parameter-selection-v1.0.0.json
lhcb-something-other-things-v1.0.0.json
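A minimal sketch of how Option 1 names could be taken apart, assuming the first dash-separated token is always the namespace and the name ends in a semantic version. The regular expression and the `parse_flat_name` helper are hypothetical and not part of the repository:

```python
import re

# Hypothetical pattern for Option 1: <namespace>-<name>-vX.Y.Z.json
# Assumption: the namespace is a single token without dashes.
FLAT_NAME_RE = re.compile(
    r"^(?P<namespace>[a-z0-9]+)"
    r"-(?P<name>[a-z0-9]+(?:-[a-z0-9]+)*)"
    r"-v(?P<version>\d+\.\d+\.\d+)\.json$"
)


def parse_flat_name(filename):
    """Split a flat schema file name into (namespace, name, version tuple)."""
    match = FLAT_NAME_RE.match(filename)
    if match is None:
        raise ValueError("not a flat schema name: " + repr(filename))
    # Turn "1.0.0" into (1, 0, 0) so versions compare numerically.
    version = tuple(int(part) for part in match["version"].split("."))
    return match["namespace"], match["name"], version


print(parse_flat_name("cms-main-measurement-v1.0.0.json"))
```

With this scheme the namespace is recoverable from the file name alone, without looking at the preceding path.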

Option 2: we could use directories and let experiments name the schemas as they see fit:

lhcb/main-measurement-v1.0.0.json
cms/main-measurement-v1.0.0.json

(I guess some file name prefix is nice to have, because if we always rely on directory location, then we may run into trouble when two files have the same name. It could be error-prone.)
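The collision risk mentioned here can be checked mechanically. The following is only a sketch: `find_name_collisions` is a hypothetical helper, and the paths are the Option 2 examples plus one invented entry for contrast.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def find_name_collisions(paths):
    """Group schema paths by bare file name and keep names used more than once."""
    by_name = defaultdict(list)
    for path in paths:
        by_name[PurePosixPath(path).name].append(path)
    return {name: sorted(found) for name, found in by_name.items() if len(found) > 1}


paths = [
    "lhcb/main-measurement-v1.0.0.json",
    "cms/main-measurement-v1.0.0.json",
    "cms/other-v1.0.0.json",  # invented entry, for illustration only
]
print(find_name_collisions(paths))
```

Any name reported here is ambiguous once the file is copied or referenced without its directory.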

Option 3: combine the above, and allow nested directories:

cms/measurements/main-measurement-v1.0.0.json
yadage/something/parameterselection-v1.0.0.json

jirikuncar commented 8 years ago

I would try to use the following structure:

|- jsonschemas/
|  |- definitions/
|  |  |- basic-metadata-v1.0.0.json
|  |- records/
|  |  |- atlas/
|  |  |  |- main-measurement-v1.0.0.json
|  |  |- cms/
|  |  |- lhcb/

jsonschemas/<system type (records, deposits, files, ...)>/<experiment>/<document type>-v{version}.json

all lowercased
separator: dash/minus (-)
do not repeat words in directory and file name (e.g. /measurements/main-measurement-v1.0.0.json -> /measurements/main-v1.0.0.json)
append semantic version number (vX.Y.Z)
use json extension
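The rules above could be enforced with a small check. This is a sketch under stated assumptions: the regular expression, the `check_schema_path` helper, and the crude plural handling in `_stem` are all hypothetical and only illustrate the proposed convention.

```python
import re

# Hypothetical pattern for the proposed layout:
# jsonschemas/<system type>/<experiment>/<document type>-vX.Y.Z.json
SCHEMA_PATH_RE = re.compile(
    r"^jsonschemas/"
    r"(?P<system>[a-z0-9]+(?:-[a-z0-9]+)*)/"
    r"(?P<experiment>[a-z0-9]+(?:-[a-z0-9]+)*)/"
    r"(?P<doctype>[a-z0-9]+(?:-[a-z0-9]+)*)"
    r"-v(?P<version>\d+\.\d+\.\d+)\.json$"
)


def _stem(word):
    """Crude plural stripping so 'measurements' and 'measurement' compare equal."""
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


def check_schema_path(path):
    """Return a list of convention violations (empty if the path conforms)."""
    problems = []
    if path != path.lower():
        problems.append("path is not all lowercase")
    match = SCHEMA_PATH_RE.match(path.lower())
    if match is None:
        problems.append("path does not match the proposed layout")
        return problems
    # Words used in directory names should not be repeated in the file name.
    dir_words = match["system"].split("-") + match["experiment"].split("-")
    dir_stems = {_stem(word) for word in dir_words}
    file_stems = {_stem(word) for word in match["doctype"].split("-")}
    repeated = sorted(dir_stems & file_stems)
    if repeated:
        problems.append("file name repeats directory word(s): " + ", ".join(repeated))
    return problems


print(check_schema_path("jsonschemas/records/atlas/main-measurement-v1.0.0.json"))
```

A check like this could run over the whole jsonschemas/ tree before merging a PR that adds or renames schemas.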

Kjili commented 8 years ago

The complexity that I see is that we have three levels:

<Collaboration>-Analysis
<Collaboration>-Analysis-Segments
"Collaboration-Unspecific"-Analysis-Segments

Building on @jirikuncar's structure and @tiborsimko's suggestions we could have something like:

|- jsonschemas/
|  |- definitions/
|  |  |- basic-metadata-v1.0.0.json ("Collaboration-Unspecific"-Analysis-Segments)
|  |- records/
|  |  |- atlas/
|  |  |  |- ATLASAnalysis-v0.0.1.json (<Collaboration>-Analysis)
|  |  |  |- measurements/
|  |  |  |  |- main-v1.0.0.json (<Collaboration>-Analysis-Segments)
|  |  |- cms/
|  |  |- lhcb/

We should discuss whether this structure rips the "Collaboration-Unspecific"-Analysis-Segments too far away from the other Analysis-related jsonschema definitions and how we might decrease this gap for better closure and intuitiveness.

jirikuncar commented 8 years ago

@Kjili with only one comment: s/ATLASAnalysis/analysis/.

Kjili commented 8 years ago

Another possibility would probably go better with the options/ folder content and the meaning behind records/:

|- jsonschemas/
|  |- definitions/
|  |  |- basics/
|  |  |  |- metadata-v1.0.0.json ("Collaboration-Unspecific"-Analysis-Segments)
|  |  |- atlas/
|  |  |  |- measurements/
|  |  |  |  |- main-v1.0.0.json (<Collaboration>-Analysis-Segments)
|  |  |- cms/
|  |  |- lhcb/
|  |- records/
|  |  |- ATLASAnalysis-v0.0.1.json (<Collaboration>-Analysis)

To my intuition, this does not result in as wide a gap between the different segments and the main analysis schema.

It might not be the best solution, though, if we start having ATLASAnalysis, ATLASWorkflows, ATLASx, ATLASy, ... and the same for ALICE, CMS and LHCb, unless we create atlas/, alice/, ... folders inside the records/ folder.

jirikuncar commented 8 years ago

@Kjili keeping a similar folder structure in records/ (atlas/, alice/, ...) seems more logical. Again, I would highly recommend using only lowercased filenames. It should also be preferable to use directories instead of file prefixes (e.g. ATLASAnalysis -> atlas/analysis).
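A rough sketch of this prefix-to-directory rewrite, assuming a fixed list of collaboration prefixes and CamelCase names. The `COLLABORATIONS` tuple and the `prefixed_to_directory` helper are hypothetical:

```python
import re

# Assumption: the set of collaboration prefixes is known in advance.
COLLABORATIONS = ("alice", "atlas", "cms", "lhcb")


def prefixed_to_directory(name):
    """Turn e.g. 'ATLASAnalysis' into 'atlas/analysis' (lowercase, dash-separated)."""
    lowered = name.lower()
    for collab in COLLABORATIONS:
        if lowered.startswith(collab):
            rest = name[len(collab):]
            # Split the remainder into acronyms and CamelCase words.
            words = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z0-9]+", rest)
            if words:
                return collab + "/" + "-".join(word.lower() for word in words)
    raise ValueError("no known collaboration prefix in " + repr(name))


print(prefixed_to_directory("ATLASAnalysis"))
```

A mechanical split like this only normalizes case and separators; choosing shorter or more meaningful names would still be a manual decision.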

Kjili commented 8 years ago

While we are at it, we should also rename the files themselves to follow the new naming scheme (e.g. CMSFinalInputCodeOutput -> cms/final-results or cms-final-results, depending on the option chosen) and remove the files that are outdated.

suenjedt commented 8 years ago

@tiborsimko what do you think?

tiborsimko commented 8 years ago

@suenjedt I basically expressed my thoughts above. The most important question to me is whether the collaborations might want to use names already based on their current vocabulary practices, or whether we are going to maintain these names ourselves. E.g. yadage schemas may already be used outside, so we may have fewer liberties to "compress" them into some standard form.

If we have all the liberties, I'd go for a very simple flat structure like <namespace>-foobar-v1.0.0.json which seems more flexible than inventing some nested directory hierarchies, in case we decide to reorganise, move, or otherwise amend "measurements" and friends later. (This would be option 1 above.)

ArtemisLav commented 8 years ago

Closing this, since everybody seems to be happy with how things are right now.

jirikuncar commented 8 years ago

"since everybody seems to be happy with how things are right now."

That's a bit of a strong statement. There are good reasons for a good directory structure: it will make it easier to create correct ES indices in the future.
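To illustrate the point about indices: if index names were derived from schema paths, a regular layout would give predictable names. The mapping below is an assumed, simplified one (path separators become dashes, the .json suffix is dropped); `schema_path_to_index_name` is a hypothetical helper, not the actual code used by the service.

```python
from pathlib import PurePosixPath


def schema_path_to_index_name(schema_path):
    """Derive an index name from a schema path relative to jsonschemas/."""
    path = PurePosixPath(schema_path)
    if path.suffix != ".json":
        raise ValueError("not a JSON schema path: " + repr(schema_path))
    # Join directory components and the file stem with dashes.
    parts = list(path.parent.parts) + [path.stem]
    # Elasticsearch index names must be lowercase.
    return "-".join(parts).lower()


print(schema_path_to_index_name("records/atlas/main-v1.0.0.json"))
```

Under such a mapping, a path that is already lowercase and consistently nested maps to an index name without any lossy normalization, whereas mixed-case names like ATLASAnalysis have to be folded first.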

ArtemisLav commented 8 years ago

I can reopen it then; it was closed after being discussed in today's meeting.

cernanalysispreservation / analysispreservation.cern.ch

RFC: cleaning and standardizing JSON-schemas naming #189