Haigutus / Energy-Reference-Data

Reference Data for Energy domain using SKOS

Manifest payload naming standard #39

Open Sveino opened 1 year ago

Sveino commented 1 year ago

CIM in general shall not include any naming standard. The main reason for including the manifest and DCAT is to avoid implementations relying on a naming standard. However, for testing, and wherever user interaction on the technical level is needed, a naming recommendation is required. It shall follow the same logic as cim:IdentifiedObject.mRID and cim:IdentifiedObject.name, where mRID is the machine-interpreted identification and name is the user "identification". A naming standard can also be useful for simple file-based archiving tools that rely on the file name.

The updated naming standard needs to cover the needs of the CGMES profiles in the CGM and TYNDP processes, in addition to the CSA, CCC, OPC and STA processes.

The current CGM naming standard is documented in Quality of CGMES datasets and calculations (QoCDC) v3.3, section 3.2 "FILE NAME AND FILE HEADER": "The following mask is to be used to have a valid file name: (snip)"

[Image from QoCDC v3.3, section 3.2]

Example from QoCDC:

The items in the naming standard need to be found in the header, so that tools can generate the name from an information model and the name stays consistent with the content of the payload.

<dcat:startTime>_<dcterms:publisher>_<prov:wasGeneratedBy>_[dcat:version]

<dcat:startTime>: Taken from the dcat:Dataset. If there are multiple datasets with different startTime values, the prov:generatedAtTime of the manifest (collection) is used.

<dcterms:publisher>: Taken from the dcat:Dataset. If there are multiple datasets with different publishers, the publisher of the manifest (collection) is used.

<prov:wasGeneratedBy>: Taken from the dcat:Dataset. If there are multiple datasets with different wasGeneratedBy values, the wasGeneratedBy of the manifest (collection) is used. [prov:wasGeneratedBy](https://www.w3.org/TR/2013/REC-prov-o-20130430/#wasGeneratedBy) is an association to the abstract [prov:Activity](https://www.w3.org/TR/2013/REC-prov-o-20130430/#Activity) that produced the prov:Entity. The name includes:

- Process Type: CGM, TYNDP etc.
- Time Horizon: Year-ahead, Month-ahead etc.
- Run
- Iteration
- Profile

E.g. for the following instance files these activities are relevant:

- EQ/RA -> CGM, CGM1Y, TYNDP
- SSH/TP/SV -> IN, TYNDP, 1Y, 1M, 1W, 6...1D, ID
- RAS -> IN, TYNDP, 1Y, 1M, 1W, 6...1D, ID

_[dcat:version]: This refers to the dcat:Dataset, where a new dcat:Dataset replaces the previous version (which is then no longer valid) with a new version that has the same validity period. The naming should follow semantic versioning, e.g. https://semver.org/, where _[1.0.0] is the default and is optional to use. Any version other than the default must be included in the name.

E.g. the same EQ is exchanged for the TYNDP:

- 20230101_APG_TYNDP-EQ.xml
- 20230101_APG_TYNDP-EQ_[1.0.0].xml

Example for CGM:

- 20180118T0930Z_1D_APG_SSH_001.xml -> 20180118T0930Z_APG_CGM-1D-SSH.xml
- 20180117T2230Z_1D_APG_EQ_001.xml -> 20180117T2230Z_APG_CGM-1D-EQ.xml
- 20180117T2230Z__APG_EQ_001.xml -> 20180117T2230Z_APG_CGM-EQ.xml
- 20180118T1130Z_1D_TSCNET-EU_SV_001.xml -> 20180118T1130Z_TSCNET-EU_CGM-1D-SV.xml
- 20180118T1130Z_1D_TSCNET-EU-APG_SSH_001.xml -> 20180118T1130Z_TSCNET-EU-APG_CGM-1D-SSH.xml

Example for TYNDP:

- 20230101_APG_TYNDP-EQ.xml

Example for CSA:

- 20230512T2230Z_APG_CGM-RA.xml
- 20230512T2230Z_APG_CGM-1D-r1-RAS.xml
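As an illustration of the mask above, here is a minimal Python sketch that assembles a name from the three mandatory items and the optional version. The function name `build_payload_name`, its parameters and the `DEFAULT_VERSION` constant are hypothetical and not part of any CIM or CGMES tooling. It assumes the start time is already formatted the way it appears in the examples.

```python
from typing import Optional

# Assumption: 1.0.0 is the default version and may be left out of the name.
DEFAULT_VERSION = "1.0.0"


def build_payload_name(start_time: str, publisher: str, activity: str,
                       version: Optional[str] = None,
                       extension: str = "xml") -> str:
    """Join the header items with underscores, following the proposed mask."""
    parts = [start_time, publisher, activity]
    # Only a non-default version has to appear in the name.
    if version is not None and version != DEFAULT_VERSION:
        parts.append(f"[{version}]")
    return "_".join(parts) + "." + extension


print(build_payload_name("20180118T0930Z", "APG", "CGM-1D-SSH"))
print(build_payload_name("20230101", "APG", "TYNDP-EQ", version="1.1.0"))
```

The sketch always drops the default version. Since the proposal makes _[1.0.0] optional, a tool could just as well keep it in the name.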

Haigutus commented 1 year ago

I would propose a rule that the filename can contain only data that can be extracted from the file header.

Reasoning:

  1. Currently some metadata is added to the filename that is not present inside the file, which makes filename parsing a mandatory step. To avoid this in future we should enforce the rule, and if additional metadata is needed, the file header/manifest needs to be extended first

  2. The filename can be created automatically at the moment of storage by extracting the relevant metadata from the file header
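Point 2 could look roughly like the Python sketch below. The header layout in `SAMPLE_HEADER` is a hypothetical simplification: it assumes a single dcat:Dataset whose publisher and wasGeneratedBy are plain literals and whose startTime is a UTC timestamp, which may not match how a real header or manifest serialises them. `name_from_header` is likewise an illustrative name only.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dcat": "http://www.w3.org/ns/dcat#",
    "dcterms": "http://purl.org/dc/terms/",
    "prov": "http://www.w3.org/ns/prov#",
}

# Hypothetical, simplified header used only for this illustration.
SAMPLE_HEADER = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dcat="http://www.w3.org/ns/dcat#"
         xmlns:dcterms="http://purl.org/dc/terms/"
         xmlns:prov="http://www.w3.org/ns/prov#">
  <dcat:Dataset rdf:about="urn:uuid:00000000-0000-0000-0000-000000000000">
    <dcat:startTime>2018-01-18T09:30:00Z</dcat:startTime>
    <dcterms:publisher>APG</dcterms:publisher>
    <prov:wasGeneratedBy>CGM-1D-SSH</prov:wasGeneratedBy>
  </dcat:Dataset>
</rdf:RDF>
"""


def name_from_header(header_xml: str) -> str:
    """Derive a file name using only values found in the header."""
    dataset = ET.fromstring(header_xml).find("dcat:Dataset", NS)
    if dataset is None:
        raise ValueError("no dcat:Dataset found in header")

    def text_of(tag: str) -> str:
        value = dataset.findtext(tag, default=None, namespaces=NS)
        if value is None or not value.strip():
            raise ValueError(f"header is missing {tag}")
        return value.strip()

    # Assumption: startTime is given in UTC with a trailing "Z".
    start = datetime.strptime(text_of("dcat:startTime"), "%Y-%m-%dT%H:%M:%SZ")
    items = [start.strftime("%Y%m%dT%H%MZ"),
             text_of("dcterms:publisher"),
             text_of("prov:wasGeneratedBy")]
    return "_".join(items) + ".xml"


print(name_from_header(SAMPLE_HEADER))
```

Because every item comes from the header, the name can be regenerated at any time and never has to be parsed to recover metadata.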

Sveino commented 1 year ago

@Haigutus Yes, definitely - I was hoping this would come across clearly in the text above. In the discussion with CSA it became clear that we need to have a name - my proposal above is based on this, making sure that we can cover the current requirement. The next step would be to come up with a proposal that uses our current header data.

Sveino commented 1 year ago

Updated the text above so that _[dcat:version] refers to dcat:Dataset and not dcat:Distribution. It now covers the case where a dataset is replaced by a new version with the same metadata, e.g. the same start and end of the validity period. dcat:version will follow semantic versioning, e.g. https://semver.org/.