Closed VladimirAlexiev closed 4 days ago
I have a way to do this. Just put the data we want to multiply somewhere and I will multiply it.
@griddigit-ci I've described them here https://github.com/Sveino/Inst4CIM-KG/tree/develop/rdf-improved#sample-instance-data
dataset | xml | zip | files | FullModel | triples | largest | largest file |
---|---|---|---|---|---|---|---|
ENTSO-E_Test_Configurations_v3.0.2 | 151M | 19M | 357 | 350 | 1844380 | 947208 | RealGrid/RealGrid-Merged/RealGrid_EQ.xml |
Nordic44 | 2.9M | | 15 | 12 | 35481 | 17420 | CGMES_2_4/Nordic44_CGM_37a_EQ.xml |
@Sveino and @griddigit-ci
Nordic44 does have a better alignment with market results. The plan is to develop the more advanced HVDC model in it. But currently there is no reason to include Nordic44 with regard to SHACL performance validation. Statnett's model is about 30MB zipped and 800MB unzipped. I believe there is a National Grid model that can be used for real TSO model validation.
I did what we agreed last week: I took RealGrid and multiplied it 100 times. On the way I faced some memory issues, but after tuning the RAM usage a bit on my 64 GB server machine I was able to produce it.
The data is here: https://1drv.ms/f/s!AhDObGm0xWObjJI3y0obO3j9L4TSRw?e=4CDbxL
RealGrid10 is multiplied 10 times. There are also 20-times, 50-times and 100-times sets. I think I should be able to go even bigger, but I'm not sure what the limit is. The EQ100 is 1.3 GB zipped.
In each of these sets, copy 1 is the same as the original. So you can also merge the 4 sets and get a 177-times multiplication effect (10 + 20 + 50 + 100 = 180, minus the 3 duplicate copies of the original).
Okay!
@griddigit-ci
The files use DOS line endings and maybe a byte-order mark (BOM).
BOM doesn't play well with `riot`, so I remove the BOM and convert to Unix line endings:
d2u *
(This takes about 15 minutes because the files are large)
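For anyone without `d2u`/`dos2unix` at hand, the same cleanup can be sketched in Python. This is only a minimal sketch, assuming the files are UTF-8 and that a BOM, if present, is the UTF-8 one; `normalize_file` and its `chunk_size` parameter are hypothetical names, not part of any tool mentioned in this thread. It streams the file in chunks so the large multiplied files don't have to fit in memory.

```python
import codecs
import os


def normalize_file(path, chunk_size=1 << 20):
    """Strip a leading UTF-8 BOM and convert CRLF to LF, in place.

    Hypothetical helper: streams the file in chunks (chunk_size must be
    at least 3 bytes so the BOM is seen whole in the first chunk).
    """
    tmp_path = path + ".tmp"
    with open(path, "rb") as src, open(tmp_path, "wb") as dst:
        first = True
        carry = b""  # a trailing CR that may be the first half of a CRLF pair
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            if first:
                if chunk.startswith(codecs.BOM_UTF8):
                    chunk = chunk[len(codecs.BOM_UTF8):]
                first = False
            data = carry + chunk
            carry = b""
            if data.endswith(b"\r"):
                # Hold the CR back until we know whether an LF follows it.
                data, carry = data[:-1], b"\r"
            dst.write(data.replace(b"\r\n", b"\n"))
        dst.write(carry)  # a lone CR at end of file is kept as-is
    os.replace(tmp_path, path)
```

Lone CR characters that are not followed by LF are left untouched, which may or may not match what `d2u` does in every mode.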
The files include `xml:base="http://iec.ch/TC57/CIM100"`, which doesn't match other instance files, contradicts the decision to make base=MAS, and is inappropriate as the base of instance URLs.
I'll remove this `xml:base`.
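One way to drop the attribute without parsing the whole file is a regex over the header only. This is a rough sketch, assuming the attribute sits on the root `rdf:RDF` element near the top of the file; `strip_xml_base` and `XML_BASE_RE` are hypothetical names introduced for illustration.

```python
import re

# Matches one xml:base attribute (single- or double-quoted),
# including the whitespace in front of it.
XML_BASE_RE = re.compile(r"""\s+xml:base\s*=\s*(?:"[^"]*"|'[^']*')""")


def strip_xml_base(header: str) -> str:
    """Remove the first xml:base attribute from an XML header fragment."""
    return XML_BASE_RE.sub("", header, count=1)
```

In practice this would be applied only to the first few kilobytes of each file (the part containing the root element), with the rest copied through unchanged.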
The URL doesn't match `cim:IdentifiedObject.mRID`. Will this cause validation errors?
<cim:CurrentLimit rdf:ID="_3951f2a5-2c07-4179-ac6e-bc66fbc8a13e">
<cim:IdentifiedObject.name>TATL</cim:IdentifiedObject.name>
<cim:OperationalLimit.OperationalLimitSet rdf:resource="#_cc5f27d6-22f1-43f8-abf8-8e6a8bbec470"/>
<cim:OperationalLimit.OperationalLimitType rdf:resource="#_882b38a5-ebe9-472b-9ab6-edce9a9ac0bb"/>
<cim:CurrentLimit.normalValue>500</cim:CurrentLimit.normalValue>
<cim:IdentifiedObject.mRID>74bd5853-6f94-49cb-a15d-331cc1a7dc29</cim:IdentifiedObject.mRID>
</cim:CurrentLimit>
Yes, you can remove the xml:base. Also that last line is OK to remove. On the mRID: it can cause errors, but I'm not sure. It should not cause issues for now, I guess. I will need to look at my functions. They were written for previous versions where we didn't have mRID, which is why this is inconsistent.
We think there's no rule to check that the node URI matches the mRID. Closing
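If such a rule is ever wanted, the mismatch is easy to detect outside SHACL. Below is a minimal sketch, assuming the common convention that `rdf:ID` is the mRID prefixed with an underscore; `find_mrid_mismatches` is a hypothetical helper, and it matches the mRID element by local name so that it doesn't depend on a particular CIM namespace.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
MRID_LOCAL = "IdentifiedObject.mRID"


def find_mrid_mismatches(source):
    """Yield (rdf_id, mrid) pairs where rdf:ID (minus a leading '_') != mRID.

    `source` is a filename or a binary file object containing RDF/XML.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        rdf_id = elem.get(f"{{{RDF_NS}}}ID")
        if rdf_id is None:
            continue  # property elements and the root carry no rdf:ID
        expected = rdf_id[1:] if rdf_id.startswith("_") else rdf_id
        for child in elem:
            if child.tag.endswith("}" + MRID_LOCAL):
                mrid = (child.text or "").strip()
                if mrid != expected:
                    yield rdf_id, mrid
        elem.clear()  # release the children of resources already checked
```

Run over the `cim:CurrentLimit` sample quoted above, this would report `_3951f2a5-…` against mRID `74bd5853-…`.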
We want to multiply instance data in order to generate large datasets for performance testing.
Let `uuid_old` be the UUID of the existing resource, `1<=n<=N` be a counter, and `uuid_new` be the UUID of the replicated resource. Then we can use the perl `UUID::uuid3` function like this: `$uuid_new = uuid3 (url => "$uuid_old$n")`
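The same idea can be sketched in Python with the standard `uuid` module. This is an analogue of the Perl call, not a port of it: using `uuid.NAMESPACE_URL` to mirror the `url =>` argument is an assumption, so the resulting values may differ from what the Perl function produces. `replicate_uuid` is a hypothetical name.

```python
import uuid


def replicate_uuid(uuid_old: str, n: int) -> str:
    """Derive a deterministic version-3 (name-based, MD5) UUID for copy n.

    The name is the old UUID string followed by the counter, as in the
    Perl expression "$uuid_old$n". NAMESPACE_URL is an assumed namespace.
    """
    return str(uuid.uuid3(uuid.NAMESPACE_URL, f"{uuid_old}{n}"))


# Usage: the same (uuid_old, n) pair always maps to the same new UUID,
# so references between replicated resources stay consistent within copy n.
copies = [replicate_uuid("3951f2a5-2c07-4179-ac6e-bc66fbc8a13e", n) for n in range(1, 4)]
```

Because the mapping is deterministic, every `rdf:ID`, `rdf:about` and `rdf:resource` occurrence of an old UUID can be rewritten independently and still agree within a given copy.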