derrickoswald / CIMSpark

Spark access to Common Information Model (CIM) files
MIT License

CIM 100 #8

Open derrickoswald opened 6 years ago

derrickoswald commented 6 years ago

The CIMReader is currently coded for CIM 17, or rather for one specific combination of CIM17 (iec61970cim17v34_iec61968cim13v12_iec62325cim03v17a.eap) that has been labeled CIM100.

For generality, the CIMReader needs to be able to:

derrickoswald commented 5 years ago

The CIM100 model has been created with CIMTool and committed to the master branch, so it is now the de facto default.

derrickoswald commented 4 years ago

One way to provide models for the various CIM versions (where a model is a Java jar compiled from Scala source files) is to publish a separate Maven package for each version.

So, the Maven coordinates for CIM100 might be:

ch.ninecode.cim.model:CIM100:2.11-2.4.5-4.1.4

(Note: the package was previously called simply ch.ninecode.model)
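As a rough illustration, a coordinate like the one above has the usual Maven groupId:artifactId:version shape, and the Scala sketch below splits it into its parts. The names ModelCoordinate and parse are hypothetical, and reading the version as <Scala binary version>-<Spark version>-<model version> is an assumption inferred from the numbers (2.11 and 2.4.5 look like Scala and Spark versions), not a convention stated in this issue.

```scala
// Hypothetical representation of a CIM model package coordinate.
// ASSUMPTION: the version string is "<scala>-<spark>-<model>", e.g. "2.11-2.4.5-4.1.4".
final case class ModelCoordinate(
    group: String,
    artifact: String,
    scalaVersion: String,
    sparkVersion: String,
    modelVersion: String
) {
  // Reassemble the version part exactly as it appears in the coordinate.
  def version: String = s"$scalaVersion-$sparkVersion-$modelVersion"

  override def toString: String = s"$group:$artifact:$version"
}

object ModelCoordinate {
  // Returns None unless the string has three ':'-separated parts
  // and the version has three '-'-separated parts.
  def parse(coordinate: String): Option[ModelCoordinate] =
    coordinate.split(":") match {
      case Array(group, artifact, version) =>
        version.split("-") match {
          case Array(scalaV, sparkV, modelV) =>
            Some(ModelCoordinate(group, artifact, scalaV, sparkV, modelV))
          case _ => None
        }
      case _ => None
    }
}
```

If that reading of the version is the intent, it would allow one model artifact to be published per Scala/Spark combination.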

Ignoring the "conversion between versions" use case, the model to be used by the CIMReader and by other programs such as CIMExport could be specified on the spark-shell or spark-submit command line with the --packages ch.ninecode.cim.model:CIM100:2.11-2.4.5-4.1.4 option.

If none is specified, this could be treated as an error, but then the user would need to know the Maven coordinates, and would have to determine the CIM version of the file by looking at the CIM namespace within it.

A better approach would be to have some sort of default or, better yet, to examine the namespace in the header of the file (or files) and use the heuristic mentioned in the issue description to load the correct jar.
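A minimal sketch of the namespace check, assuming the CIM namespace is declared as xmlns:cim on the rdf:RDF root element near the start of the file. ModelChooser, knownNamespaces and chooseModel are hypothetical names, and the namespace-to-model table is illustrative only (the CIM16 entry assumes a model package named CIM16, which this issue does not mention); the actual heuristic from the issue description is not reproduced here. Missing or unrecognized namespaces fall back to CIM100, described earlier as the de facto default.

```scala
import scala.util.matching.Regex

// Hypothetical sketch: choose a CIM model package from the namespace
// declared in the header of a CIM RDF/XML file.
object ModelChooser {
  // ASSUMPTION: illustrative namespace-to-artifact table, not an authoritative list.
  val knownNamespaces: Map[String, String] = Map(
    "http://iec.ch/TC57/2013/CIM-schema-cim16#" -> "CIM16",
    "http://iec.ch/TC57/CIM100#" -> "CIM100"
  )

  // Fallback when no known namespace is found.
  val defaultModel: String = "CIM100"

  // Captures the URI in xmlns:cim="..." (double or single quotes).
  private val cimNamespace: Regex = """xmlns:cim\s*=\s*["']([^"']+)["']""".r

  def chooseModel(header: String): String =
    cimNamespace
      .findFirstMatchIn(header)
      .flatMap(m => knownNamespaces.get(m.group(1)))
      .getOrElse(defaultModel)
}
```

Reading only a bounded prefix of each file should be enough for this check, since the namespace declarations sit in the root element's opening tag.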

Now, where to get the jar? One approach is to hard-code some repository locations, just as Spark does, but to fetch the specific jar directly. For example:

https://repo1.maven.org/maven2/ch/ninecode/cim/model/CIM100/2.11-2.4.5-4.1.4/CIM100-2.11-2.4.5-4.1.4.jar
https://dl.bintray.com/spark-packages/maven/ch/ninecode/cim/model/CIM100/2.11-2.4.5-4.1.4/CIM100-2.11-2.4.5-4.1.4.jar
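These URLs follow the standard Maven repository layout, so they can be derived mechanically from the coordinates. A minimal sketch, where JarLocator, jarPath and candidateUrls are hypothetical names and the repository roots are taken from the example URLs:

```scala
// Hypothetical sketch: derive candidate jar URLs for a model package using the
// standard Maven repository layout:
//   <repo>/<groupId with '.' replaced by '/'>/<artifactId>/<version>/<artifactId>-<version>.jar
object JarLocator {
  // Repository roots copied from the example URLs in this comment.
  val repositories: Seq[String] = Seq(
    "https://repo1.maven.org/maven2",
    "https://dl.bintray.com/spark-packages/maven"
  )

  // Path of the jar relative to a repository root.
  def jarPath(group: String, artifact: String, version: String): String =
    s"${group.replace('.', '/')}/$artifact/$version/$artifact-$version.jar"

  // One candidate URL per repository, in the order they should be tried.
  def candidateUrls(group: String, artifact: String, version: String): Seq[String] =
    repositories.map(repo => s"$repo/${jarPath(group, artifact, version)}")
}
```

Note that Bintray was shut down in 2021, so the second location would no longer resolve; the list of repository roots is the part of such a scheme most likely to need maintenance.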

derrickoswald commented 4 years ago

I think the CIMReader could handle changes to the underlying classes fairly easily if CHIM.scala delegated the ClassInfo list (CHIM.classes) to the version-specific module (via something like a ClassList object), assuming that the Element, BasicElement and Unknown classes are retained in the CIMReader, since it uses the abstract methods on Element. Obviously, to avoid circular references, the CIMReader must not depend on any of the version-specific CIM class modules.
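A minimal sketch of that delegation, under heavy simplification: Element, Unknown, ClassInfo, ClassList, CHIM and CIM100ClassList here are stand-ins rather than the real CIMReader types, and the field map keyed by "mRID" is a hypothetical substitute for the real RDF parsing. The core owns Element, Unknown and the ClassList trait; a version module such as CIM100 supplies an object implementing ClassList, so the dependency points only from the model to the core.

```scala
// ---- Core side (stays in the CIMReader); simplified stand-in types ----
trait Element { def id: String }

// Fallback for classes the selected model does not know about.
final case class Unknown(id: String, className: String) extends Element

// Minimal per-class metadata: the RDF class name and a (hypothetical) field-map parser.
final case class ClassInfo(name: String, parse: Map[String, String] => Element)

// Implemented once per CIM version module.
trait ClassList { def classes: List[ClassInfo] }

// CHIM no longer owns the class list; it delegates to the ClassList it is given.
final class CHIM(classList: ClassList) {
  def classes: List[ClassInfo] = classList.classes

  private lazy val byName: Map[String, ClassInfo] = classes.map(c => c.name -> c).toMap

  // Parse one element; unknown class names become Unknown.
  def parse(className: String, fields: Map[String, String]): Element =
    byName.get(className) match {
      case Some(info) => info.parse(fields)
      case None       => Unknown(fields.getOrElse("mRID", ""), className)
    }
}

// ---- Model side (e.g. a CIM100 jar); depends on the core, never the reverse ----
final case class ACLineSegment(id: String, length: Double) extends Element

object CIM100ClassList extends ClassList {
  val classes: List[ClassInfo] = List(
    ClassInfo("ACLineSegment", f => ACLineSegment(f("mRID"), f.getOrElse("length", "0").toDouble))
  )
}
```

Which ClassList is passed in could then be decided at run time, for example from the CIM namespace in the file header as discussed in the earlier comment, without CHIM itself referencing any versioned module.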

The preprocessors (CIMAbout, CIMNormalize, CIMDeDup) depend only on Element, so they are CIM-version independent by definition.

The difficulty would then be making the postprocessors (CIMEdges, CIMNetworkTopologyProcessor, CIMJoin) CIM-version independent, because they are heavily dependent on a specific version of the CIM classes. CIMExport, as a standalone module, also depends on a specific version of the CIM classes.

It would be easier if the CIM classes didn't need to extend Row, which is Spark-version dependent. I wonder whether it's possible to get rid of that requirement.