derrickoswald / CIMSpark

Spark access to Common Information Model (CIM) files
MIT License

Joining two VertexPartitions with different indexes is slow. #3

Open derrickoswald opened 7 years ago

derrickoswald commented 7 years ago

While performing the network topology processing that creates the TopologyNode and TopologyIsland RDD, numerous warnings that joining VertexPartitions with different indexes is slow are logged:

WARN impl.ShippableVertexPartitionOps: Joining two VertexPartitions with different indexes is slow.

It is unclear which operations cause these messages or whether they can be avoided. Google searches suggest that the VertexRDD involved should be cached, but preliminary attempts to cache the generated VertexRDD had no effect. Other processing that may be at fault is the update of the ACDCTerminal and ConnectivityNode RDD, including their superclasses.
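As a possible starting point for the investigation, here is a minimal sketch of one way the warning can arise in GraphX and one way to avoid it. It is not taken from the CIMSpark code: the object name `IndexSharingSketch`, the method `joinCounts`, and the sample data are hypothetical. The working assumption is that the warning is logged when two VertexRDD built independently of each other are joined, and that deriving the second VertexRDD from the first with `aggregateUsingIndex` lets both sides share one index. If that assumption holds, it would also explain why caching alone had no effect, since caching does not change which index a VertexRDD uses.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.graphx.{VertexId, VertexRDD}
import org.apache.spark.rdd.RDD

// Hypothetical sketch, not CIMSpark code: compares a join between two
// independently indexed VertexRDD with a join between VertexRDD that
// share an index.
object IndexSharingSketch {

  // Returns the number of joined vertices for (independent, aligned) joins.
  def joinCounts(sc: SparkContext): (Long, Long) = {
    // Made-up sample data standing in for real vertex attributes.
    val labels: RDD[(VertexId, String)] =
      sc.parallelize(Seq((1L, "a"), (2L, "b"), (3L, "c")))
    val updates: RDD[(VertexId, Int)] =
      sc.parallelize(Seq((1L, 10), (2L, 20), (3L, 30)))

    val vertices: VertexRDD[String] = VertexRDD(labels).cache()

    // Variant 1: a second VertexRDD built on its own gets its own index.
    // Assumption: this is the kind of join that logs the warning.
    val independent: VertexRDD[Int] = VertexRDD(updates)
    val slow = vertices.innerJoin(independent)((id, label, value) => (label, value))

    // Variant 2: aggregateUsingIndex reuses the index of `vertices`,
    // so both sides of the join are indexed identically.
    val aligned: VertexRDD[Int] = vertices.aggregateUsingIndex(updates, _ + _)
    val fast = vertices.innerJoin(aligned)((id, label, value) => (label, value))

    (slow.count(), fast.count())
  }
}
```

Both variants should produce the same joined result, so running `IndexSharingSketch.joinCounts(sc)` in a local Spark shell and watching the log for each variant is one way to check whether independently built VertexRDD are the source of the messages. Whether the topology processing or the ACDCTerminal and ConnectivityNode updates follow this pattern still has to be confirmed.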

An attempt should be made to understand the origin of these messages and ameliorate them if possible.