Spyderisk / system-modeller

Spyderisk web service and web client
Other
4 stars 4 forks source link

Palette generation is too slow #104

Open mike1813 opened 1 year ago

mike1813 commented 1 year ago

When a domain model is deployed to system modeller (including an initial deployment), system modeller has to process the domain model bundle (a zipfile containing the domain model NQ file, a set of icons, and a mapping from asset types to icons). This takes an annoyingly long time.

The problem is especially noticeable to domain model developers, who need to upload successive changes to test new features, and potentially revert to old versions for regression testing, etc. Waiting many minutes on each occasion can add up to a lot of lost time.

The problem seems to be in the generation of links. I assume the code is creating a complete list of (resource asset type)-(relationship type)-(target asset type) relations that users should be able to create, which would involve using inheritance to find all combinations.

Ideally this area of the code should be optimised before we try to involve more people in domain model development.

mike1813 commented 4 weeks ago

It is evident from the logs that the excessive calculation time is in method PaletteGenerator.getLinks().

I did a bit more investigation. It seems that all the time spent in this method is used to create the list of relations, i.e., in the call

List<Map<String, String>> relations = modelObjectsHelper.getPaletteRelations(domainModelGraph);

This call runs a big lump of SPARQL. Superficially, it looks OK at the start:

SELECT DISTINCT ?from ?prop ?label ?comment ?to (STR(?hidden) AS ?isHidden) WHERE

where ?prop is a relationship type, ?from and ?to will be asset types between which this relationship can exist, ?label and ?comment will contain strings used to display the relationship or explain it (in a tool tip), and ?hidden determines whether it should be assertible by users. I assume the recipient process needs this boolean value expressed as a string ?isHidden.

Note that core:isHidden is not the property that should be used to determine if a relationship type is assertible. There is a separate property in the domain model for this called core:isAssertable (a misspelling, but I would have thought it was fairly obvious what that means). Should be an easy fix, but that isn't related to the performance issue.

Where things seem to go wrong is in the WHERE clause. It starts in what looks like a reasonable way:

?prop a owl:ObjectProperty .
?prop rdfs:domain ?domain .
?prop rdfs:range ?range .
?prop rdfs:label ?label .
?prop rdfs:comment ?comment .
OPTIONAL {?prop core:hidden ?hidden }

which extracts the relationship type, label and comment and the defined domain and range of the relationship. However, this isn't used directly as the ?from and ?to values, presumably because system-modeller wants every possible asset type that could be in either of these. Both ?from and ?to are found from a set of constraints which have the same form at both ends. The constraints for ?from are:

{
  ?dasset rdfs:subClassOf* core:Asset .
  ?domain rdfs:subClassOf* ?dasset .
  ?from rdfs:subClassOf* ?domain .
} UNION {
  ?domain owl:unionOf ?du .
  ?du (rdf:rest)*/rdf:first ?da .
  ?from rdfs:subClassOf* ?da .
}

This seems over complex. All we really need is that ?from should refer to any (direct or indirect) subclass of ?domain including ?domain itself. It isn't clear why we insist that ?domain is an owl:unionOf anything, but doing this may trigger a lot of unnecessary list processing. The constraint that ?domain rdfs:subClassOf ?dasset where ?dasset rdfs:subClassOf core:Asset also makes no sense - it says ?domain is a (direct or indirect) subclass of any direct or indirect subclass of core:Asset - which is already any asset. That seems like running a join between two small tables by creating joins between each small table and another, bigger table (every class of asset).

I am not sufficiently familiar with SPARQL and RDF to fix this. I suspect we could get away with something like:

`` { ?from rdfs:subClassOf* ?domain . }



and make a similar simplification for the relationships of ?to and ?range.

Can anyone explain why this wouldn't work?