Mari-Wie / ArchCNL

ArchCNL is an Architecture Conformance Checking Tool developed by the working group SWK (Software Engineering and Construction Methods) at the computer science department of the University of Hamburg
GNU General Public License v3.0

OutOfMemoryError: Java heap space #149

Open vhschlenker opened 3 years ago

vhschlenker commented 3 years ago

When analyzing multiple big projects (like the cwa verification, testresult, and regular servers and the Android app), the model gets too big to be written with a heap size of 4 GB. This can be worked around with JVM arguments like -Xmx8G, but that could cause problems in CI environments, where resources may be limited.
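
Because a CI runner may cap or ignore the requested heap, it can help to log the limit the JVM actually received before starting a large analysis. The following is a minimal sketch using only the JDK; the class name HeapCheck and the 4096 MiB threshold are assumptions for illustration and are not part of ArchCNL.

```java
final class HeapCheck {
    /** Maximum heap the JVM will attempt to use, in MiB, as reported by the runtime. */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    /** True if the reported maximum heap is at least requiredMiB. */
    static boolean hasAtLeast(long requiredMiB) {
        return maxHeapMiB() >= requiredMiB;
    }

    public static void main(String[] args) {
        // Hypothetical threshold; the reported value can be slightly below
        // the -Xmx setting with some garbage collectors.
        long requiredMiB = 4096;
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
        if (!hasAtLeast(requiredMiB)) {
            System.err.println("Warning: heap is below " + requiredMiB + " MiB");
        }
    }
}
```

Running this with the same JVM options as the analysis shows whether a flag such as -Xmx8G took effect in the given environment.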

The stack trace:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.base/java.util.HashMap.newNode(HashMap.java:1888)
    at java.base/java.util.HashMap.putVal(HashMap.java:642)
    at java.base/java.util.HashMap.put(HashMap.java:612)
    at org.apache.jena.reasoner.rulesys.impl.BindingVectorMultiSet.put(BindingVectorMultiSet.java:161)
    at org.apache.jena.reasoner.rulesys.impl.BindingVectorMultiSet.add(BindingVectorMultiSet.java:91)
    at org.apache.jena.reasoner.rulesys.impl.RETEQueue.fire(RETEQueue.java:105)
    at org.apache.jena.reasoner.rulesys.impl.RETEQueue.fire(RETEQueue.java:128)
    at org.apache.jena.reasoner.rulesys.impl.RETEClauseFilter.fire(RETEClauseFilter.java:227)
    at org.apache.jena.reasoner.rulesys.impl.RETEEngine.inject(RETEEngine.java:492)
    at org.apache.jena.reasoner.rulesys.impl.RETEEngine.runAll(RETEEngine.java:474)
    at org.apache.jena.reasoner.rulesys.impl.RETEEngine.fastInit(RETEEngine.java:163)
    at org.apache.jena.reasoner.rulesys.FBRuleInfGraph.prepare(FBRuleInfGraph.java:471)
    at org.apache.jena.reasoner.BaseInfGraph.requirePrepared(BaseInfGraph.java:530)
    at org.apache.jena.reasoner.rulesys.FBRuleInfGraph.findWithContinuation(FBRuleInfGraph.java:557)
    at org.apache.jena.reasoner.rulesys.FBRuleInfGraph.graphBaseFind(FBRuleInfGraph.java:587)
    at org.apache.jena.graph.impl.GraphBase.find(GraphBase.java:255)
    at org.apache.jena.graph.GraphUtil.listPredicates(GraphUtil.java:64)
    at org.apache.jena.rdf.model.impl.ModelCom.listPredicates(ModelCom.java:991)
    at org.apache.jena.rdf.model.impl.ModelCom.listNameSpaces(ModelCom.java:1004)
    at org.apache.jena.rdfxml.xmloutput.impl.BaseXMLWriter.addNameSpaces(BaseXMLWriter.java:219)
    at org.apache.jena.rdfxml.xmloutput.impl.BaseXMLWriter.setupNamespaces(BaseXMLWriter.java:488)
    at org.apache.jena.rdfxml.xmloutput.impl.BaseXMLWriter.write(BaseXMLWriter.java:470)
    at org.apache.jena.riot.adapters.AdapterRDFWriter.write(AdapterRDFWriter.java:56)
    at org.apache.jena.riot.adapters.RDFWriterRIOT.write(RDFWriterRIOT.java:83)
    at org.apache.jena.rdf.model.impl.ModelCom.write(ModelCom.java:351)
    at org.archcnl.toolchain.CNLToolchain.combineArchitectureAndCodeModels(CNLToolchain.java:263)
    at org.archcnl.toolchain.CNLToolchain.execute(CNLToolchain.java:175)
    at org.archcnl.toolchain.CNLToolchain.runToolchain(CNLToolchain.java:116)
    at org.archcnl.toolchain.CNLToolchainCLI.run(CNLToolchainCLI.java:59)
    at picocli.CommandLine.executeUserObject(CommandLine.java:1919)
    at picocli.CommandLine.access$1200(CommandLine.java:145)
    at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2332)

vhschlenker commented 3 years ago

As I added a few mappings to my rules, I discovered that even 8 GB no longer seems to be enough. Even after trying different options, such as changing the output format (RDFFormat.TURTLE_FLAT is said to be a bit easier on memory) or using other means to write the model, I found no other solution.

At least I learned that the problem lies in the conversion to XML (or whatever output format is set). The creation of the united model and the mapping run smoothly. But to create the XML to write, it seems that the entire model has to be converted in memory somehow.
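
One way to check which step actually consumes the heap is to log the used memory around each phase. The following is a rough sketch using only the JDK; PhaseMemory, measure, and the phase names are hypothetical, and the Runnable stands in for steps such as building the united model or writing the output.

```java
import java.util.ArrayList;
import java.util.List;

final class PhaseMemory {
    /** Heap currently in use, in bytes (allocated heap minus free space in it). */
    static long usedBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Runs one phase and prints the change in used heap; returns the change in bytes. */
    static long measure(String name, Runnable phase) {
        long before = usedBytes();
        phase.run();
        long after = usedBytes();
        long delta = after - before;
        System.out.printf("%s: %+d KiB (used now: %d MiB)%n",
                name, delta / 1024, after / (1024 * 1024));
        return delta;
    }

    public static void main(String[] args) {
        // Placeholder workload: keeps about 8 MiB of arrays reachable.
        List<long[]> retained = new ArrayList<>();
        measure("allocate", () -> {
            for (int i = 0; i < 1000; i++) {
                retained.add(new long[1024]);
            }
        });
        measure("no-op", () -> { });
        System.out.println("retained arrays: " + retained.size());
    }
}
```

The numbers are only approximate, since a garbage collection during a phase can make the reported change smaller or even negative, but a phase that grows the heap by gigabytes still stands out.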

Using the Stardog Jena library to avoid writing the model to disk also ran into memory issues.

The only solution I see, besides adding more memory, is to avoid any form of conversion. This means that the output model would not be written and that the conformance checking would happen without Stardog.

It might be possible to optimize the XMLWriter from Jena or relieve some pressure there by prebuilding the model, but I have yet to find a way to do that.

Edit: Requesting a deduction model from the created inference model, or just requesting the size of the inference model, also runs out of memory. The united model, to which the reasoner is applied and from which the inference model is created, currently has 84548 statements, and I don't know whether that is "too much" or whether the problem lies somewhere else. Manually limiting the rules in the reasoner seems to help, but I could not identify a single mapping rule that causes the problem.
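
The stack trace ends in Jena's RETEQueue and BindingVectorMultiSet classes, which appear to store partial rule matches, so memory use may depend less on the number of statements than on how many partial matches the rule bodies produce. The following plain-Java illustration counts matches for a two-clause rule body with and without a shared variable. It is not Jena code and not a diagnosis of the ArchCNL mapping rules; JoinGrowth, Edge, and the sample data are made up for the example.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class JoinGrowth {
    /** A statement reduced to subject and object; the predicate is assumed fixed. */
    static final class Edge {
        final String subject;
        final String object;

        Edge(String subject, String object) {
            this.subject = subject;
            this.object = object;
        }
    }

    /** Body like (?a p ?b), (?c p ?d): no shared variable, so every pair matches. */
    static long unconstrainedPairs(List<Edge> edges) {
        return (long) edges.size() * edges.size();
    }

    /** Body like (?a p ?b), (?b p ?c): the shared ?b restricts the matching pairs. */
    static long chainedPairs(List<Edge> edges) {
        Map<String, Long> countBySubject = new HashMap<>();
        for (Edge e : edges) {
            countBySubject.merge(e.subject, 1L, Long::sum);
        }
        long total = 0;
        for (Edge e : edges) {
            total += countBySubject.getOrDefault(e.object, 0L);
        }
        return total;
    }

    public static void main(String[] args) {
        List<Edge> edges = List.of(
                new Edge("A", "B"), new Edge("B", "C"),
                new Edge("C", "D"), new Edge("A", "C"));
        System.out.println(unconstrainedPairs(edges)); // 16
        System.out.println(chainedPairs(edges));       // 3
    }
}
```

With 84548 statements, a single unconstrained two-clause body would already yield about 7.1 billion pairs, which is one possible reason why limiting the rules helps even though no individual rule looks suspicious.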