Open stefjoosten opened 3 years ago
I don't think there is a real issue here considering all this is meant to run prototypes. It may be different if one were to take the step to production.
Here is my argumentation: First, I think that in prototypes, only the concept DateTime
and the relation lastAccess[SESSION*DateTime
suffer from such large populations. The reason is that in the log files, I see no other tables than DateTime
and lastAccess
being populated all the time. I would have noticed that, and I would also have tried to color these parts of the log files grey so it would not distract developers from the 'real' populations they want to see:
I guess it would become a problem if memory consumption became too big. That might also be a signal to a developer to get a real computer, as databases should be able to cope with lots of data. Another problem might be that operations such as ;
or <>
might need too much time. That could be(come) very real, but afaik, the relation lastAccess
isn't used by anyone but me, because developers do not know it exists. I use it to generate error messages that show on the screen for a limited time; with a time-out (saving the value of lastAccess
when showing the error message on the screen, and comparing it with the current value of lastAccess in subsequent exec-engine runs allows me to show the message for at least (say) 5 seconds, after which the next interaction will erase it.
Hi @hanjoosten, I would like to reopen this issue and discuss a few options here first. I do recognize the problem addressed here. Having all atoms with scalar types stored in a concept table also has performance issues regarding amount of conjuncts checked.
I need some more time to formulate a proposal and think it's valuable to discuss that in person with you and/or @stefjoosten first.
After discussion with @hanjoosten we decided to see if a solution is feasible. We think it is, with limited effort.
A proposal will follow soon.
So when is a concept table used? For this, we have to analyze terms, which are the only source of using concept tables. If a term contains a subtermI[C], the compiler generates a query that uses the concept table of C.
This sounds good. However, is this true in general? If so, it would be ease to split all concepts into two categories. Ones that do and ones that don't need a concepttable. The TTYPE of such concept is irrelevant to this splitting. @sjcjoosten , do you agree with @stefjoosten on this?
V
and I
both use concept tables, and are as far as I know its only source, although the concept tables don't always have to be used whenever a V
or I
is used: in expressions like r;V;s
or r /\ I
, we don't even need a concept table. As a first step, we can choose to disregard this and treat such expressions as using concept tables. However, that means that the performance issues regarding amount of conjuncts checked might not change (as the code that checks the conjunct in itself may introduce the concept tables).
Note that the Epsilon relations might stand in for I
as well.
I'm not sure at what point in the compiler we can be aware of 'all expressions', and there are several caveats here:
I'm afraid that rules that call the exec-engine might refer to expressions that we might miss. I don't know the input language for the exec engine, but Michiel does, so he might be able to tell us whether this change might potentially make the writing of exec-engine rules any harder than it already is.
Input language for the exec engine are Ampersand expressions in the violation statement of these rules. The exec-engine, nor the prototype framework itself works with expressions. Only generated output, including SQL, from the compiler. Therefore, I don't see any issues here.
We do however need to adapt the prototype framework to such a change, because the code needs to be aware if there is a concept table that needs to be (de)populated when the user is interacting with the application. That info needs to come from the compiler. In the generated output file concepts.json
we could do that as follows:
Current output for e.g. a DateTime concept:
[{
"id": "DateTime",
"label": "DateTime",
"type": "DATETIME",
"defaultViewId": null,
"conceptTable": {
"name": "DateTime",
"cols": [ "DateTime" ]
},
"directGens": [],
"directSpecs": [],
"generalizations": [],
"specializations": [],
"largestConcept": "DateTime",
"interfaces": [],
"affectedConjuncts": [ "conj_123", "etc" ]
}]
will become:
notice the null
for the conceptTable
[{
"id": "DateTime",
"label": "DateTime",
"type": "DATETIME",
"defaultViewId": null,
"conceptTable": null,
"directGens": [],
"directSpecs": [],
"generalizations": [],
"specializations": [],
"largestConcept": "DateTime",
"interfaces": [],
"affectedConjuncts": [ "conj_123", "etc" ]
}]
I think that will do.
Problem
The following screenshot shows that some relations, e.g.
datetime
, have excessive amounts of instances: From a practical point of view, there is no reason to keep such data stored in the database. It is unsure whether this leads to performance problems, which we won't analyze in the current issue. This issue is weakly related to issue #560.Analysis
At the moment all atoms are stored in a concept list. For atoms like
DATETIME
that are stored in the concept tableDateTime
, this is clearly excessive. So not every concept needs a concept list in the database. But which? Of which concepts can we prove at compile-time that they don't need a concept table? (sidestep: The Haskell functionsqlConceptPlug :: FSpec -> A_Concept -> PlugSQL
gives us the concept table in the compiler.)So when is a concept table used? For this, we have to analyze terms, which are the only source of using concept tables. If a term contains a subterm
I[C]
, the compiler generates a query that uses the concept table ofC
.