daveklein / snofyre

Automatically exported from code.google.com/p/snofyre

Java heap explosion on query execution with non-trivial patient numbers #29

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Generate 4000 patients with condition X
2. Query for patients with X

What is the expected output? 

A list of 4,000 patients

What do you see instead?

A Java heap explosion

Original issue reported on code.google.com by jeremy.r...@googlemail.com on 24 Jan 2011 at 11:35

GoogleCodeExporter commented 9 years ago
Jeremy, I assume this happens when you have 4000 records with some type 
of anaemia and you run your query for different types of anaemia? I've 
attached a screenshot...

I've looked at your query and it has 10 disjunctive subqueries. The query 
execution algorithm is currently designed only for accuracy, not performance. 
Since it is not optimised in any way at the moment, each subquery is checked 
against every row separately, which in effect becomes 4000 x 10 = 40,000 
checks against rows! There are many different ways of optimising such queries...

1. Check all contained subqueries against every row in a single pass. This 
saves having to reload the rows 10 times!
2. Re-express the query to merge a disjunction over the values of one attribute 
(look for anemia due to = chronic_disease OR renal_disease) into a single query, 
instead of splitting it into two as is currently done!
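
As a rough illustration of the two ideas above, here is a minimal Java sketch. It is not taken from the snofyre code base: the `Row` class, the `dueTo` attribute, and the method names `matchSinglePass` and `dueToAnyOf` are all hypothetical, and rows are held in memory purely to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

public class SinglePassQuerySketch {

    /** Hypothetical row: one patient record with a single "due to" value. */
    static final class Row {
        final int patientId;
        final String dueTo;

        Row(int patientId, String dueTo) {
            this.patientId = patientId;
            this.dueTo = dueTo;
        }
    }

    /** Idea 1: visit each row once and test every subquery against it. */
    static List<Integer> matchSinglePass(Iterable<Row> rows,
                                         List<Predicate<Row>> subQueries) {
        List<Integer> matches = new ArrayList<>();
        for (Row row : rows) {                  // rows are traversed exactly once
            for (Predicate<Row> subQuery : subQueries) {
                if (subQuery.test(row)) {
                    matches.add(row.patientId);
                    break;                      // disjunction: first hit is enough
                }
            }
        }
        return matches;
    }

    /** Idea 2: merge a disjunction over values into one set-membership test. */
    static Predicate<Row> dueToAnyOf(Set<String> values) {
        return row -> values.contains(row.dueTo);
    }

    /** Builds made-up sample rows that cycle through three causes. */
    static List<Row> sampleRows(int count) {
        String[] causes = {"chronic_disease", "renal_disease", "other"};
        List<Row> rows = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            rows.add(new Row(i, causes[i % causes.length]));
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Row> rows = sampleRows(4000);

        // The disjunction expressed as two separate subqueries.
        List<Predicate<Row>> split = new ArrayList<>();
        split.add(r -> r.dueTo.equals("chronic_disease"));
        split.add(r -> r.dueTo.equals("renal_disease"));

        // The same disjunction merged into a single subquery.
        List<Predicate<Row>> merged = new ArrayList<>();
        merged.add(dueToAnyOf(new HashSet<>(
                Arrays.asList("chronic_disease", "renal_disease"))));

        System.out.println(matchSinglePass(rows, split).size());
        System.out.println(matchSinglePass(rows, merged).size());
    }
}
```

Both calls return the same patients. The single pass keeps the number of row loads at 4000 regardless of how many subqueries there are, and the merged form reduces the per-row work for value disjunctions to one set lookup.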

I don't see this issue being easy to resolve with any single generic 
optimisation. 

Original comment by jay.kola on 24 Jan 2011 at 11:58
