Performance with large databases

NLeSC / MAGMa

eMetabolomics project: Mass Annotation based on in silico Generated Metabolites

http://www.emetabolomics.org

Apache License 2.0

14 stars 6 forks source link

Performance with large databases #4

Closed ridderl closed 11 years ago

ridderl commented 11 years ago

Response of the webapplication starts to become quite slow for large databases (e.g. > 50000 molecules). Is there anything that can be done to make the queries faster?

sverhoeven commented 11 years ago

The slowest route is to fetch molecules (metabolites.json) which takes about 2.5s. The db query to fetch the molecules takes about 1.5s and counting the total nr of molecules takes about 0.8s. That db query is slow because it has to be joined with and grouped fragments for nr_scans and a grouped peaks for assigned

ridderl commented 11 years ago

OK, dat is goed om te weten. Omdat de nr_scans eigenlijk een vast getal is dat niet afhankelijk is van de query kan ik die ook heel makkelijk vooraf bijhouden en opslaan. Ik had daar zelfs al een kolom voor gereserveerd in models.py: metabolites.nhits. Dan hoef je die join met de grouped fragments niet te doen en misschien scheelt dat tijd?

Ik reken dit weekend de thee dataset opnieuw door waarbij ik de nhits kolom zal vullen. Misschien kan je dan komende week proberen of dit een versnelling oplevert?

sverhoeven commented 11 years ago

I tested it in sqliteman and the query time goes from 1.5s to 0.8s. By filling nhits column with:

UPDATE metabolites SET nhits= (
SELECT count(DISTINCT fragments.scanid)
  FROM fragments 
  WHERE fragments.parentfragid = 0 and fragments.metid =metabolites.metid
  GROUP BY fragments.metid
) WHERE EXISTS (SELECT metid FROM fragments WHERE fragments.metid =metabolites.metid);

When I also add 'assigned' column the speed increase is minimal.

ridderl commented 11 years ago

That’s good, isn’t it? I implemented filling of the nhits table already, so please implement this change in magma/web …

sverhoeven commented 11 years ago

Fetching molecules lowered from 2.5s to 1.5s

sverhoeven commented 11 years ago

Performance improved.