Capitains / Nautilus

Implementation of a local CTS5 endpoint for MyCapytains
http://capitains-nautilus.readthedocs.io/en/latest/
Mozilla Public License 2.0

Sparql performance improvement ? #67

Closed PonteIneptique closed 3 years ago

PonteIneptique commented 6 years ago

OK, I had to check a few things.

Here is the benchmark I used to measure the performance of the resolver (without cache):

from time import perf_counter

from capitains_nautilus.cts.resolver import SparqlAlchemyNautilusCTSResolver
# Imported for the comparison run; its constructor arguments may differ
from capitains_nautilus.cts.resolver import NautilusCTSResolver
from MyCapytain.common.constants import Mimetypes

# Number of times the export is repeated
iterations = 100

resolver = SparqlAlchemyNautilusCTSResolver(
    ["./tests/testing_data/latinLit2"],
    graph="sqlite:///2.sqlite"
)
resolver.parse()

print("Parsed 1")

# Time the full metadata export, repeated `iterations` times
current = perf_counter()

for _ in range(iterations):
    resolver.getMetadata().export(Mimetypes.XML.CTS)
now = perf_counter()

print("{timeit} operations in {sub} sec : {opsec} sec/op".format(
    timeit=iterations,
    sub=now-current,
    opsec=(now-current)/iterations
))

Obviously, both of them would be cached at the HTTP serving level, but this still looks like a large loss. I need to do more research on it, as this benchmark does not take into account:

Still, there is an obvious need to improve this performance.
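To compare both resolvers under the same conditions, the measurement could be wrapped in a small helper built on the standard `timeit` module. This is only a sketch: `seconds_per_op`, `fast_export` and `slow_export` are hypothetical names, and the two functions are stand-ins for a call such as `resolver.getMetadata().export(Mimetypes.XML.CTS)` on each resolver.

```python
import timeit


def seconds_per_op(func, number=100, repeat=3):
    """Return the best average time per call, in seconds, over `repeat` runs."""
    # timeit.repeat returns one total duration per run of `number` calls.
    totals = timeit.repeat(func, number=number, repeat=repeat)
    return min(totals) / number


# Hypothetical stand-ins for the two export calls being compared.
def fast_export():
    return "".join(str(i) for i in range(100))


def slow_export():
    return "".join(str(i) for i in range(10000))


results = {
    "fast_export": seconds_per_op(fast_export, number=50),
    "slow_export": seconds_per_op(slow_export, number=50),
}
for name, value in results.items():
    print("{name}: {value:.6f} sec/op".format(name=name, value=value))
```

Taking the minimum over several runs reduces noise from other processes, which matters when the two numbers are compared against each other.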

PonteIneptique commented 6 years ago

Performance bottlenecks ordered by deepest units with own time: [image]

Performance bottlenecks ordered by time: [image]
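For anyone wanting to reproduce these two views without the screenshots, one option is the standard library's `cProfile` and `pstats`, which can sort by own time (`tottime`) or by cumulative time. This is not necessarily the tool used for the images above; `leaf` and `workload` are hypothetical stand-ins for the resolver call being profiled.

```python
import cProfile
import io
import pstats


def leaf(n):
    # Stand-in for the deepest unit doing the actual work.
    return sum(i * i for i in range(n))


def workload():
    # Stand-in for the call being profiled, e.g. a metadata export.
    return [leaf(2000) for _ in range(200)]


profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()


def report(sort_key, limit=5):
    """Return the top `limit` profile entries sorted by `sort_key`."""
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats(sort_key).print_stats(limit)
    return stream.getvalue()


own_time_report = report("tottime")        # time spent inside each function itself
cumulative_report = report("cumulative")   # time including everything it calls
print(own_time_report)
print(cumulative_report)
```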

PonteIneptique commented 6 years ago

Nevertheless, the changes made to MyCapytain were worth it: they allow Nautilus to stay up to date with the original system while making some improvements, and they also make it possible to run a graph store on top of it for real SPARQL queries.

PonteIneptique commented 6 years ago

A few ideas for improvement: have an "in-memory" cache of some data using some kind of SparqlGenerator singleton (thanks to @MrGecko for the idea).

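class InMemorySparqlCache(object):
    """Build each object once per identifier and reuse it afterwards."""

    def __init__(self, classes, cache=None):
        # `classes` maps a type name such as "textgroup" to the class to
        # instantiate; the original sketch used self.classes without
        # defining it, so passing it in here is an assumption.
        self.classes = classes
        self.generated = {}

    def generate_textgroup(self, identifier, *args, **kwargs):
        if identifier not in self.generated:
            self.generated[identifier] = self.classes["textgroup"](identifier, *args, **kwargs)
        return self.generated[identifier]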

This should be implemented on top of in-memory metadata caching for SparqlCollection objects.
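A rough sketch of what in-memory metadata caching could look like, assuming each metadata lookup would otherwise hit the graph store. `CachedMetadataCollection`, `get_metadata`, `invalidate` and `fake_store_lookup` are hypothetical names for illustration, not part of Nautilus or MyCapytain, and the URN is only an example value.

```python
class CachedMetadataCollection(object):
    """Hypothetical stand-in for a SparqlCollection-like object."""

    def __init__(self, identifier, fetch):
        self.identifier = identifier
        # fetch(identifier, key) is assumed to query the graph store.
        self._fetch = fetch
        # In-memory cache: metadata key -> value already fetched.
        self._metadata = {}

    def get_metadata(self, key):
        # Only query the store the first time a key is requested.
        if key not in self._metadata:
            self._metadata[key] = self._fetch(self.identifier, key)
        return self._metadata[key]

    def invalidate(self):
        # Drop cached values, e.g. after the underlying data is re-parsed.
        self._metadata.clear()


# Fake store that records every query, to show how often it is hit.
queries = []


def fake_store_lookup(identifier, key):
    queries.append((identifier, key))
    return "{0}:{1}".format(identifier, key)


collection = CachedMetadataCollection("urn:cts:latinLit:phi1294", fake_store_lookup)
first = collection.get_metadata("title")
second = collection.get_metadata("title")
print(len(queries))  # 1: the second call is served from memory
```

The trade-off is staleness: cached values must be invalidated whenever the store is re-parsed or updated, which is why the sketch exposes an explicit `invalidate` method.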

PonteIneptique commented 3 years ago

Benched with #91