CodyKochmann / graphdb

sqlite based graph database for storing native python objects and their relationships to each other
MIT License
190 stars, 10 forks

I think SQLite should be taken out of the equation when running in RAM. #11

Closed: CodyKochmann closed this 5 years ago

CodyKochmann commented 6 years ago

I've been considering pulling SQLite out of its in-memory mode, since taking advantage of Python's own namespace system would be way faster and use less memory. It would be a RAM-only solution, but if that's not a well-tested backend, I don't know what is. Plus, the links between nodes would be real object references instead of integer IDs in some table, so I'd figure the speed gains from that alone would be massive. I'm 99% sure Python's object de-duplication is using graph logic underneath anyway. And recursive queries translate directly into recursive generators, which, as we've found out, blow away algorithmic benchmarks far beyond what we ever expected.
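A minimal sketch of that idea, assuming a hypothetical `Node` class and `walk` function (neither is part of graphdb): each relationship stores direct references to other node objects, and a recursive query is a recursive generator that follows those references with `yield from`.

```python
class Node:
    """Hypothetical in-RAM node: edges are direct object references, not integer ids."""

    __slots__ = ("value", "edges")

    def __init__(self, value):
        self.value = value
        self.edges = {}  # relationship name -> list of Node objects

    def link(self, relation, other):
        self.edges.setdefault(relation, []).append(other)


def walk(node, relation, seen=None):
    """Recursively yield every node reachable from `node` along `relation` edges."""
    if seen is None:
        seen = {id(node)}  # never yield the starting node itself
    for child in node.edges.get(relation, ()):
        if id(child) in seen:  # guard against cycles
            continue
        seen.add(id(child))
        yield child
        yield from walk(child, relation, seen)


a, b, c, d = Node("a"), Node("b"), Node("c"), Node("d")
a.link("parent_of", b)
b.link("parent_of", c)
c.link("parent_of", a)  # cycle back to the start
a.link("parent_of", d)

print([n.value for n in walk(a, "parent_of")])  # ['b', 'c', 'd']
```

Following an edge here is just an attribute and dict lookup on a live object, with no table join. One caveat of the recursive-generator style: each level of `yield from` adds a Python stack frame, so very deep chains can hit the default recursion limit.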

CodyKochmann commented 6 years ago

Here are the first steps toward a pure-Python, in-RAM implementation.

import dill  # third-party; serializes lambdas, closures, etc. that pickle can't
from hashlib import sha256
from uuid import uuid4

def better_hash(obj):
    ''' like hash, just works with unhashable objects too '''
    try:
        return hash(obj)
    except TypeError:
        # unhashable (list, dict, set, ...): hash its serialized bytes instead
        return int.from_bytes(
            sha256(
                dill.dumps(
                    obj,
                    protocol=dill.HIGHEST_PROTOCOL
                )
            ).digest(),
            'little'
        )

class RelationshipDict(dict):
    """you'll never guess, but this is a dictionary of relationships,
        just a little bit faster for graph based activities ;) """
    # maps relationship name -> set of GraphNode targets
    def __setitem__(self, key, value):
        assert isinstance(key, str)
        assert isinstance(value, GraphNode)
        if key in self:
            # assigning to an existing relationship adds another target
            self[key].add(value)
        else:
            dict.__setitem__(self, key, {value})

class GraphNode(object):
    """twisty twist, it's a node that goes in a graph"""
    # 'id' must be listed here, otherwise self.id raises AttributeError
    __slots__ = 'value', '_hash', 'id', 'relationships'

    def __init__(self, value):
        self.value = value
        self._hash = better_hash(value)
        self.id = uuid4().int
        self.relationships = RelationshipDict()

    def __hash__(self):
        return self._hash

    def __eq__(self, target):
        # compare wrapped values, whether target is another node or a raw value
        if isinstance(target, GraphNode):
            target = target.value
        return self.value == target

Still a lot to do, but it's a first step.
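The point of pairing `better_hash` with value-based `__eq__` is that equal values collapse onto a single node. A minimal sketch of that de-duplication follows; it is self-contained, so it assumes stdlib `pickle` in place of `dill` (which means it cannot hash values like lambdas), and the `stable_hash`, `Node`, and `Graph` names are hypothetical rather than graphdb's API.

```python
import pickle
from hashlib import sha256


def stable_hash(obj):
    """Stand-in for better_hash, using stdlib pickle instead of dill (assumption)."""
    try:
        return hash(obj)
    except TypeError:
        digest = sha256(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)).digest()
        return int.from_bytes(digest, "little")


class Node:
    __slots__ = ("value", "relationships")

    def __init__(self, value):
        self.value = value
        self.relationships = {}  # relationship name -> set of Node


class Graph:
    """Hypothetical container that guarantees one Node per distinct value."""

    def __init__(self):
        self._nodes = {}  # stable_hash(value) -> list of Nodes sharing that hash

    def node(self, value):
        bucket = self._nodes.setdefault(stable_hash(value), [])
        for existing in bucket:
            if existing.value == value:  # resolve hash collisions by equality
                return existing
        new = Node(value)
        bucket.append(new)
        return new

    def relate(self, src, name, dst):
        a, b = self.node(src), self.node(dst)
        a.relationships.setdefault(name, set()).add(b)
        return a, b


g = Graph()
g.relate([1, 2], "next", "x")  # lists are unhashable, so they go through pickle
g.relate([1, 2], "next", "y")
print(g.node([1, 2]) is g.node([1, 2]))  # True
print(sorted(n.value for n in g.node([1, 2]).relationships["next"]))  # ['x', 'y']
```

Because `Graph` hands out exactly one `Node` per value, the nodes themselves can use plain identity hashing inside the relationship sets. Two limitations of hashing serialized bytes: equal values that pickle differently (for example dicts built in a different insertion order) still get separate nodes, and mutating a stored value after insertion leaves it in a stale bucket.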

CodyKochmann commented 5 years ago

The SQLite version is still going to stick around, but more as the first of what might be many different storage backends.
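One way to read "many different storage backends" is a small interface that the graph front end talks to, with SQLite as one implementation and a pure in-RAM store as another. The sketch that follows assumes hypothetical `StorageBackend`, `MemoryBackend`, and `SQLiteBackend` classes with an `add_edge`/`targets` API; these are not graphdb's actual classes, and values are restricted to strings to keep the SQLite side simple.

```python
import sqlite3
from abc import ABC, abstractmethod


class StorageBackend(ABC):
    """Hypothetical interface that a graph front end could target."""

    @abstractmethod
    def add_edge(self, src, relation, dst): ...

    @abstractmethod
    def targets(self, src, relation): ...


class MemoryBackend(StorageBackend):
    """Pure-Python, RAM-only store: (src, relation) -> set of dst."""

    def __init__(self):
        self._edges = {}

    def add_edge(self, src, relation, dst):
        self._edges.setdefault((src, relation), set()).add(dst)

    def targets(self, src, relation):
        return set(self._edges.get((src, relation), ()))


class SQLiteBackend(StorageBackend):
    """SQLite store; ':memory:' by default, or a file path for persistence."""

    def __init__(self, path=":memory:"):
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS edges "
            "(src TEXT, relation TEXT, dst TEXT, UNIQUE(src, relation, dst))"
        )

    def add_edge(self, src, relation, dst):
        with self._db:  # commits the transaction on success
            self._db.execute(
                "INSERT OR IGNORE INTO edges VALUES (?, ?, ?)", (src, relation, dst)
            )

    def targets(self, src, relation):
        rows = self._db.execute(
            "SELECT dst FROM edges WHERE src = ? AND relation = ?", (src, relation)
        )
        return {row[0] for row in rows}


for backend in (MemoryBackend(), SQLiteBackend()):
    backend.add_edge("alice", "knows", "bob")
    backend.add_edge("alice", "knows", "carol")
    backend.add_edge("alice", "knows", "bob")  # duplicate is ignored by both
    print(type(backend).__name__, sorted(backend.targets("alice", "knows")))
```

Both backends answer the same queries with the same results, so code written against `StorageBackend` would not need to know which one is in use; the SQLite one can additionally persist to a file.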