MongoEngine / mongoengine

A Python Object-Document-Mapper for working with MongoDB
http://mongoengine.org
MIT License

Dynamically switching DB (when DBs are created dynamically, so aliases cannot be used) #1986

Open anekix opened 5 years ago

anekix commented 5 years ago

I will try to describe my use case here: I am using a multi-tenant DB model in which each client has its own database (due to a very specific business use case). Suppose I have to get the user details for a client x that has its own DB named x; these are the steps I need to execute to get the data:

All of the above operations have to be executed in a single request to get the required results.

Basically, I need to map a Document class to multiple DBs that are created dynamically (as clients register), so I cannot hard-code this in the document definition.
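Because the databases appear as clients register, the application needs a predictable way to go from a client identifier to a database name. The helper below is a hypothetical sketch (the `tenant_` prefix and `tenant_db_name` name are assumptions, not part of MongoEngine). It uses a deliberately strict whitelist instead of enumerating MongoDB's forbidden characters, lower-cases names to sidestep case-only differences, and assumes the documented rule that database names must stay under 64 bytes.

```python
import re

# Hypothetical convention: "<prefix><client id>", lower-cased.
# The whitelist is intentionally stricter than MongoDB's own naming rules.
_SAFE_NAME = re.compile(r"[a-z0-9_-]+")
_MAX_DB_NAME_BYTES = 63  # assumed limit: database names must be under 64 bytes


def tenant_db_name(client_id: str, prefix: str = "tenant_") -> str:
    """Derive a predictable database name for a client (illustrative only)."""
    name = f"{prefix}{client_id}".strip().lower()
    if not _SAFE_NAME.fullmatch(name):
        raise ValueError(f"unsafe database name: {name!r}")
    if len(name.encode("utf-8")) > _MAX_DB_NAME_BYTES:
        raise ValueError(f"database name too long: {name!r}")
    return name
```

For example, `tenant_db_name("x")` gives `"tenant_x"`, which can then serve as both the database name and the connection alias.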

I guess MongoEngine's default behavior is to bind each Document to a specific database as soon as the code defining the document is executed, but we somehow need to bind documents to DBs dynamically.

Other closely related issues:

#1610

Is there any specific architectural reason to bind a document to a database when the document class definition is first executed?

bagerard commented 5 years ago

The switch_db context manager was built for this use case. Is there anything preventing its use?
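Judging from the copy of `switch_db` posted later in this thread, the context manager works by overwriting the class-level `_meta["db_alias"]`, clearing the cached `_collection`, and restoring both on exit. The pure-Python model below involves no MongoEngine at all (`FakeDoc` and `model_switch_db` are hypothetical names); it only reproduces that swap-and-restore idea to show that the alias lives on the class, not on the individual query.

```python
from contextlib import contextmanager


class FakeDoc:
    # Class-level state, fixed when the class body runs.
    _meta = {"db_alias": "default"}
    _collection = "cached-collection-for-default"

    @classmethod
    def current_alias(cls) -> str:
        return cls._meta["db_alias"]


@contextmanager
def model_switch_db(cls, db_alias):
    """Swap the class-level alias, then restore it (models the switch_db idea)."""
    original_alias = cls._meta.get("db_alias", "default")
    original_collection = cls._collection
    cls._meta["db_alias"] = db_alias
    cls._collection = None  # force the collection to be looked up again
    try:
        yield cls
    finally:
        # Restore even if the block raised.
        cls._meta["db_alias"] = original_alias
        cls._collection = original_collection


with model_switch_db(FakeDoc, "org1") as Doc:
    print(Doc.current_alias())  # org1
print(FakeDoc.current_alias())  # default
```

Because the state sits on the class, everything that touches `FakeDoc` during the `with` block sees the swapped alias.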

anekix commented 5 years ago

@bagerard switch_db can only be used when the DB aliases are predefined when the connection is first established (taken from here). But in a multi-tenant architecture, the DBs cannot be predefined; they are created at runtime, e.g. a new company database is created when a company registers (the architecture itself cannot be changed due to domain requirements/compliance).

bagerard commented 5 years ago

I have the impression that there might be a workaround if you establish the connections on the fly and use switch_db, assuming your databases have predictable names (e.g. the organization name):

from mongoengine import *
from mongoengine.context_managers import switch_db

orgs = ['org1', 'org2', 'org3']

conn = connect()    # establish a default connection

class MyDoc(Document):
    name = StringField()

    def __repr__(self):
        return 'MyDoc name: {}'.format(self.name)

# Save 1 doc per database
for org in orgs:
    print('establishing connection to db "{}"'.format(org))
    connect(db=org, alias=org)
    with switch_db(MyDoc, org):
        print('Saving in {}'.format(org))
        MyDoc(name='save_me_in_{}'.format(org)).save()

establishing connection to db "org1"
Saving in org1
establishing connection to db "org2"
Saving in org2
establishing connection to db "org3"
Saving in org3

# Print what was saved
for org in orgs:
    with switch_db(MyDoc, org):
        print('Doc from db "{}": {}'.format(org, list(MyDoc.objects())))

Doc from db "org1": [MyDoc name: save_me_in_org1]
Doc from db "org2": [MyDoc name: save_me_in_org2]
Doc from db "org3": [MyDoc name: save_me_in_org3]
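The loop above connects to every organization up front. In a request handler, where a tenant may have registered after start-up, one option is to register the alias lazily the first time the tenant is seen. This is a hypothetical sketch: `ensure_alias` and the injected `connect_fn` are not MongoEngine API. In real code `connect_fn` would be MongoEngine's `connect`, called with `db=` and `alias=` exactly as in the example above. The lock keeps two concurrent requests for the same new tenant from both trying to register it.

```python
import threading
from typing import Callable, Set

_known_aliases: Set[str] = set()
_alias_lock = threading.Lock()


def ensure_alias(tenant: str, connect_fn: Callable[..., object]) -> str:
    """Register a connection alias for `tenant` at most once (hypothetical helper).

    Assumes the alias and the database name are both the tenant name,
    as in the switch_db example above.
    """
    if tenant in _known_aliases:  # fast path once the tenant is known
        return tenant
    with _alias_lock:
        if tenant not in _known_aliases:  # re-check under the lock
            connect_fn(db=tenant, alias=tenant)
            _known_aliases.add(tenant)
    return tenant
```

Usage would look like `with switch_db(MyDoc, ensure_alias(org, connect)): ...`, so the `connect` call happens only for tenants not seen before.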

That being said, and to be honest, your use case is definitely not MongoEngine's standard use case, and I wouldn't recommend building a large application that relies only on this workaround/pattern.

I hope this helps.

anekix commented 5 years ago

Yes, I am aware of this workaround, but as you mentioned, this pattern is not enough to start modeling a huge application on. What's worse, we lose all the other goodness that MongoEngine provides. Would it need a major refactor of MongoEngine? This is a common use case for many SaaS applications.

bagerard commented 5 years ago

Good question. I'm afraid it would be a complicated path to get all of MongoEngine's features working robustly with such a pattern. Additionally, we haven't had much demand for it, and I don't think Django supports it, so I really don't think it's very common. Last but not least, we have limited development capacity right now and prefer to put our effort into improving performance and fixing bugs...

Palas-Anvaya commented 4 years ago

I am facing the same issue. For multi-tenant applications with customers of very different sizes, it's important to have one DB per customer so that smaller customers don't suffer performance issues caused by large ones.

Any new ideas on how to get around this issue?

BenoitToulet commented 3 years ago

Hi MongoEngine users. I faced this issue myself and tried to build something that works 'around' MongoEngine without modifying its internal objects, in order to keep the library safe with all its features. It has been a long journey!

A bit of context may be useful: I want to use several databases, one per tenant, and I want to read/write my objects from all of those DBs. I use inheritance for several objects, and some services run several threads. I noticed that both multithreading and inheritance break the switch_db pattern:

• No thread protection is implemented, so if two switch_db blocks run in parallel, the last one entered sets the db_alias for all objects.
• If B is a child Document class of A, then querying A within a switch_db, getting a result b that is a B instance, and saving b will save b in the default DB. Avoiding that would mean querying A inside both an A and a B switch_db, which is not intuitive, a source of bugs, and clearly an inheritance anti-pattern.

I wanted something easy to use, like:

a = A["db_alias"].objects.get(id="1234")

Main difficulties:

• Class registries have to be set up and kept up to date for each database
• ReferenceFields need to be defined dynamically
• Inherited classes have to be defined dynamically
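The first bullet (last `switch_db` entered wins) can be reproduced without a database. In the pure-Python model below, all names are hypothetical and nothing here is MongoEngine code. `SharedSwitch` stores the alias on the class, the way `switch_db` does. `ContextSwitch` stores it in a `contextvars.ContextVar`, which is a swapped-in technique that MongoEngine does not use, shown only for contrast. Three `threading.Event`s force the interleaving "worker 1 enters, worker 2 enters, worker 1 reads".

```python
import contextvars
import threading


class SharedSwitch:
    """Alias stored on the class: one value shared by every thread."""
    alias = "default"

    def __init__(self, alias):
        self.new_alias = alias

    def __enter__(self):
        self.old_alias = SharedSwitch.alias
        SharedSwitch.alias = self.new_alias

    def __exit__(self, *exc):
        SharedSwitch.alias = self.old_alias

    @staticmethod
    def current():
        return SharedSwitch.alias


_current_alias = contextvars.ContextVar("current_alias", default="default")


class ContextSwitch:
    """Alias stored in a ContextVar: each thread sees its own value."""

    def __init__(self, alias):
        self.new_alias = alias

    def __enter__(self):
        self.token = _current_alias.set(self.new_alias)

    def __exit__(self, *exc):
        _current_alias.reset(self.token)

    @staticmethod
    def current():
        return _current_alias.get()


def race(switch_cls):
    """Return the alias worker 1 reads while worker 2 is inside its own block."""
    seen = {}
    entered_1, entered_2, read_1 = (threading.Event() for _ in range(3))

    def worker_1():
        with switch_cls("tenant_a"):
            entered_1.set()
            entered_2.wait(timeout=5)  # wait until worker 2 has switched too
            seen["worker_1"] = switch_cls.current()
            read_1.set()

    def worker_2():
        entered_1.wait(timeout=5)
        with switch_cls("tenant_b"):
            entered_2.set()
            read_1.wait(timeout=5)  # stay inside until worker 1 has read

    threads = [threading.Thread(target=worker_1), threading.Thread(target=worker_2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen["worker_1"]
```

With class-level state, `race(SharedSwitch)` returns `"tenant_b"`: worker 1 would query the wrong tenant. With per-thread context, `race(ContextSwitch)` returns `"tenant_a"`.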

Here is a (so far) working solution, at least for our project:

utils/multidb_document.py


import itertools
from mongoengine import Document
from mongoengine.fields import ObjectIdField

class MultidbDocumentItemMeta(Document.my_metaclass):
    def __new__(cls, name, bases, attrs, owner):
        if not issubclass(owner, MultidbDocument):
            raise TypeError("owner shall be a subclass of MultidbDocument !")
        # verify the class is the root class, or an item class
        is_child = owner.get_root_document_class() is not None
        if is_child:
            base_list = list(bases) 
            # Remove MultidbDocumentClass as it is not needed there,
            # it will be added via owner.get_root_document_class()
            base_list.remove(MultidbDocumentClass) 
            # Need to inherit from the root class !
            # MRO: left to right, the root class is the last !
            base_list.append(owner.get_root_document_class())
            bases = tuple(base_list) 
        else:
            if "meta" not in attrs:
                attrs["meta"] = {}
            meta = attrs.get("meta")
            meta["abstract"] = True
            if not meta.get("id_field"):
                # We need to define now the id field
                # otherwise graphene will disallow id operations !
                id_name, id_db_name = cls.get_auto_id_names(attrs)
                attrs[id_name] = ObjectIdField(db_field=id_db_name)
                meta["id_field"] = id_name
        super_new = super(MultidbDocumentItemMeta, cls).__new__
        new_class = super_new(cls, name, bases, attrs)
        return new_class

    @classmethod
    def get_auto_id_names(mcs, attrs):
        """Find a name for the automatic ID field for the given new class.

        Return a two-element tuple where the first item is the field name (i.e.
        the attribute name on the object) and the second element is the DB
        field name (i.e. the name of the key stored in MongoDB).

        Defaults to ('id', '_id'), or generates a non-clashing name in the form
        of ('auto_id_X', '_auto_id_X') if the default name is already taken.
        """
        id_name, id_db_name = ("id", "_id")
        existing_fields = {field_name for field_name in attrs}
        if id_name not in existing_fields:
            return id_name, id_db_name

        id_basename, id_db_basename, i = ("auto_id", "_auto_id", 0)
        for i in itertools.count():
            id_name = "{}_{}".format(id_basename, i)
            id_db_name = "{}_{}".format(id_db_basename, i)
            if id_name not in existing_fields:
                return id_name, id_db_name

class MultidbDocumentMeta(type):

    def __getitem__(cls, name):
        if not hasattr(cls, name):
            DatabaseAliasesRegistry.register_alias(name)
            # should update self."x"
        return getattr(cls, name)

def init_multidb_document_subclass(*args, **kwargs):
    raise TypeError("Multidb document shall not be inherited from. It's a final class. Inheritance of documents are defined in create_document_class function")

class MultidbDocument(metaclass=MultidbDocumentMeta):
    my_metaclass = MultidbDocumentMeta
    _root_document_class = None
    _item_metaclass = None

    @classmethod
    def create_document_class(cls, db_alias: str):
        raise NotImplementedError(f"The multidb_document subclass {cls} needs to overload create_document_class(db_alias) method")

    @classmethod
    def get_root_document_class(cls):
        """
        This class must only be used for test/check purposes, as all per-database classes inherit from it.
        Be careful: only for data-structure checks (e.g. graphene), no functional DB use!
        This is a mongoengine 'abstract' document
        """
        return cls._root_document_class

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Create the root class from whom inherit !
        cls._root_document_class = cls.create_document_class('_root_document_class')
        if not issubclass(cls._root_document_class, MultidbDocumentClass):
            raise TypeError("create_document_class shall return a subclass of MultidbDocumentClass")
        DatabaseAliasesRegistry.suscribe_to_new_db_alias(cls._create_document_class)
        for db_alias in DatabaseAliasesRegistry.get_existing_db_aliases():
            cls._create_document_class(db_alias)
        cls.__init_subclass__ = init_multidb_document_subclass

    @classmethod
    def _create_document_class(cls, db_alias: str):
        if not hasattr(cls, db_alias):
            # create class dynamically
            #doc_class = cls.create_document_class(db_alias, cls._root_document_class, False)
            doc_class = cls.create_document_class(db_alias)
            # switch to db_alias, not useful if meta has been set in class body
            doc_class._meta["db_alias"] = db_alias
            doc_class._collection = None
            setattr(cls, db_alias, doc_class)

class MultidbDocumentClass(Document, metaclass=MultidbDocumentItemMeta, owner=MultidbDocument):
    meta={
        "abstract": True
    }

class DatabaseAliasesRegistry:
    """
    Static class with only static methods
    This registry will help synchronization between databases
    Needed to avoid unknown derived classes
    """
    db_aliases = []
    new_db_alias_listeners = []

    @staticmethod
    def register_alias(db_alias:str):
        if db_alias not in DatabaseAliasesRegistry.db_aliases:
            DatabaseAliasesRegistry.db_aliases.append(db_alias)
            for listener in DatabaseAliasesRegistry.new_db_alias_listeners:
                listener(db_alias)

    @staticmethod
    def get_existing_db_aliases():
        return DatabaseAliasesRegistry.db_aliases

    @staticmethod
    def suscribe_to_new_db_alias(callback_function):
        DatabaseAliasesRegistry.new_db_alias_listeners.append(callback_function)
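Stripped of the MongoEngine specifics, `MultidbDocumentMeta.__getitem__` plus `DatabaseAliasesRegistry` boil down to "create one subclass per alias on first access, then reuse it". The standalone sketch below models only that idea with hypothetical names (`PerAliasMeta`, a plain `db_alias` class attribute); it does not build MongoEngine documents. It also adds a lock: the registry above checks and appends to a plain list without synchronization, so two threads accessing a new alias at the same time could race.

```python
import threading


class PerAliasMeta(type):
    """Metaclass so that `Model["alias"]` returns a subclass pinned to that alias."""

    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        # Every class, including generated per-alias ones, gets its own cache.
        cls._alias_classes = {}
        cls._alias_lock = threading.Lock()

    def __getitem__(cls, db_alias: str):
        with cls._alias_lock:
            if db_alias not in cls._alias_classes:
                # Built on first access, then reused for every later lookup.
                cls._alias_classes[db_alias] = type(
                    f"{cls.__name__}_{db_alias}", (cls,), {"db_alias": db_alias}
                )
            return cls._alias_classes[db_alias]


class Model(metaclass=PerAliasMeta):
    db_alias = "default"


class Invoice(Model):
    pass
```

`Invoice["org1"]` is created once and then returned on every lookup, while `Invoice` itself keeps `db_alias == "default"`.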

Usage:

from utils.multidb_document import MultidbDocument, MultidbDocumentClass
from mongoengine.fields import ReferenceField, IntField, StringField

class C(MultidbDocument): 
    @classmethod
    def create_document_class(cls, db_alias:str):
        class C(MultidbDocumentClass, owner=cls):
            meta = {
                 "collection": "c_collection"
            }
            name = StringField()

        return C

class A(MultidbDocument): 
    @classmethod
    def create_document_class(cls, db_alias:str):
        class A(MultidbDocumentClass, owner=cls):
            meta = {
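                 "collection": "ab_collection",
                 # B subclasses A below, so A has to allow inheritance
                 "allow_inheritance": True,
            }
            # We use db_alias here! (the field name "c" is illustrative)
            c = ReferenceField(C[db_alias])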

        return A

class B(MultidbDocument):
    @classmethod
    def create_document_class(cls, db_alias:str):
        # MultidbDocumentClass may be omitted here...
        # But specified to be consistent
        # We also use db_alias here !
        class B(A[db_alias], MultidbDocumentClass, owner=cls):
            how_much = IntField()

        return B

Hope it helps.
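The `B` definition relies on the base reordering in `MultidbDocumentItemMeta.__new__`: for per-alias classes, `MultidbDocumentClass` is dropped and the owner's root class is appended last. The plain-Python sketch below uses hypothetical stand-in classes and models none of MongoEngine's field or meta handling. It builds the same class shape with `type()` to show the resulting method resolution order.

```python
def reorder_bases(bases, marker, root):
    """Mimic MultidbDocumentItemMeta: drop the marker base, append the owner's root last."""
    base_list = list(bases)
    base_list.remove(marker)
    base_list.append(root)
    return tuple(base_list)


class DocBase:  # stands in for MultidbDocumentClass
    pass


# Roots keep their bases unchanged (the "else" branch of the metaclass).
A_root = type("A_root", (DocBase,), {})
B_root = type("B_root", (A_root, DocBase), {})  # B's root inherits A's root

# Per-alias classes go through the reordering, like A[db_alias] and B[db_alias].
A_org1 = type("A_org1", reorder_bases((DocBase,), DocBase, A_root), {"db_alias": "org1"})
B_org1 = type("B_org1", reorder_bases((A_org1, DocBase), DocBase, B_root), {})

print([c.__name__ for c in B_org1.__mro__])
# ['B_org1', 'A_org1', 'B_root', 'A_root', 'DocBase', 'object']
```

The tenant-specific parent comes before `B`'s root in the MRO, so attributes pinned on `A_org1` (here `db_alias`) are found first, while `B_org1` is still a subclass of both roots.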

frepond commented 3 years ago

Based on the ideas from @BenoitToulet, and trying not to modify our existing codebase, we came up with this solution. So far it's working, but it's barely tested.

import threading
from typing import Any, Dict

from mongoengine.connection import DEFAULT_CONNECTION_NAME
from mongoengine.document import Document

REGISTRY: Dict[str, Any] = {}
LOCK = threading.Lock()

def ClassFactory(name, BaseClass=Document):
    def __init__(self, *args, **kwargs):
        # Forward everything to the parent Document constructor
        BaseClass.__init__(self, *args, **kwargs)

    newclass = type(name, (BaseClass,), {"__init__": __init__})

    return newclass

class switch_db:
    """switch_db alias context manager.

    Example ::

        # Register connections
        register_connection('default', 'mongoenginetest')
        register_connection('testdb-1', 'mongoenginetest2')

        class Group(Document):
            name = StringField()

        Group(name='test').save()  # Saves in the default db

        with switch_db(Group, 'testdb-1') as Group:
            Group(name='hello testdb!').save()  # Saves in testdb-1
    """

    def __init__(self, cls, db_alias):
        """Construct the switch_db context manager

        :param cls: the class to change the registered db
        :param db_alias: the name of the specific database to use
        """
        new_cls_name = f"{db_alias}_{cls.__module__}.{cls.__name__}"

        with LOCK:
            new_cls = REGISTRY.get(new_cls_name, None)

            if not new_cls:
                allow_inheritance = cls._meta["allow_inheritance"]
                cls._meta["allow_inheritance"] = True
                new_cls = ClassFactory(new_cls_name, cls)
                cls._meta["allow_inheritance"] = allow_inheritance
                new_cls._meta["allow_inheritance"] = allow_inheritance

                REGISTRY[new_cls_name] = new_cls

        self.cls = new_cls
        self.collection = new_cls._get_collection()
        self.db_alias = db_alias
        self.ori_db_alias = new_cls._meta.get("db_alias", DEFAULT_CONNECTION_NAME)

    def __enter__(self):
        """Change the db_alias and clear the cached collection."""
        self.cls._meta["db_alias"] = self.db_alias
        self.cls._collection = None

        return self.cls

    def __exit__(self, t, value, traceback):
        """Reset the db_alias and collection."""
        self.cls._meta["db_alias"] = self.ori_db_alias
        self.cls._collection = self.collection
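The part of this version that addresses the threading concern is the `REGISTRY` + `LOCK` pair: each `(db_alias, class)` combination gets its own subclass, built once, so threads working on different tenants never mutate the same class. Below is a standalone model of just that caching step. `tenant_subclass` is a hypothetical name, it uses plain `type()` instead of `ClassFactory`, and it does not touch MongoEngine's `_meta`. It keys the cache by `(alias, class object)` rather than by the generated name string. In principle, the string `f"{db_alias}_{cls.__module__}.{cls.__name__}"` can collide (alias `a_b` with module `c` and alias `a` with module `b_c` both produce `a_b_c.<Name>`).

```python
import threading
from typing import Dict, Tuple

_REGISTRY: Dict[Tuple[str, type], type] = {}
_LOCK = threading.Lock()


def tenant_subclass(cls: type, db_alias: str) -> type:
    """Return the cached subclass of `cls` bound to `db_alias`, creating it once."""
    key = (db_alias, cls)
    with _LOCK:
        new_cls = _REGISTRY.get(key)
        if new_cls is None:
            # Same naming scheme as the switch_db above, used only for display.
            name = f"{db_alias}_{cls.__module__}.{cls.__name__}"
            new_cls = type(name, (cls,), {"db_alias": db_alias})
            _REGISTRY[key] = new_cls
    return new_cls
```

Every thread asking for the same tenant gets the same class object, and different tenants get different classes, so setting an alias on one never affects another.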

BenoitToulet commented 2 years ago

Hello MongoEngine users.

A little update to fix an issue with the inheritance pattern:


class MultidbDocumentClass(Document, metaclass=MultidbDocumentItemMeta, owner=MultidbDocument):
    meta = {"abstract": True}

    @classmethod
    def _from_son(cls, son, _auto_dereference=True, only_fields=None, created=False):
        """Overridden for retrieval: build the direct subclass named by the document's _cls, if any"""
        # Get the class name from the document, falling back to the given
        # class if unavailable
        class_name = son.get("_cls", cls._class_name)

        if class_name != cls._class_name:
            for sub_class in cls.__subclasses__():
                if sub_class._class_name == class_name:
                    return sub_class._from_son(son, _auto_dereference, only_fields, created)
        # Else: keep default behaviour
        return super()._from_son(son, _auto_dereference, only_fields, created)

This solution has been working fine on our side for nearly a year.
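The override above only looks at direct subclasses (`cls.__subclasses__()`), so a document whose `_cls` names a grandchild class falls through to the default behaviour. The pure-Python sketch below models the same "dispatch on the stored `_cls`" idea but walks the subclass tree recursively. `class_name`, `find_subclass` and `from_son` are hypothetical stand-ins for MongoEngine's `_class_name` and `_from_son`, and the dotted names are only illustrative, so check the actual `_cls` values your MongoEngine version writes.

```python
from typing import Optional


class Doc:
    class_name = "Doc"

    def __init__(self, son: dict):
        self.son = son


def find_subclass(cls: type, class_name: str) -> Optional[type]:
    """Depth-first search of cls's subclass tree for a matching class_name."""
    for sub in cls.__subclasses__():
        if sub.class_name == class_name:
            return sub
        found = find_subclass(sub, class_name)
        if found is not None:
            return found
    return None


def from_son(cls: type, son: dict):
    """Build an instance of the class named by son["_cls"], defaulting to cls."""
    class_name = son.get("_cls", cls.class_name)
    target = cls
    if class_name != cls.class_name:
        target = find_subclass(cls, class_name) or cls
    return target(son)


class A(Doc):
    class_name = "A"


class B(A):
    class_name = "A.B"


class D(B):
    class_name = "A.B.D"
```

Here `from_son(A, {"_cls": "A.B.D"})` returns a `D` instance, which a direct-subclasses-only lookup starting from `A` would not find.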