aiidateam / aiida-core

The official repository for the AiiDA code
https://aiida-core.readthedocs.io
Other
434 stars 188 forks source link

Formalize the usage of transactions between the front end and backend layers #2496

Open sphuber opened 5 years ago

sphuber commented 5 years ago

The usage of transactions in the ORM should be formalized in the sense of where they are defined, used and who bears the responsibility. Probably, the backend ORM layer should never explicitly open/control a transaction, because it will very often be a front end instance who will have to ask multiple independent backend instances to store themselves and the entire store should happen in a single transaction because the individual parts would be unintelligible.

For example, consider a Node instance. In the near future, the repository will be refactored to store a reference of the files the node contains in a separate database table. When storing a new node instance then, the front end class will have to instruct the backend node to store itself, after which it asks the repository backend instance to store a reference of its contents in the table. Only when both operations succeed should the changes be committed. To avoid a direct dependency between the backend node and backend repository interfaces, it is the front end node class that will have to take responsibility for creating a transaction and committing the changes.

giovannipizzi commented 5 years ago

A few comments here.

  1. In places like we use rollback: https://github.com/aiidateam/aiida_core/blob/b8eaffca3c448d1242eded845aa118b4bc1ae1a9/aiida/orm/implementation/sqlalchemy/nodes.py#L185-L197 I think this pattern is incorrect because even if another exception happens that has nothing to do with the DB, we roll back. The suggested approach here has an additional bit, that is having a .begin() before the block, so that you roll back only what you were doing in that bit of code (and actually uses except instead of except Exception, that in this specific case I think it's better since we anyway re-raise, and we would also properly catch things like CTRL+C. This is the link for django

  2. I have the feeling that the approach to follow is:

    • in the backend, provide functionality to manage transactions
    • provide a decorator that the user can use if they want to wrap (we should already have it I think)
    • in the backend, do not wrap any operation in transactions
    • in the frontend, where appropriate use transactions. Here there are multiple approaches, to be discussed. For instance, each method e.g. to save an extra, or an attribute, ... should wrap the call to the backend in a transaction. The frontend then should either provide API methods for bulk operations, or disable these transactions with a flag, or automatically if one is already inside a transaction. This needs a bit of thought, because on one side we don't want to have deeply nested sub-transactions that are very inefficient, but on the other side we need to ensure that if a problem happens in a set_attribute, the whole transaction is not rolled back if that is not what the user had in mind. On the other hand, if the user asks explicitly to run inside a transaction block, it's ok to roll back the whole transaction. We should probably come up with some examples, and then document carefully what can and cannot be done (like e.g. also Django does: when working within a transaction, there are things like catching and ignoring exceptions that you should not do. See this page, section "Avoid catching exceptions inside atomic!").
giovannipizzi commented 5 years ago

Some additional codes and examples follow to better understand how transactions work (at least with Django + PostgreSQL and the default AiiDA+Django+Postgres settings).

Case 1: how to trigger an error if multiple sessions are modifying the same row.

IMPORTANT: before running read the script, because it will mess up your data.

from __future__ import print_function

import random
import os
import string
import subprocess
import sys
from aiida.backends.djsite.db.models import DbNode
from django.db import transaction

# If True, tells the DB we are going to lock that row because we want to update its content
# I will then run with nowait=True, otherwise the called subprocess will hang
SELECT_FOR_UPDATE = False

if __name__ == "__main__":
    try:
        value = sys.argv[1]
        dont_iterate = True
        append = False
    except IndexError:
        value = "".join(string.lowercase[random.randint(0, len(string.lowercase)-1)] for _ in range(6))
        dont_iterate = False
        append = False
    prefix = ""
    if dont_iterate:
        prefix = "  "
    print(prefix, "="*50)
    print(prefix, "PROCESS PID:", os.getpid())
    with transaction.atomic():
        if SELECT_FOR_UPDATE:
            dbnode = DbNode.objects.select_for_update(nowait=True).get(pk=2)
        else:
            dbnode = DbNode.objects.get(pk=2)
        print(prefix, "before running:", dbnode.label)
        if append:
            dbnode.label = dbnode.label + value
        else:
            dbnode.label = value
        dbnode.extras['a'] = dbnode.label
        if dont_iterate:
            dbnode.extras['b'] = "set internally"
        print(prefix, "after setting:", dbnode.label)
        #return_value = subprocess.check_output([os.path.join(os.path.split(sys.executable)[0], 'verdi'), 'run', sys.argv[0], 'new_label'])
        if not dont_iterate:
            os.system(" ".join([os.path.join(os.path.split(sys.executable)[0], 'verdi'), 'run', sys.argv[0], 'new_label']))
        print(prefix, "CHECK1:", DbNode.objects.get(pk=2).label)
        dbnode.save()
        print(prefix, "CHECK2:", DbNode.objects.get(pk=2).label)

    print(prefix, "PK:    ", dbnode.pk)
    print(prefix, "LABEL: ", dbnode.label)
    print(prefix, "EXTRAS:", dbnode.extras)
    print(prefix, "FINALCHECK:", DbNode.objects.get(pk=2).label)
    print(prefix, "*"*50)

This script runs itself once trying to modify the same row. If you don't use SELECT_FOR_UPDATE, the modifications of the inner call are completely lost. If you use SELECT_FOR_UPDATE (and since we are using nowait=True) the inner call will fail when trying to save because there is a lock acquired.

Case 2: Integrity errors and rollbacks

from __future__ import print_function

import random
import os
import string
import subprocess
import sys
from aiida.backends.djsite.db.models import DbNode
from django.db import transaction
from django.db import IntegrityError
from django.db.transaction import TransactionManagementError

if __name__ == "__main__":
    prefix = ""
    print(prefix, "="*50)
    print(prefix, "PROCESS PID:", os.getpid())
    try:
        with transaction.atomic():
            dbnode = DbNode.objects.get(pk=2)
            dbnode.pk = None # So when I try to store it it should raise a uniqueness
                            # error because of the UUID
            # This will except with an error like
            # django.db.utils.IntegrityError: duplicate key value violates unique constraint "db_dbnode_uuid_62e0bf98_uniq"
            dbnode.save()
    except IntegrityError:
        print("Integrity error raised, ignoring")

    dbnode = DbNode.objects.get(pk=2)
    print("I got the node again (1):", dbnode)

    print('-'*50)

    with transaction.atomic():
        dbnode = DbNode.objects.get(pk=2)
        dbnode.pk = None # So when I try to store it it should raise a uniqueness
                        # error because of the UUID
        # This will except with an error like
        # django.db.utils.IntegrityError: duplicate key value violates unique constraint "db_dbnode_uuid_62e0bf98_uniq"
        try:
            dbnode.save()
        except IntegrityError:
            print("Integrity error raised, ignoring")

    # This will be ok; we went out ot the atomic() block
    dbnode = DbNode.objects.get(pk=2)
    print("I got the node again (2):", dbnode)

    print('-'*50)

    with transaction.atomic():
        dbnode = DbNode.objects.get(pk=2)
        dbnode.pk = None # So when I try to store it it should raise a uniqueness
                         # error because of the UUID
        # This will except with an error like
        # django.db.utils.IntegrityError: duplicate key value violates unique constraint "db_dbnode_uuid_62e0bf98_uniq"
        try:
            dbnode.save()
        except IntegrityError:
            print("Integrity error raised, ignoring")

        try:
            # This will now fail! We are still inside the atomic block and we didn't roll back
            dbnode = DbNode.objects.get(pk=2)
            print("I got the node again (3):", dbnode)
        except TransactionManagementError as exc:
            print("This exception is EXPECTED: {}".format(exc))

    print('-'*50)

    # No transaction here
    dbnode = DbNode.objects.get(pk=2)
    dbnode.pk = None # So when I try to store it it should raise a uniqueness
                        # error because of the UUID
    # This will except with an error like
    # django.db.utils.IntegrityError: duplicate key value violates unique constraint "db_dbnode_uuid_62e0bf98_uniq"
    try:
        dbnode.save()
    except IntegrityError:
        print("Integrity error raised, ignoring")

    # This will now work, we are in autocommit mode
    dbnode = DbNode.objects.get(pk=2)
    print("I got the node again (4):", dbnode)

    print(prefix, "*"*50)

Comments explain the behavior for the four different scenarios; the output looks like this:

 ==================================================
 PROCESS PID: 79774
Integrity error raised, ignoring
I got the node again (1): Data node [2]: rcsiet
--------------------------------------------------
Integrity error raised, ignoring
I got the node again (2): Data node [2]: rcsiet
--------------------------------------------------
Integrity error raised, ignoring
This exception is EXPECTED: An error occurred in the current transaction. You can't execute queries until the end of the 'atomic' block.
--------------------------------------------------
Integrity error raised, ignoring
I got the node again (4): Data node [2]: rcsiet
 **************************************************
giovannipizzi commented 5 years ago

As a note, in #3367 we also fix one similar issue in aiida.backends.sqlachemy.utils.delete_nodes_and_connections, where we were creating a new session for deletion rather than doing it inside a transaction. There we are fixing the issue by using the aiida transaction context manager, but we should check if other parts of the code use the same pattern and fix them as well.