In this blog post, we'll discuss thread-safety: why it's important and how to write thread-safe Django code, especially for bulk operations like management commands. We'll take a simple example: get or create.
Thread-safety means that our code can be run in multiple threads and still behave as expected. Code can be unsafe with regard to threads because the threads manipulate shared state (e.g. a database), and there's a chance of a race condition that produces unexpected results.
To avoid this, we have the option of using read-write locks, transactions etc.
We'll look at some simple examples and try to understand these options.
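To make the race condition concrete, here is a minimal, framework-free sketch in plain Python. The rows list is a hypothetical stand-in for a database table, and the barrier is there only to force the unlucky interleaving on every run; in real code the race shows up only occasionally.

```python
import threading

# Hypothetical in-memory stand-in for a database table (not Django).
rows = []

# The barrier makes both threads finish the "check" step before either
# one starts the "act" step, so the race is reproducible.
barrier = threading.Barrier(2)


def naive_get_or_create(my_id):
    exists = my_id in rows   # check: is the row already there?
    barrier.wait()           # both threads pause here after checking
    if not exists:
        rows.append(my_id)   # act: insert based on a now-stale check


threads = [
    threading.Thread(target=naive_get_or_create, args=(1,))
    for _ in range(2)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(rows)  # [1, 1]: both threads saw "missing" and both inserted
```

Each thread behaves correctly on its own; the duplicate appears only because the check and the insert are separate steps with nothing guarding the gap between them.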
The usual way:
Let's consider a management command that syncs data from another source (e.g. an API, a remote database, etc.). The correct way to do this is to use the built-in Django utility, get_or_create:
Update: Updated the command to run each arg in a thread
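from threading import Thread

from django.core.management.base import BaseCommand

from myapp.models import MyModel  # assumed location of MyModel


class MyThread(Thread):
    def __init__(self, my_id):
        super(MyThread, self).__init__(name=my_id)
        self.my_id = my_id

    def run(self):
        # Fetch the row for this my_id, or create it if it doesn't exist.
        instance, created = MyModel.objects.get_or_create(my_id=self.my_id)
        print('%s %s' % (instance.id, created))
        instance.delete()


class Command(BaseCommand):
    args = '<my_id my_id ...>'
    help = 'Get or create instance of mymodel with my_id'

    def handle(self, *args, **options):
        # Start one thread per command-line argument.
        for my_id in args:
            thread = MyThread(my_id=my_id)
            thread.start()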
In this command, we use the command-line arg my_id to get a MyModel instance if it exists; otherwise, we create one.
Note: we'll be discarding the object for simplicity.
Now, this management command (let's call it sync_myapp) is thread-safe because we're just using the built-in get_or_create to do our database call, which is atomic as long as the database enforces uniqueness on my_id.
To test this, run the command with the same arg repeated multiple times:
$> python manage.py sync_myapp 1 1 1 1 1
This will create five threads which will simultaneously try to get or create an instance with my_id = 1
$> 1 True
$> 1 False
$> 1 False
$> 1 False
Note that all the threads are successful, even though they were working on the same database at the same time.
Now, even though get_or_create is the way to go, we might need to customize a few things which are outside the scope of get_or_create. For example, let's say we need to do something special just before creating a new instance.
The problem:
Let's assume our code was:
def run(self):
    created = False
    try:
        instance = MyModel.objects.get(my_id=self.my_id)
    except MyModel.DoesNotExist:
        # Custom step that get_or_create gives us no hook for.
        something_special()
        instance = MyModel.objects.create(my_id=self.my_id)
        created = True
    instance.delete()
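If we test it with the above command, some of the threads will probably fail: several threads can hit DoesNotExist at the same time, and each of them then tries to create the same row, which either violates a unique constraint or inserts duplicates.
This indicates that the try/except block isn't thread-safe.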
Let's try to fix this problem.
Update: I've removed the section dealing with locks, since they're only useful when dealing with shared memory in Python processes; they're not applicable to databases.
Using database transactions:
We use Django's built-in transaction module (django.db.transaction) and wrap the create call in its context manager.
Update: We only need to wrap the create command in a transaction
Now, the transaction is committed only if no error occurs within the context block. Provided my_id has a unique constraint, only one thread's create call can commit; the others fail and are rolled back.
In addition to the context manager, Django also has options for using savepoints, manual commits, rollbacks, etc.
Original link: http://agiliq.com/blog/2013/08/writing-thread-safe-django-code/
Writing thread-safe django - get_or_create
By: Javed Khan
Django documentation on transactions (savepoints, manual commits, rollbacks): https://docs.djangoproject.com/en/1.5/topics/db/transactions
Caveats:
If you're using MySQL, refer to this open issue about a problem with get_or_create:
https://code.djangoproject.com/ticket/13906
Conclusion:
Using database transactions, we can avoid data integrity issues and write thread-safe code which can be run in parallel without any issues.