Understanding Threads in Python [Translation] #27

Open cundi opened 8 years ago

Original link: http://agiliq.com/blog/2013/09/understanding-threads-in-python/

Understanding Threads in Python

By : Akshar Raaj

We will see some examples of using threads in Python and how to avoid race conditions:

You should run each example several times to notice that threads are unpredictable and that your results differ every time.

Disclaimer: Forget anything you have heard about the GIL for now, because the GIL does not interfere with the scenarios shown here.

Example 1:

We want to fetch five different URLs:

Single-threaded way:

import time
import urllib2

def get_responses():
    urls = ['http://www.google.com', 'http://www.amazon.com', 'http://www.ebay.com', 'http://www.alibaba.com', 'http://www.reddit.com']
    start = time.time()
    for url in urls:
        # urlopen() blocks until the response for this url arrives
        resp = urllib2.urlopen(url)
        print url, resp.getcode()
    print "Elapsed time: %s" % (time.time()-start)

get_responses()

Output is:

http://www.google.com 200
http://www.amazon.com 200
http://www.ebay.com 200
http://www.alibaba.com 200
http://www.reddit.com 200
Elapsed time: 3.0814409256

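Explanation:

The URLs are fetched sequentially.
The next URL is not fetched until the response from the current one has arrived.
Network operations take time, so the processor sits idle while waiting for the response from each URL.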

Even a single-threaded program has one thread of execution. Let's call it the main thread. So the last example had only one thread, i.e. the main thread.
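As a small sketch of this idea, the threading module can report which thread is running a piece of code. The snippet uses print() as a function so that it runs on Python 3, while the listings in this post use Python 2 print statements. The function report_thread_name and the thread name "worker-1" are made up for illustration.

```python
import threading

def report_thread_name(results):
    # Record the name of whichever thread executes this function.
    results.append(threading.current_thread().name)

names = []

# Called directly: runs in the thread that is executing this script.
report_thread_name(names)

# Called from a second thread that we name ourselves.
worker = threading.Thread(target=report_thread_name, args=(names,), name="worker-1")
worker.start()
worker.join()

print(names)
```

When the script is run directly, the first name is normally MainThread and the second is the name given to the worker.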

Multithreaded way:

One way is to create a class that subclasses Thread and overrides run():

import time
import urllib2
from threading import Thread

class GetUrlThread(Thread):
    def __init__(self, url):
        self.url = url 
        super(GetUrlThread, self).__init__()

    def run(self):
        resp = urllib2.urlopen(self.url)
        print self.url, resp.getcode()

def get_responses():
    urls = ['http://www.google.com', 'http://www.amazon.com', 'http://www.ebay.com', 'http://www.alibaba.com', 'http://www.reddit.com']
    start = time.time()
    threads = []
    for url in urls:
        t = GetUrlThread(url)
        threads.append(t)
        t.start()
    for t in threads:
        t.join()
    print "Elapsed time: %s" % (time.time()-start)

get_responses()

Output is:

http://www.reddit.com 200
http://www.google.com 200
http://www.amazon.com 200
http://www.alibaba.com 200
http://www.ebay.com 200
Elapsed time: 0.689890861511

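Explanation:

Each URL is fetched in its own thread, so the waits on the network overlap instead of adding up. In the runs shown here the elapsed time dropped from about 3.08 seconds to about 0.69 seconds.
The responses are printed in the order they arrive, which need not match the order in which the threads were started.
join() makes the main thread wait until every thread has finished, so the elapsed time covers all five requests.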
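The speed-up can be reproduced without touching the network. The sketch below is written for Python 3 and replaces urlopen() with a hypothetical fake_fetch() that only sleeps, on the assumption that sleeping is a fair stand-in for waiting on a response. It passes the function to Thread through the target argument instead of subclassing. The helper names and the placeholder URL strings are illustrative.

```python
import time
from threading import Thread

def fake_fetch(url, delay=0.1):
    # Stand-in for urlopen(): sleeping mimics waiting on the network.
    time.sleep(delay)

def run_sequentially(urls):
    start = time.time()
    for url in urls:
        fake_fetch(url)
    return time.time() - start

def run_in_threads(urls):
    start = time.time()
    threads = [Thread(target=fake_fetch, args=(url,)) for url in urls]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for every thread before stopping the clock
    return time.time() - start

urls = ["url-%d" % i for i in range(5)]  # placeholder strings, never fetched
sequential_time = run_sequentially(urls)
threaded_time = run_in_threads(urls)
print("sequential: %.2fs, threaded: %.2fs" % (sequential_time, threaded_time))
```

The sequential run takes roughly the sum of the five delays, while the threaded run takes roughly one delay, because the five waits overlap.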

We will demonstrate a race condition with a program and then fix it:

Read the Wikipedia example to understand what a race condition means.

#define a global variable
some_var = 0

class IncrementThread(Thread):
    def run(self):
        #we want to read a global variable
        #and then increment it
        global some_var
        read_value = some_var
        print "some_var in %s is %d" % (self.name, read_value)
        some_var = read_value + 1 
        print "some_var in %s after increment is %d" % (self.name, some_var)

def use_increment_thread():
    threads = []
    for i in range(50):
        t = IncrementThread()
        threads.append(t)
        t.start()
    for t in threads:
        t.join()
    print "After 50 modifications, some_var should have become 50"
    print "After 50 modifications, some_var is %d" % (some_var,)

use_increment_thread()

Run this several times and you will find that the result can differ from run to run.

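Explanation:

Reading some_var and writing it back are two separate steps, and a thread switch can happen between them.
If two threads read the same value before either has written, both write that value plus 1 and one increment is lost.
When that happens, some_var ends up smaller than 50.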
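The lost update in IncrementThread only shows up when the threads happen to interleave badly, which makes it hard to observe on demand. As a sketch, the following Python 3 snippet forces that interleaving with threading.Barrier (available from Python 3.2), so that both threads read before either one writes. The names shared, both_have_read and unsafe_increment are made up for illustration. The barrier is there only to make the race visible and is not part of the original program.

```python
import threading

shared = {"value": 0}
# Barrier(2) releases its waiters only once two threads have reached it.
both_have_read = threading.Barrier(2)

def unsafe_increment():
    read_value = shared["value"]      # step 1: read
    both_have_read.wait()             # pause until the other thread has read too
    shared["value"] = read_value + 1  # step 2: write back a stale value plus 1

threads = [threading.Thread(target=unsafe_increment) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("two increments, final value: %d" % shared["value"])
```

Both threads read 0 and both write 1, so two increments leave the value at 1 instead of 2.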

Change the run() of IncrementThread to:

from threading import Lock
lock = Lock()

class IncrementThread(Thread):
    def run(self):
        #we want to read a global variable
        #and then increment it
        global some_var
        lock.acquire()
        read_value = some_var
        print "some_var in %s is %d" % (self.name, read_value)
        some_var = read_value + 1
        print "some_var in %s after increment is %d" % (self.name, some_var)
        lock.release()

Run use_increment_thread() again and the result will match your expectation: some_var ends up as 50.

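Explanation:

Only one thread at a time can hold the lock; any other thread that calls lock.acquire() waits until lock.release() is called.
The read and the write of some_var therefore happen without another thread running the same code in between, so no increment is lost.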
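A Lock can also be used as a context manager, which releases it even if the protected code raises an exception. The sketch below is a Python 3 variant of the counter that uses "with" instead of explicit acquire() and release() calls. The names state, state_lock and safe_increment are illustrative, and a dict replaces the global variable only to keep the snippet self-contained.

```python
from threading import Lock, Thread

state = {"counter": 0}
state_lock = Lock()

def safe_increment():
    # "with" acquires the lock on entry and releases it on exit,
    # even if the body raises an exception.
    with state_lock:
        read_value = state["counter"]
        state["counter"] = read_value + 1

threads = [Thread(target=safe_increment) for _ in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("After 50 modifications, counter is %d" % state["counter"])
```

Because the read and the write sit inside the same locked block, the counter reaches 50 on every run.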

In the last example we saw how a global variable is affected by multithreading. Let's look at an example to verify that one thread cannot affect the instance variable of another thread.

Let's introduce time.sleep() in this example. It puts the calling thread into a suspended state, which forces a thread switch to occur.

import time

class CreateListThread(Thread):
    def run(self):
        self.entries = []
        for i in range(10):
            time.sleep(1)
            self.entries.append(i)
        print self.entries

def use_create_list_thread():
    for i in range(3):
        t = CreateListThread()
        t.start()

use_create_list_thread()

Run it a few times and you will notice that the lists do not always get printed properly.

Possibly the entries of one thread were being printed and, during this operation, the processor switched to another thread and started printing that thread's entries. We want to ensure that the entries of separate threads get printed one after another.

Change run() of CreateListThread to use a lock:

class CreateListThread(Thread):
    def run(self):
        self.entries = []
        for i in range(10):
            time.sleep(1)
            self.entries.append(i)
        lock.acquire()
        print self.entries
        lock.release()

So, we put the print operation inside a lock. While one thread holds the lock and is printing its entries, no other thread can print its entries. As a result, you will see the entries of different threads printed on separate lines.

This shows that entries, which is an instance variable, is a list of the numbers from 0 to 9 in every thread. So thread switching doesn't affect the instance variable of a thread.
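The same check can be made from the main thread instead of by reading printed output. The sketch below, written for Python 3, joins the threads and then compares each thread's list with the expected values. CollectThread and its how_many parameter are hypothetical names, and the sleep is shortened so the example finishes quickly.

```python
import time
from threading import Thread

class CollectThread(Thread):
    def __init__(self, how_many):
        super(CollectThread, self).__init__()
        self.how_many = how_many
        self.entries = []  # belongs to this thread object only

    def run(self):
        for i in range(self.how_many):
            time.sleep(0.01)  # short sleep to encourage thread switching
            self.entries.append(i)

threads = [CollectThread(10) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Inspect the lists from the main thread after all threads have finished.
all_entries = [t.entries for t in threads]
print(all_entries)
```

Each of the three lists holds the numbers 0 to 9 in order, even though the threads were switched many times while the lists were being built.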