Replace @syncrhonized by pthread_mutex_lock / pthread_mutex_unlock

pabloroca commented 10 years ago

Wouldn't be faster in performance to replace this slow @synchronized sentences by a pthread_mutex_lock / pthread_mutex_unlock ?

jasperblues commented 10 years ago

Sounds good. Would you like to send a pull request?

alexgarbarev commented 10 years ago

Where you want to replace it? Why you want replace it? Is it slow down your application? As I know @synchronized is recurcive lock based on pthread_mutex_t + exception handling (releasing mutex when exception occured).

I don't think that it significantly improves performance. I think we have another places for performance improvements. So.. have you problem with performnace? In which method?

alexgarbarev commented 10 years ago

But if you have ideas for performance improvements, we will happy to see your pull request.

pabloroca commented 10 years ago

I am a performance freak :) and @synchronized is not the most efficient lock. For creating that evil Singletons, now is widely accepted (Apple recommends also) to move to dispath_once instead the @synchronized.

For enhancing performance, pthread locks perform better than @synchronized, also GCD sync dispatch queues perform better or GCD custom concurrent queue and barriers

For real scenario in an App using @synchronized would not impact too much in performance, but I prefer to use the other methods outlined. But using a third part component like Typhon as a core feature in your App .. and if you need to create many many objects this could lead in some overall slowness. So I think that core third part components must use, as a general practice, the best performance code.

I didn't dig in every line of code from Typhon. Just my two cents.

jasperblues commented 10 years ago

@pabloroca, thanks for your input :)

Typhoon has (from memory) just one evil singleton, which is created using dispatch_once. This has been standard practice for a while now. . .

As for the @synchronized lock: It is used to ensure Typhoon behaves correctly when resolving components with TyphoonScopeObjectGraph or TyphoonScopeWeakSingleton, from concurrent threads . . actually I forget the details now, but it was tricky at the time!

@alexgarbarev Is usually the one to do performance tuning, and according to his analysis it doesn't represent a significant overhead. That said, if you could show us a pull request with all tests passing and some profiling results it would certainly be interesting. . . Note that the method in question is called (very) recursively.

As @alexgarbarev says, if you're interested in helping improve performance that its something that is on the priority list. . . there's some additional ideas that have been tabled on how we could go about this.

alexgarbarev commented 10 years ago

Can you explain why exactly @synchronized is slow? re: dispatch_queue - yes, it's good to create queue for synchronization, and run all code inside serial queue, but I didn't see value for that in typhoon.
I think that advantage of dispatch_queue approach is that calling thread can not be locked for writing values. only add setting value block into queue. But for reading value, calling thread will wait until reading block is performed example:

- (void)setValue:(id)value
    dispatch_queue_async(queue, ^{
        _value = value;
    });

- (id)value
{
    __block id result = nil;
    dispatch_queue_sync(queue, ^{
        result = _value;
    });
    return result
}

are you follow? But to be sure, I need profiling result. I heard that adding block to queue is very fast, but block creation can be slow. block is NSObject, so created in the heap (or moved to the heap, when adding to queue), so there is performance problem - heap allocation each task.

alexgarbarev commented 10 years ago

As I know synchronized works like: 0) try to get existing mutex for passed object 1) if mutex not found create pthread_mutex_t with PTHREAD_MUTEX_RECURSIVE attribute 2) save created mutex for object passed in syncrhonized statement (some kind of hashtable object : mutex) 3) lock mutex 4) run passed code 5) catch any exceptions, and release mutex if exception occured 6) unlock mutex

So there is two overheads: 1) recursive mutex 2) catching exception

but I believe we need these both overheads in typhoon

pabloroca commented 10 years ago

Hi,

I'm not trying to push Alex neither you, I just saw that you use @synchronized in many parts in the code. If you search all over internet there is a common voice that says thet @synchronized is the slowest way set a lock.

@synchronized has the two overheads that Alex pointed, the fact that pulls in the whole exception framework for an app is very ugly to me.

Apple says in his Concurrency Programming Guide, section Eliminating Lock-Based Code (sic)

"For threaded code, locks are one of the traditional ways to synchronize access to resources that are shared between threads. However, the use of locks comes at a cost. Even in the non-contested case, there is always a performance penalty associated with taking a lock. And in the contested case, there is the potential for one or more threads to block for an indeterminate amount of time while waiting for the lock to be released

Replacing your lock-based code with queues eliminates many of the penalties associated with locks and also simplifies your remaining code. Instead of using a lock to protect a shared resource, you can instead create a queue to serialize the tasks that access that resource. Queues do not impose the same penalties as locks. For example, queueing a task does not require trapping into the kernel to acquire a mutex

Avoid using locks. The support provided by dispatch queues and operation queues makes locks unnecessary in most situations. Instead of using locks to protect some shared resource, designate a serial queue (or use operation object dependencies) to execute tasks in the correct order."

So, I take that clearly Apple doens't recommend now the @synchronized

1st approach (Posix Thread Locks)

What if to change to pthread_mutex_lock? Google also doesn't like the @synchronized approach, see this old article

http://googlemac.blogspot.co.uk/2006/10/synchronized-swimming.html

static pthread_mutex_t mut = PTHREAD_MUTEX_INITIALIZER;

// LOCK  
pthread_mutex_lock(&mut);
(code)
// UNLOCK 
pthread_mutex_unlock(&mut);

This is against the current Apple recomendations to move to queues, anyhow pthread_mutex_lock is said to perform better than @synchronized

2nd approach (GCD FIFO sync queues)

As Ed comments here:

http://www.fieryrobot.com/blog/2010/09/01/synchronization-using-grand-central-dispatch/comment-page-1

We can use this code:

- (void)setValue:(id)value
    dispatch_queue_sync(queue, ^{
        _value = value;
    });

- (id)value
{
    __block result = nil;
    dispatch_queue_sync(queue, ^{
        result = value;
    });
    return result
}

dispatch_sync() is a convenient wrapper around dispatch_async() with the addition of a semaphore to wait for completion of the block, and a wrapper around the block to signal its completion.

perhaps this could have deadlock issues.

3rd approach (GCD FIFO async queues)

Mike Ash comments how to use it in his article:

https://www.mikeash.com/pyblog/friday-qa-2011-10-14-whats-new-in-gcd.html

Using a custom concurrent queue and dispatch_barrier_async. So the code can be read like:

queue = dispatch_queue_create("com.typhoon.cachequeue", DISPATCH_QUEUE_CONCURRENT);

- (void)setValue:(id)value
    dispatch_barrier_async(queue, ^{
        _value = value;
    });

- (id)value
{
    __block result = nil;
    dispatch_queue_sync(queue, ^{
        result = value;
    });
    return result
}

In the past I was using the posix thread locks and now I'm trying to move to GCD queues. But not sure yet if to use 2nd or 3rd approach. I'm more to use the 2nd approach cause it simulates the lock. Anyhow I need to benchmark this. All the comments in internet say that this performs better .. but how much better? I didn't saw code from that benchmarks.

Further read:

http://stackoverflow.com/questions/11923244/synchronized-vs-gcd-dispatch-barrier-async http://www.objc.io/issue-2/thread-safe-class-design.html

alexgarbarev commented 10 years ago

Thanks for information. I read these articles before, and tried all of these approaches in my https://github.com/alexgarbarev/ThumbnailService ( see - (void) synchronize:(dispatch_block_t)block method) and I notice some problems in concurrent code with 2nd and 3d approaches with GCD: when using queues from dispatch_queue_create - it creates new threads. When using global queues - it also creates new threads. When using dispatch_set_target and use global queues as target for private queues - it also create new threads. So, for example, if I had 20 objects which want to synchronize their access to their properties- creating new 20 threads is not great solution. mutex will be cheaper.

alexgarbarev commented 10 years ago

Hi @pabloroca I did few measurment and I was a bit surprised. Here is the code, I tested

@implementation ViewController {
    int a;
    dispatch_queue_t serialQueue;
    dispatch_queue_t concurrentQueue;
    pthread_mutex_t mutex;
    pthread_mutex_t recursive_mutex;
    NSLock *lock;
    NSRecursiveLock *recursiveLock;
    char *buf;
}

- (void)viewDidLoad
{

    [self init_lock];
    [self init_recursive_lock];
    [self init_serial_queue];
    [self init_concurrent_queue];
    [self init_syncrhonized];
    [self init_mutex];
    [self init_recursive_mutex];

    for (int i = 0; i < 10000000; i++) {

        [self test_lock:1];
        [self test_recursive_lock:1];
        [self test_synchronized:1];
        [self test_serial_queue:1];
        [self test_barrier_concurrent_queue:1];
        [self test_pthread_mutex:1];
        [self test_pthread_recursive_mutex:1];

    }

    [super viewDidLoad];
}

- (void)init_lock
{
    lock = [NSLock new];
}

- (void)init_recursive_lock
{
    recursiveLock = [NSRecursiveLock new];
}

- (void)init_syncrhonized
{
    @synchronized(self) {}
}

- (void)init_mutex
{
    pthread_mutex_destroy(&mutex);
    pthread_mutex_init(&mutex, NULL);
}

- (void)init_recursive_mutex
{
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    pthread_mutex_init(&recursive_mutex, &attr);
}

- (void)init_serial_queue
{
    serialQueue = dispatch_queue_create("serial_queue", DISPATCH_QUEUE_SERIAL);
}

- (void)init_concurrent_queue
{
    concurrentQueue = dispatch_queue_create("concurrent_queue", DISPATCH_QUEUE_CONCURRENT);
}

- (void)test_synchronized:(int)b
{
    @synchronized(self) {
        a = b;
    }
}

- (void)test_serial_queue:(int)b
{
    dispatch_sync(serialQueue, ^{
        a = b;
    });
}

- (void)test_barrier_concurrent_queue:(int)b
{
    dispatch_barrier_sync(concurrentQueue, ^{
        a = b;
    });
}

- (void)test_pthread_mutex:(int)b
{
    pthread_mutex_lock(&mutex);
    a = b;
    pthread_mutex_unlock(&mutex);
}

- (void)test_pthread_recursive_mutex:(int)b
{
    pthread_mutex_lock(&recursive_mutex);
    a = b;
    pthread_mutex_unlock(&recursive_mutex);
}

- (void)test_lock:(int)b
{
    [lock lock];
    a = b;
    [lock unlock];
}

- (void)test_recursive_lock:(int)b
{
    [recursiveLock lock];
    a = b;
    [recursiveLock unlock];
}

@end

Here is my results. screenshot 2014-08-25 23 43 09 screenshot 2014-08-25 23 43 12 Seems like GCD approaches works slowly than synchronized. I test on simulator (will try on device soon)

alexgarbarev commented 10 years ago

Here is test results, from my iPhone 3GS:

screenshot 2014-08-25 23 52 26 screenshot 2014-08-25 23 52 39

pabloroca commented 10 years ago

wow!! Many thanks Alex. I could not state which one could work better cause I didn't run benchmarks. I just was asking to a better approach to @synchronized cause I saw many complains.

Very surprising yes. Why the h*ll Apple recommends GCD queues if they perform slower? agg

Anyhow .. seems that pthread_recursive_mutex performs 3 times better than @synchronized :)

pabloroca commented 10 years ago

Question .. where you using XCode 5 or XCode 6?

Any special project optimization settings?

A copy from the project will be welcome :) pablorocarREMOVE at gmail . com

alexgarbarev commented 10 years ago

Yes. I have to digg into assembly code to see how @synchronized works. I tested on XCode 5. Default "release" build configuration. Will send project soon

pabloroca commented 10 years ago

Many thanks Alex, this is a real demonstration, not the blah blah you can read in Internet and Apple Documentation.

Just out of curiosity, with this results .. are you thinking in changing @synchronized call to the pthread_recursive_mutex?

alexgarbarev commented 10 years ago

Maybe I something missed, but I didn't see exception-related function calls in assembly code.. Here is test_syncrhonized method: screenshot 2014-08-26 00 19 19 I see that my code arounded only by objc_sync_enter and objc_sync_exit. There is objc_sync_enter function screenshot 2014-08-26 00 11 47 and objc_sync_exit screenshot 2014-08-26 00 12 30

alexgarbarev commented 10 years ago

I found some sources from apple with these functions from above: http://www.opensource.apple.com/source/objc4/objc4-493.9/runtime/objc-sync.m

// Begin synchronizing on 'obj'. 
// Allocates recursive mutex associated with 'obj' if needed.
// Returns OBJC_SYNC_SUCCESS once lock is acquired.  
int objc_sync_enter(id obj)
{
    int result = OBJC_SYNC_SUCCESS;

    if (obj) {
        SyncData* data = id2data(obj, ACQUIRE);
        require_action_string(data != NULL, done, result = OBJC_SYNC_NOT_INITIALIZED, "id2data failed");

        result = recursive_mutex_lock(&data->mutex);
        require_noerr_string(result, done, "mutex_lock failed");
    } else {
        // @synchronized(nil) does nothing
        if (DebugNilSync) {
            _objc_inform("NIL SYNC DEBUG: @synchronized(nil); set a breakpoint on objc_sync_nil to debug");
        }
        objc_sync_nil();
    }

done: 
    return result;
}

// End synchronizing on 'obj'. 
// Returns OBJC_SYNC_SUCCESS or OBJC_SYNC_NOT_OWNING_THREAD_ERROR
int objc_sync_exit(id obj)
{
    int result = OBJC_SYNC_SUCCESS;

    if (obj) {
        SyncData* data = id2data(obj, RELEASE); 
        require_action_string(data != NULL, done, result = OBJC_SYNC_NOT_OWNING_THREAD_ERROR, "id2data failed");

        result = recursive_mutex_unlock(&data->mutex);
        require_noerr_string(result, done, "mutex_unlock failed");
    } else {
        // @synchronized(nil) does nothing
    }

done:
    if ( result == RECURSIVE_MUTEX_NOT_LOCKED )
         result = OBJC_SYNC_NOT_OWNING_THREAD_ERROR;

    return result;
}

alexgarbarev commented 10 years ago

I didnt see anything related to exception handling

pabloroca commented 10 years ago

Yes, they are: objc_sync_enter and objc_sync_exit, but not many doc out there. You can use it as:

objc_sync_enter(self); (code) objc_sync_exit(self);

alexgarbarev commented 10 years ago

Just out of curiosity, with this results .. are you thinking in changing @synchronized call to the pthread_recursive_mutex?

I think that @synchronized looks more elegant - no chance to forget 'unlock', because whole statement inside braces. I think It's more important than micro optimization.

Anyway I know more serious things to optimize in Typhoon (some object creations can be replaced by 'pool of object' pattern to reduce heap memory allocations), so it's more important to start here.

pabloroca commented 10 years ago

Cool, many thanks Alex

alexgarbarev commented 10 years ago

Thanks for discussion Pablo. I was surprised too

pabloroca commented 10 years ago

Just for confirming .. XCode 6 Beta 6 using iOS8 simulator, produces the same results.

pabloroca commented 10 years ago

Finally after running more tests and thinking. I'll move my things to NSRecursiveLock, in this way:

- (void)test_recursive_lock:(int)b
{
    static NSRecursiveLock *recursiveLock;
    if (recursiveLock == nil)
    {
        recursiveLock = [[NSRecursiveLock alloc] init];
    }
    [recursiveLock lock];
    a = b;
    [recursiveLock unlock];
}

Even is more verbose than @synchronized, it performs faster and the method has everything about the lock inside it (does not required a previous init) so this offers better testing

alexgarbarev commented 10 years ago

But is uses static NSRecursiveLock *recursiveLock; so, that will be shared between objects. Why?

alexgarbarev commented 10 years ago

and how do you handle this case

- (void)test_recursive_lock:(int)b
{
    static NSRecursiveLock *recursiveLock;
    if (recursiveLock == nil)
    {
        recursiveLock = [[NSRecursiveLock alloc] init];
    }
    [recursiveLock lock];
    a = b;
    [recursiveLock unlock];
}
- (void)some_another_place
{
    //we have to lock access here
    a = c;
}

pabloroca commented 10 years ago

The static is just shared only by the test_recursive_lock method, the scope for the static is just for the method.

If you do:

- (void)test_recursive_lock:(int)b
{
    static NSRecursiveLock *recursiveLock;
    if (recursiveLock == nil)
    {
        recursiveLock = [[NSRecursiveLock alloc] init];
    }
    [recursiveLock lock];

    a = b;

    [recursiveLock unlock];
}

- (void)test_recursive_lockAtoC:(int)c
{
    static NSRecursiveLock *recursiveLock;
    if (recursiveLock == nil)
    {
        recursiveLock = [[NSRecursiveLock alloc] init];
    }
    [recursiveLock lock];
    a = c;
    [recursiveLock unlock];
}

The recursiveLock object are different objects, so they are not shared

alexgarbarev commented 10 years ago

Problems: 1) If test_recursive_lock and test_recursive_lockAtoC called at same time (from two different threads), then a = b and a = c will be set at same time. I.e. locking will not works 2) if you create two objects (ViewController in our example), recursiveLock will be same inside test_recursive_lock method i.e. shared between objects

alexgarbarev commented 10 years ago

I think this discussion out of scope and not related to Typhoon

pabloroca commented 10 years ago

Yep .. sorry

jasperblues commented 10 years ago

Great work folks. @pabloroca, thanks for your input.

appsquickly / typhoon

Replace @syncrhonized by pthread_mutex_lock / pthread_mutex_unlock #246