Closed pabloroca closed 10 years ago
Sounds good. Would you like to send a pull request?
Where do you want to replace it? Why do you want to replace it? Is it slowing down your application? As far as I know, @synchronized is a recursive lock based on pthread_mutex_t plus exception handling (releasing the mutex when an exception occurs).
I don't think replacing it would significantly improve performance. I think there are other places for performance improvements. So... do you have a problem with performance? In which method?
But if you have ideas for performance improvements, we will be happy to see your pull request.
I am a performance freak :) and @synchronized is not the most efficient lock. For creating those evil singletons, it is now widely accepted (Apple also recommends it) to move to dispatch_once instead of @synchronized.
For better performance, pthread locks perform better than @synchronized, and so do GCD sync dispatch queues, or a GCD custom concurrent queue with barriers.
In a real-world scenario, using @synchronized in an app would not impact performance much, but I prefer to use the other methods outlined. However, when a third-party component like Typhoon is a core feature of your app, and you need to create many, many objects, this could lead to some overall slowness. So I think core third-party components should, as a general practice, use the best-performing code.
I didn't dig into every line of code in Typhoon. Just my two cents.
@pabloroca, thanks for your input :)
Typhoon has (from memory) just one evil singleton, which is created using dispatch_once. This has been standard practice for a while now. . .
As for the @synchronized lock: It is used to ensure Typhoon behaves correctly when resolving components with TyphoonScopeObjectGraph or TyphoonScopeWeakSingleton, from concurrent threads . . actually I forget the details now, but it was tricky at the time!
@alexgarbarev Is usually the one to do performance tuning, and according to his analysis it doesn't represent a significant overhead. That said, if you could show us a pull request with all tests passing and some profiling results it would certainly be interesting. . . Note that the method in question is called (very) recursively.
As @alexgarbarev says, if you're interested in helping improve performance, that's something that is on the priority list. . . there are some additional ideas that have been tabled on how we could go about this.
Can you explain why exactly @synchronized is slow?
re: dispatch_queue - yes, it's good to create a queue for synchronization and run all the code inside a serial queue, but I didn't see value in that for Typhoon.
I think the advantage of the dispatch_queue approach is that the calling thread need not block when writing a value; it only adds the value-setting block to the queue. But for reading a value, the calling thread will wait until the reading block has run.
example:
- (void)setValue:(id)value
{
    dispatch_async(queue, ^{
        _value = value;
    });
}

- (id)value
{
    __block id result = nil;
    dispatch_sync(queue, ^{
        result = _value;
    });
    return result;
}
Are you following? But to be sure, I need profiling results. I heard that adding a block to a queue is very fast, but block creation can be slow: a block is an NSObject, so it is created on the heap (or moved to the heap when added to a queue), so there is a performance problem - a heap allocation for each task.
As far as I know, @synchronized works like this:
0) try to get an existing mutex for the passed object
1) if no mutex is found, create a pthread_mutex_t with the PTHREAD_MUTEX_RECURSIVE attribute
2) save the created mutex for the object passed to the @synchronized statement (some kind of object : mutex hashtable)
3) lock the mutex
4) run the passed code
5) catch any exception, and release the mutex if an exception occurred
6) unlock the mutex
So there are two overheads: 1) the recursive mutex 2) catching exceptions
But I believe we need both of these overheads in Typhoon.
Hi,
I'm not trying to push Alex or you; I just saw that you use @synchronized in many parts of the code. If you search all over the internet, there is a common voice saying that @synchronized is the slowest way to take a lock.
@synchronized has the two overheads that Alex pointed out; the fact that it pulls the whole exception framework into an app is very ugly to me.
Apple says in its Concurrency Programming Guide, section "Eliminating Lock-Based Code" (sic):
"For threaded code, locks are one of the traditional ways to synchronize access to resources that are shared between threads. However, the use of locks comes at a cost. Even in the non-contested case, there is always a performance penalty associated with taking a lock. And in the contested case, there is the potential for one or more threads to block for an indeterminate amount of time while waiting for the lock to be released.
Replacing your lock-based code with queues eliminates many of the penalties associated with locks and also simplifies your remaining code. Instead of using a lock to protect a shared resource, you can instead create a queue to serialize the tasks that access that resource. Queues do not impose the same penalties as locks. For example, queueing a task does not require trapping into the kernel to acquire a mutex.
Avoid using locks. The support provided by dispatch queues and operation queues makes locks unnecessary in most situations. Instead of using locks to protect some shared resource, designate a serial queue (or use operation object dependencies) to execute tasks in the correct order."
So I take it that Apple clearly doesn't recommend @synchronized now.
1st approach (Posix Thread Locks)
What about changing to pthread_mutex_lock? Google also doesn't like the @synchronized approach; see this old article:
http://googlemac.blogspot.co.uk/2006/10/synchronized-swimming.html
static pthread_mutex_t mut = PTHREAD_MUTEX_INITIALIZER;
// LOCK
pthread_mutex_lock(&mut);
(code)
// UNLOCK
pthread_mutex_unlock(&mut);
This is against the current Apple recommendation to move to queues; anyhow, pthread_mutex_lock is said to perform better than @synchronized.
2nd approach (GCD FIFO sync queues)
As Ed comments here:
We can use this code:
- (void)setValue:(id)value
{
    dispatch_sync(queue, ^{
        _value = value;
    });
}

- (id)value
{
    __block id result = nil;
    dispatch_sync(queue, ^{
        result = _value;
    });
    return result;
}
dispatch_sync() is a convenient wrapper around dispatch_async() with the addition of a semaphore to wait for completion of the block, and a wrapper around the block to signal its completion.
Perhaps this could have deadlock issues.
3rd approach (GCD FIFO async queues)
Mike Ash comments how to use it in his article:
https://www.mikeash.com/pyblog/friday-qa-2011-10-14-whats-new-in-gcd.html
Using a custom concurrent queue and dispatch_barrier_async, the code can read like:
queue = dispatch_queue_create("com.typhoon.cachequeue", DISPATCH_QUEUE_CONCURRENT);

- (void)setValue:(id)value
{
    dispatch_barrier_async(queue, ^{
        _value = value;
    });
}

- (id)value
{
    __block id result = nil;
    dispatch_sync(queue, ^{
        result = _value;
    });
    return result;
}
In the past I used POSIX thread locks, and now I'm trying to move to GCD queues. I'm not sure yet whether to use the 2nd or the 3rd approach; I lean toward the 2nd because it simulates the lock. Anyhow, I need to benchmark this. All the comments on the internet say this performs better... but how much better? I haven't seen code for those benchmarks.
Further reading:
http://stackoverflow.com/questions/11923244/synchronized-vs-gcd-dispatch-barrier-async
http://www.objc.io/issue-2/thread-safe-class-design.html
Thanks for the information. I read these articles before and tried all of these approaches in my https://github.com/alexgarbarev/ThumbnailService (see the - (void)synchronize:(dispatch_block_t)block method), and I noticed some problems in concurrent code with the 2nd and 3rd GCD approaches: queues from dispatch_queue_create create new threads. Using global queues also creates new threads. Using dispatch_set_target_queue with global queues as the target for private queues also creates new threads. So, for example, if I had 20 objects that want to synchronize access to their properties, creating 20 new threads is not a great solution; a mutex would be cheaper.
Hi @pabloroca, I did a few measurements and I was a bit surprised. Here is the code I tested:
@implementation ViewController {
    int a;
    dispatch_queue_t serialQueue;
    dispatch_queue_t concurrentQueue;
    pthread_mutex_t mutex;
    pthread_mutex_t recursive_mutex;
    NSLock *lock;
    NSRecursiveLock *recursiveLock;
    char *buf;
}
- (void)viewDidLoad
{
    [super viewDidLoad];
    [self init_lock];
    [self init_recursive_lock];
    [self init_serial_queue];
    [self init_concurrent_queue];
    [self init_synchronized];
    [self init_mutex];
    [self init_recursive_mutex];
    for (int i = 0; i < 10000000; i++) {
        [self test_lock:1];
        [self test_recursive_lock:1];
        [self test_synchronized:1];
        [self test_serial_queue:1];
        [self test_barrier_concurrent_queue:1];
        [self test_pthread_mutex:1];
        [self test_pthread_recursive_mutex:1];
    }
}

- (void)init_lock
{
    lock = [NSLock new];
}

- (void)init_recursive_lock
{
    recursiveLock = [NSRecursiveLock new];
}

- (void)init_synchronized
{
    @synchronized(self) {}
}
- (void)init_mutex
{
    pthread_mutex_init(&mutex, NULL);
}
- (void)init_recursive_mutex
{
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    pthread_mutex_init(&recursive_mutex, &attr);
    pthread_mutexattr_destroy(&attr);
}
- (void)init_serial_queue
{
    serialQueue = dispatch_queue_create("serial_queue", DISPATCH_QUEUE_SERIAL);
}

- (void)init_concurrent_queue
{
    concurrentQueue = dispatch_queue_create("concurrent_queue", DISPATCH_QUEUE_CONCURRENT);
}

- (void)test_synchronized:(int)b
{
    @synchronized(self) {
        a = b;
    }
}

- (void)test_serial_queue:(int)b
{
    dispatch_sync(serialQueue, ^{
        a = b;
    });
}

- (void)test_barrier_concurrent_queue:(int)b
{
    dispatch_barrier_sync(concurrentQueue, ^{
        a = b;
    });
}

- (void)test_pthread_mutex:(int)b
{
    pthread_mutex_lock(&mutex);
    a = b;
    pthread_mutex_unlock(&mutex);
}

- (void)test_pthread_recursive_mutex:(int)b
{
    pthread_mutex_lock(&recursive_mutex);
    a = b;
    pthread_mutex_unlock(&recursive_mutex);
}

- (void)test_lock:(int)b
{
    [lock lock];
    a = b;
    [lock unlock];
}

- (void)test_recursive_lock:(int)b
{
    [recursiveLock lock];
    a = b;
    [recursiveLock unlock];
}

@end
Here are my results. It seems the GCD approaches work more slowly than @synchronized. I tested on the simulator (will try on a device soon).
Here are the test results from my iPhone 3GS:
Wow!! Many thanks Alex. I could not state which one would work better because I didn't run benchmarks; I was just asking for a better approach than @synchronized because I saw many complaints.
Very surprising, yes. Why the h*ll does Apple recommend GCD queues if they perform slower? agg
Anyhow... it seems that pthread_recursive_mutex performs 3 times better than @synchronized :)
Question: were you using Xcode 5 or Xcode 6?
Any special project optimization settings?
A copy of the project would be welcome :) pablorocarREMOVE at gmail . com
Yes. I have to dig into the assembly code to see how @synchronized works. I tested on Xcode 5, default "Release" build configuration. Will send the project soon.
Many thanks Alex, this is a real demonstration, not the blah blah you can read on the internet and in Apple documentation.
Just out of curiosity, with these results... are you thinking of changing the @synchronized calls to pthread_recursive_mutex?
Maybe I missed something, but I didn't see exception-related function calls in the assembly code. Here is the test_synchronized method: I see that my code is wrapped only by objc_sync_enter and objc_sync_exit.
I found Apple's source for these functions: http://www.opensource.apple.com/source/objc4/objc4-493.9/runtime/objc-sync.m
// Begin synchronizing on 'obj'.
// Allocates recursive mutex associated with 'obj' if needed.
// Returns OBJC_SYNC_SUCCESS once lock is acquired.
int objc_sync_enter(id obj)
{
    int result = OBJC_SYNC_SUCCESS;

    if (obj) {
        SyncData* data = id2data(obj, ACQUIRE);
        require_action_string(data != NULL, done, result = OBJC_SYNC_NOT_INITIALIZED, "id2data failed");
        result = recursive_mutex_lock(&data->mutex);
        require_noerr_string(result, done, "mutex_lock failed");
    } else {
        // @synchronized(nil) does nothing
        if (DebugNilSync) {
            _objc_inform("NIL SYNC DEBUG: @synchronized(nil); set a breakpoint on objc_sync_nil to debug");
        }
        objc_sync_nil();
    }

done:
    return result;
}

// End synchronizing on 'obj'.
// Returns OBJC_SYNC_SUCCESS or OBJC_SYNC_NOT_OWNING_THREAD_ERROR
int objc_sync_exit(id obj)
{
    int result = OBJC_SYNC_SUCCESS;

    if (obj) {
        SyncData* data = id2data(obj, RELEASE);
        require_action_string(data != NULL, done, result = OBJC_SYNC_NOT_OWNING_THREAD_ERROR, "id2data failed");
        result = recursive_mutex_unlock(&data->mutex);
        require_noerr_string(result, done, "mutex_unlock failed");
    } else {
        // @synchronized(nil) does nothing
    }

done:
    if (result == RECURSIVE_MUTEX_NOT_LOCKED)
        result = OBJC_SYNC_NOT_OWNING_THREAD_ERROR;

    return result;
}
I didn't see anything related to exception handling.
Yes, they are objc_sync_enter and objc_sync_exit, but there is not much documentation out there. You can use them as:
objc_sync_enter(self);
(code)
objc_sync_exit(self);
Just out of curiosity, with this results .. are you thinking in changing @synchronized call to the pthread_recursive_mutex?
I think @synchronized looks more elegant - there's no chance of forgetting to 'unlock', because the whole statement is inside braces. I think that's more important than a micro-optimization.
Anyway, I know of more serious things to optimize in Typhoon (some object creations can be replaced by an 'object pool' pattern to reduce heap memory allocations), so it's more important to start there.
Cool, many thanks Alex
Thanks for the discussion Pablo. I was surprised too.
Just to confirm: Xcode 6 Beta 6 with the iOS 8 simulator produces the same results.
Finally, after running more tests and thinking, I'll move my things to NSRecursiveLock, in this way:
- (void)test_recursive_lock:(int)b
{
    static NSRecursiveLock *recursiveLock;
    if (recursiveLock == nil)
    {
        recursiveLock = [[NSRecursiveLock alloc] init];
    }
    [recursiveLock lock];
    a = b;
    [recursiveLock unlock];
}
Even though it is more verbose than @synchronized, it performs faster, and the method has everything about the lock inside it (it does not require a prior init), so it offers better testability.
But it uses static NSRecursiveLock *recursiveLock;
so that will be shared between objects. Why?
And how do you handle this case:
- (void)test_recursive_lock:(int)b
{
    static NSRecursiveLock *recursiveLock;
    if (recursiveLock == nil)
    {
        recursiveLock = [[NSRecursiveLock alloc] init];
    }
    [recursiveLock lock];
    a = b;
    [recursiveLock unlock];
}

- (void)some_another_place
{
    // we have to lock access here
    a = c;
}
The static is shared only by the test_recursive_lock method; the scope of the static is just that method.
If you do:
- (void)test_recursive_lock:(int)b
{
    static NSRecursiveLock *recursiveLock;
    if (recursiveLock == nil)
    {
        recursiveLock = [[NSRecursiveLock alloc] init];
    }
    [recursiveLock lock];
    a = b;
    [recursiveLock unlock];
}

- (void)test_recursive_lockAtoC:(int)c
{
    static NSRecursiveLock *recursiveLock;
    if (recursiveLock == nil)
    {
        recursiveLock = [[NSRecursiveLock alloc] init];
    }
    [recursiveLock lock];
    a = c;
    [recursiveLock unlock];
}
the recursiveLock objects are different objects, so they are not shared.
Problems: 1) If test_recursive_lock and test_recursive_lockAtoC are called at the same time (from two different threads), then a = b and a = c will run at the same time, i.e. the locking will not work. 2) If you create two objects (ViewControllers in our example), recursiveLock will be the same inside the test_recursive_lock method, i.e. shared between objects.
I think this discussion is out of scope and not related to Typhoon.
Yep .. sorry
Great work folks. @pabloroca, thanks for your input.
Wouldn't it be faster to replace these slow @synchronized statements with pthread_mutex_lock / pthread_mutex_unlock?