Non-ARM x86-64 desktop CPUs compatibility

nxrighthere commented 5 years ago

Thanks for the neat code, I was looking for portable synchronization mechanisms for my concurrent networking library, and it seems that I finally found it, but I have only one question if you don't mind. 😸

Is it safe to use the rwsync implementation on non-ARM x86-64 desktop CPUs with multiple threads (2-3) and wr/rd functions with a single synchronizer simultaneously? My basic tests show good results on AMD FX-4300, but I didn't test it on Intel yet. Any caveats that I should be aware of?

WonderfulVoid commented 5 years ago

Yes rwsync should be safe to use on all x86-64 processors. I do most of my testing on ARM (AArch64) processors but since I depend mostly on the compiler, there should be less risk of target specific problems.

Please speak up if you are missing some scalable datatype. I love to invent new algorithms.

nxrighthere commented 5 years ago

Thanks, that's great! Currently, I'm using the rwsync to access shared structures for read/write operations, but I'm also planning to integrate SPSC ring buffer to dispatch transport messages from network thread to main thread since this approach works quite well in managed environment for me, and it's very convenient.

I think this library provides everything that I currently need thanks for your work. Cheers. 😺

nxrighthere commented 5 years ago

@WonderfulVoid Maybe you can help me to solve some design problem that I encountered recently...

I'm writing a networking library where a user should register a callback function which is triggered from a separate thread when network events occur. Before the callback is triggered, a synchronizer acquired for logic around shared data. In the library's public API, I provide functionality that also acquires the same synchronizer for read/write operations for the same shared data. Everything is okay while this functionality used outside of the callback, but if a user is going to use public functions within the callback for a logic chain, then a deadlock will occur because the synchronizer already acquired.

I'm trying to find a solution to solve this transparently for a user. A rough way is to add a boolean parameter to public API which should indicate when a function is called within the callback and the synchronizer shouldn't be acquired.

Any ideas?

nxrighthere commented 5 years ago

I think a sort of recursive SRW which tracks threads that acquired a synchronizer would be great. 🤔

nxrighthere commented 5 years ago

Found the implementation on top of critical sections which already support recursive access but they still added it for some reason.

It's possible to add a similar owner check to the rwlock/rwsync implementation? I'm afraid that there's no portable way to do this... 🤔

WonderfulVoid commented 5 years ago

Let me check out these things. How are SRW locks different from vanilla reader/writer locks?

nxrighthere commented 5 years ago

It's the same things essentially that doesn't support recursive access unlike critical sections for example.

nxrighthere commented 5 years ago

I think I've found the way to solve this: after acquiring a synchronizer in the callback thread and triggering the callback itself, I'll set a variable internally to check in public API if it was acquired there or not. 👌

But yea, I think it would be nice to have a portable recursive synchronization mechanism.

WonderfulVoid commented 5 years ago

Have a look here: https://github.com/ARM-software/progress64/commit/d6f492105624253cbc1e68bd5b2bca28e997c813

Currently a thread cannot acquire a synchroniser for read when the same synchroniser has already been acquired for write. Reason #1 is for implementation simplicity (acquire_rd() currently waits until a write in progress has completed but this behaviour can be changed). Reason #2 is that if a write is in progress (by this thread), how do we know the state of the protected data? Perhaps it is only half updated. We don't want to allow "torn" reads. And we cannot wait until the write has completed because we would be waiting for ourselves (deadlock). But a thread can call rwsync_r_acquire_rd() recursively and rwsync_r_acquire_wr() recursively.

If you like this, I can do the same for the rwlock.

nxrighthere commented 5 years ago

Awesome, thanks for this!

Yes, if you can do the same for the rwlock it would be great. ⭐️

WonderfulVoid commented 5 years ago

https://github.com/ARM-software/progress64/commit/7650d8112de79d90fe95cd8ea0e16fae2636f04c

nxrighthere commented 5 years ago

Fantastic, thanks! 🍻

WonderfulVoid commented 5 years ago

I am considering moving the owner field to a per-thread variable (just like the count variable). Then p64_rwlock_r_t would be identical to p64_rwlock_t (is this good or bad). The implementations could then also be merged and recursive mode always enabled. This would however require that the thread ID is always specified in the acquire_rd/wr calls... PROGRESS64 has no internal concept of thread ID.

WonderfulVoid commented 5 years ago

The implementation has not been stress tested, just some basic smoke testing using the rwlock_r example (including some negative cases that lead to program abort, I removed those test cases afterwards).

nxrighthere commented 5 years ago

Yea, no worries, I'll test it very soon. 👌

nxrighthere commented 5 years ago

@WonderfulVoid Thanks, everything works great!

nxrighthere commented 5 years ago

Here's a portable (I believe) way to get thread ID:

#ifndef UTILS_GETTID_H
#define UTILS_GETTID_H

#if defined(_WIN32)
    #ifndef __MINGW32__
        typedef DWORD pid_t;
    #endif
#else
    #include <unistd.h>
    #include <sys/syscall.h>
#endif

#ifdef  __cplusplus
extern "C" {
#endif

#if defined _WIN32
    inline static pid_t gettid(void) {
        return GetCurrentThreadId();
    }
#elif defined __APPLE__
    inline static pid_t gettid(void) {
        return syscall(SYS_thread_selfid);
    }
#else
    inline static pid_t gettid(void) {
        return syscall(__NR_gettid);
    }
#endif

#ifdef  __cplusplus
}
#endif

#endif //UTILS_GETTID_H

WonderfulVoid commented 5 years ago

Thanks, I will see if I can add this to a new porting layer.

WonderfulVoid commented 5 years ago

OK I have something working. The API will change, the tid parameter will be removed from the acquire calls.

WonderfulVoid commented 5 years ago

https://github.com/ARM-software/progress64/commit/4f09a39984be8a36bfc245d8935743c4f1cdccf4

nxrighthere commented 5 years ago

It works great, thanks! I only wish that we could call multiple reads simultaneously after the write progress completed, that would just perfect. ☁️

nxrighthere commented 5 years ago

Found this article, so he's using an additional critical section on top to achieve this.

WonderfulVoid commented 5 years ago

I don't understand your comment above.

I think the current design has a problem when a thread tries to acquire different locks. The per-thread rwl_count needs to be per-lock as well. The recursive rwsync_r does not have this problem, the recursive count is only used for write access so is associated with the synchroniser, not with the thread.

WonderfulVoid commented 5 years ago

I have an idea on how to solve the problem though. We will need a per-thread stack of acquired rwlocks.

nxrighthere commented 5 years ago

Looking forward for this. I'm happy to test any implementation. 👍

WonderfulVoid commented 5 years ago

https://github.com/ARM-software/progress64/commit/7dd6944d1388ae1d71a1bdd81dc6d0fbed995fc1

WonderfulVoid commented 5 years ago

It works great, thanks! I only wish that we could call multiple reads simultaneously after the write progress completed, that would just perfect.

Still don't understand this comment. Is it still relevant?

nxrighthere commented 5 years ago

Yes, due to:

//Not allowed to call acquire-read when the lock has already been acquired
//for write, protected data is in unknown state and the call cannot block
//waiting for the write to complete
//Recursive acquire-read calls allowed for the same rwlock
void p64_rwlock_r_acquire_rd(p64_rwlock_r_t *lock);

I'm getting rwlock_r: acquire-read after acquire-write when read trying to acquire a synchronizer if the write is in progress.

WonderfulVoid commented 5 years ago

https://github.com/ARM-software/progress64/commit/f409a7493ab2073f2da0b66daf3295844637bf18

WonderfulVoid commented 5 years ago

Now you can acquire_rd after acquire_wr on the same lock. It is up to the application to handle the possibility of reading the protected data in the middle of its own update. You have been warned!

nxrighthere commented 5 years ago

Thanks, I'll do a couple of tests. 🔬

nxrighthere commented 5 years ago

It works as intended, great job. 👍

WonderfulVoid commented 5 years ago

Thanks. It was fun.

ARM-software / progress64

Non-ARM x86-64 desktop CPUs compatibility #2