cloudquery / plugin-sdk

CloudQuery Go SDK for source and destination plugins
Mozilla Public License 2.0
21 stars 23 forks source link

feat: Sort state client cache keys before flush #1856

Open askurydzin opened 1 month ago

askurydzin commented 1 month ago

Summary

in some concurrent scenarios several cache keys can be updated by several routines, thus, in order to avoid the deadlocks (in postgresql for example) we should update keys in sorted order.

erezrokah commented 1 month ago

Hi @askurydzin 👋 Thanks for the PR. Do you mind opening an issue reproducing the scenario you're experiencing? That would help us understand the fix better. You can open the issue in our main repository.

Alternatively can you add a unit test that's failing without this fix (e.g. one that uses the state client and has a deadlock in Postgres)?

askurydzin commented 1 month ago

Hello, @erezrokah! It would be quite hard to reproduce it as it requires writing some demo plugin from scratch which uses state client. Would it be fine if I just briefly explain why primary keys for db needed to be sorted during insert/update?

When it goes about two concurrent transaction T1 with set of keys { K1, K2, K3 } and T2 with set of keys T2 { K2, K3, K4 }, and we don't sort keys, following issue happens in postgres destination:

T1 BEGIN T2 BEGIN
INSERT K1 (exclusive-lock for K1) INSERT K2 (exclusive-lock for K2)
INSERT K2 (wait share-lock for K1 / locked by K2) INSERT K1 (wait share-lock for K1 / locked by K1)

In the end both transactions appear to be in a dead-lock state.

So, sorting state table primary keys before inserting allows to avoid such situations.

erezrokah commented 1 month ago

It would be quite hard to reproduce it as it requires writing some demo plugin from scratch which uses state client. Would it be fine if I just briefly explain why primary keys for db needed to be sorted during insert/update?

Is the issue not reproducible on any of our existing plugins? I think it's better to have a reproduction otherwise we can't confirm the fix is working.

askurydzin commented 1 month ago

@erezrokah no, I think it's not reproducible in any of existing plugins as I'm not sure if any of existing plugins (that I may have an access to) use state client. I stumbled across the issue when writing my own plugin (it's private). The issue is quite straight-forward, but if existence of the plugin vulnerable to the issue is the case for this SDK, I cannot write some demo plugin because it wouldn't be an existing plugin per se, and it may seem too artificial example.

Hence, closing the PR. Sorry for wasting your time.

erezrokah commented 1 month ago

Hence, closing the PR. Sorry for wasting your time.

Not wasting anyone's time that's for sure. We appreciate the feedback.

There's a public plugin with a state client here https://github.com/cloudquery/cloudquery/blob/e94dda546631c5ef801852bbae997ad11f615f2e/plugins/source/hackernews/resources/services/items/items_fetch.go#L34

To clarify the request is not for one of our own plugins to reproduce this, but to have a test that fails without the fix, and passes with the fix, or a minimal reproduction scenario that shows the bug (maybe a very trimmed down version of your own plugin without any private code).

askurydzin commented 1 month ago

@erezrokah Hm, if the run of the mentioned plugin consistently emits the set of keys that may overlap, I think running it in parallel may reproduce the thing, then give me some time and I'll check if it work.