datahub-project / datahub

The Metadata Platform for your Data Stack
https://datahubproject.io
Apache License 2.0

Atomic updates in CassandraAspectDao #5997

Closed justinas-marozas closed 1 year ago

justinas-marozas commented 2 years ago

The problem

Updating an aspect in GMS translates into two database operations: an insert and an update.

Both operations must execute atomically and in isolation to avoid an inconsistent state.

GMS was initially designed to work with a relational database, where the solution is easy: you can begin and commit transactions at will.

With a Cassandra backend, atomic and isolated execution of these two operations is possible using Cassandra batches, but batches are not currently used, so GMS with a Cassandra backend has no protection against inconsistent state. We would very much like to change that.

How things work now

The AspectDao interface exposes a method, runInTransactionWithRetry, that allows execution of arbitrary code wrapped in a transaction. EntityService uses this method to ensure the insert+update operations happen in a single transaction. This behavior can't be matched in CassandraAspectDao.
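To illustrate the pattern described above, here is a minimal sketch of a "run arbitrary code in a transaction, with retries" helper. It is not the DataHub implementation: `InMemoryTxStore`, its `begin`/`commit`/`rollback` methods, and the exact method signature are assumptions made for illustration, and the in-memory map only stands in for a relational database.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical in-memory stand-in for a transactional store; not DataHub code. */
class InMemoryTxStore {
    private Map<String, String> committed = new HashMap<>();
    private Map<String, String> working; // non-null only while a transaction is open

    void begin()    { working = new HashMap<>(committed); }
    void commit()   { committed = working; working = null; }
    void rollback() { working = null; }

    /** Writes are only allowed inside a transaction. */
    void put(String key, String value) {
        if (working == null) {
            throw new IllegalStateException("no open transaction");
        }
        working.put(key, value);
    }

    /** Reads see uncommitted writes inside a transaction, committed data otherwise. */
    String get(String key) {
        return (working != null ? working : committed).get(key);
    }

    /** Runs the block in a transaction: one attempt plus up to maxRetries retries. */
    <T> T runInTransactionWithRetry(Supplier<T> block, int maxRetries) {
        if (maxRetries < 0) {
            throw new IllegalArgumentException("maxRetries must be >= 0");
        }
        RuntimeException last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            begin();
            try {
                T result = block.get();
                commit();
                return result;
            } catch (RuntimeException e) {
                rollback(); // discard partial writes before retrying
                last = e;
            }
        }
        throw last;
    }
}
```

The caller decides what goes inside the block, which is exactly what a store without general-purpose transactions cannot honor: it would need to know in advance which writes belong together.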

How we want things to work

The AspectDao interface should be changed so that its clients can't rely on transaction controls that may or may not be available, depending on the backing data store. It's probably best to move the insert+update complexity into the AspectDao implementations to avoid leaking relational/Cassandra implementation details to EntityService.
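One possible shape for such an interface is sketched below. Everything here is hypothetical: `VersionedAspectStore`, `updateLatest`, and `BatchingAspectStore` are names invented for illustration, the convention that version 0 holds the latest value is an assumption, and the in-memory map with `putAll` merely stands in for a backend-specific atomic write (a transaction in a relational store, or a batch in Cassandra).

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical narrowed DAO: callers request one logical update, not a transaction. */
interface VersionedAspectStore {
    /** Archives the current latest value under a new version, then writes newValue as latest. */
    void updateLatest(String urn, String aspect, String newValue);

    String get(String urn, String aspect, long version);
}

/** Batch-style implementation: both writes are collected first, then applied together. */
class BatchingAspectStore implements VersionedAspectStore {
    private final Map<String, String> rows = new HashMap<>();

    private static String key(String urn, String aspect, long version) {
        return urn + "|" + aspect + "|" + version;
    }

    @Override
    public synchronized void updateLatest(String urn, String aspect, String newValue) {
        Map<String, String> batch = new LinkedHashMap<>();
        // Assumption: version 0 always holds the latest value.
        String previous = rows.get(key(urn, aspect, 0));
        if (previous != null) {
            long next = 1;
            while (rows.containsKey(key(urn, aspect, next))) {
                next++; // find the first unused version number
            }
            batch.put(key(urn, aspect, next), previous); // the "insert": archived copy
        }
        batch.put(key(urn, aspect, 0), newValue); // the "update": new latest value
        rows.putAll(batch); // stand-in for executing one atomic batch or transaction
    }

    @Override
    public synchronized String get(String urn, String aspect, long version) {
        return rows.get(key(urn, aspect, version));
    }
}
```

With this shape, a caller such as EntityService would only call `updateLatest(...)` and never see whether the backend used a transaction or a batch to keep the two writes together.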

github-actions[bot] commented 2 years ago

This issue is stale because it has been open for 30 days with no activity. If you believe this is still an issue on the latest DataHub release please leave a comment with the version that you tested it with. If this is a question/discussion please head to https://slack.datahubproject.io. For feature requests please use https://feature-requests.datahubproject.io