IntersectMBO / cardano-db-sync

A component that follows the Cardano chain and stores blocks and transactions in PostgreSQL
Apache License 2.0

Add support for MySQL/MariaDB #259

Closed: jbax closed this issue 4 years ago

jbax commented 4 years ago

Would it be possible to introduce support for other databases such as MySQL? I ask because MySQL/MariaDB can produce binary logs from which we can read database changes as they are applied, and we can monitor these to produce "events" of interest, which can be filtered for specific conditions and then broadcast to other applications in real time. This reduces the need for users of cardano-db-sync to poll the database periodically and perform potentially slow, costly queries.

If you are happy with the idea I might even try to submit a PR with the changes required to introduce support for MySQL, as well as create the monitoring solution I just described above.

erikd commented 4 years ago

Theoretically it might be possible to introduce support for other databases, but:

I ask this because MySQL/MariaDB can produce binary logs from which we can read database changes as they are applied, and we can monitor these to produce "events" of interest,

Could this not be done with PostgreSQL listeners (LISTEN/NOTIFY)? I would be willing to bet that whatever you already know how to do with MySQL can also be done with Postgres, although how it is done may differ.
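
For reference, a minimal sketch of such a listener in Python with psycopg2, assuming a channel named `new_block` that some trigger NOTIFYs on (a matching trigger sketch appears later in the thread); the DSN and channel name are placeholders:

```python
import select
import psycopg2

# Minimal LISTEN/NOTIFY consumer. "cexplorer" is db-sync's default database
# name; adjust the DSN and channel name for your setup.
conn = psycopg2.connect("dbname=cexplorer")
conn.autocommit = True  # notifications are only delivered outside a transaction
cur = conn.cursor()
cur.execute("LISTEN new_block;")

while True:
    # Wait until the connection's socket becomes readable (60 s timeout),
    # then drain any queued notifications.
    if select.select([conn], [], [], 60) == ([], [], []):
        continue
    conn.poll()
    while conn.notifies:
        note = conn.notifies.pop(0)
        print(f"new_block notification, payload={note.payload}")
```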

erikd commented 4 years ago

@jbax Rather than ask for us to support MySQL, I think you would get much quicker results by specifying what you want to do rather than how you want to do it. By specifying what you want to do, it will be easier for me or someone else in the Cardano or PostgreSQL communities to help you.

jbax commented 4 years ago

The listener approach used by PostgreSQL is not the same, as events may not be consumed (the listener may be offline). With binary logs you can keep a "needle" pointing to the latest event consumed, and stop and then resume the binlog reading process at will, moving the needle forward once events are confirmed to be sent/consumed. Using that, you can also broadcast to multiple listeners, each responding differently to whatever they get. More importantly, it adds no extra overhead to the database itself, since a separate process reads from a binary file instead of burdening the database. That process can even run on another server on the network instead of competing for the CPU and other resources used by the database.
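
As an illustration of the "needle" idea, a hedged sketch using the third-party python-mysql-replication package; `dispatch` and `save_needle` are hypothetical hooks, and the connection settings and binlog coordinates are placeholders:

```python
from pymysqlreplication import BinLogStreamReader
from pymysqlreplication.row_event import WriteRowsEvent

def dispatch(values):
    # Hypothetical broadcast hook: push the row to interested consumers.
    print("event:", values)

def save_needle(log_file, log_pos):
    # Hypothetical durable store for the resume point (the "needle").
    with open("needle.txt", "w") as f:
        f.write(f"{log_file}:{log_pos}")

stream = BinLogStreamReader(
    connection_settings={"host": "127.0.0.1", "port": 3306,
                         "user": "repl", "passwd": "secret"},
    server_id=100,                 # must be unique among replication clients
    only_events=[WriteRowsEvent],  # inserts only, i.e. newly synced rows
    log_file="mysql-bin.000042",   # resume point loaded from durable storage
    log_pos=4,
    resume_stream=True,
    blocking=True,
)

for event in stream:
    for row in event.rows:
        dispatch(row["values"])
    # Advance the needle only after the events have been handed off.
    save_needle(stream.log_file, stream.log_pos)
```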

jbax commented 4 years ago

What I want to do is receive updates from the database in real time, reliably. This can't be done reliably with triggers. Also, I want to be very specific about which updates I get. For example, I want my server to receive an update when a transfer of 100 ADA arrives at some wallet I specify, without having to poll the database for it. I might have thousands of pending transactions waiting, some probably abandoned (think of an online e-commerce checkout where the user never pays), and I want to confirm payments as fast as possible. Polling a database is not compatible with "as fast as possible".

erikd commented 4 years ago

I still think I need the bigger picture of what you want to do.

jbax commented 4 years ago

I'm building a Cardano payment gateway for Shopify. It has LOTS of online stores and even more customers. Let's assume this thing gets used by 0.5% of their current user base: you'll have potentially thousands of orders pending payment every single minute.

The Shopify server will have to monitor online shop wallets. Every time ADA is sent to a payment address associated with these wallets, I want to receive a notification from the database automatically, in real time. I don't want to poll the database, as that does not scale.

I already have a solution ready to address this and send notifications in real time, but it relies on using a MySQL database instead of Postgres, simply because Postgres doesn't have anything like binary logs. I'm happy to open-source everything so everyone can receive real time updates.

Also, this project will grow to support other e-commerce platforms, so this will only get bigger over time and we'll need very good performance to handle it all. Sure, I can throw more hardware at the problem, but that gets expensive quickly.

erikd commented 4 years ago

You'll have potentially thousands of orders pending payment every single minute.

So in MySQL, you would monitor this binary log for the incoming payments and have an individual "event" for each payment.

I already have a solution ready to address this and send notifications in real time, but it relies on using a MySQL database instead of Postgres,

Sorry, but I think you have a MySQL-based solution that you are trying to fit to a Postgres problem. You have two options:

It is up to you to decide which of the above two options is most likely to offer the best and most reliable solution in the shortest time.

Payments on the Cardano chain cannot occur outside of a block, and blocks arrive on average once every 20 seconds. Rather than having a trigger on every address you are interested in, you could set up a trigger on each new block, query the transactions within that block, and dispatch the events you need. In fact, to make sure the block/transaction is stable, when the trigger fires for block N you probably want to look at block N - 3 or N - 4.
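
A hedged sketch of such a trigger, installed via psycopg2; the `block` table and `block_no` column follow the db-sync schema of the time, but verify them against your schema version (`block_no` can be NULL for epoch boundary blocks, hence the COALESCE):

```python
import psycopg2

DDL = """
CREATE OR REPLACE FUNCTION notify_new_block() RETURNS trigger AS $$
BEGIN
    -- Publish the new block number on the "new_block" channel; the
    -- listener sketch earlier in the thread consumes this.
    PERFORM pg_notify('new_block', COALESCE(NEW.block_no, -1)::text);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

DROP TRIGGER IF EXISTS new_block_trigger ON block;
CREATE TRIGGER new_block_trigger
    AFTER INSERT ON block
    FOR EACH ROW EXECUTE PROCEDURE notify_new_block();
"""

with psycopg2.connect("dbname=cexplorer") as conn:
    with conn.cursor() as cur:
        cur.execute(DDL)
```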

dcoutts commented 4 years ago

we can monitor these to produce "events" of interest, which can be filtered for specific conditions and then be broadcast to other applications in real time.

It sounds to me like you do not want a database at all. We start from an event stream (the blockchain). So use that directly. Don't convert into a database and then convert back into an event stream.

The db-sync component exists to convert the chain into a database. If you want an event stream anyway, then we should be looking at how to do live triggers on the event stream. Note that if you want it live, you have to deal with the fact that the chain is not immediately stable: it can switch forks. This is why we cannot translate into standard event sourcing tools: they assume immediate finality. You don't magically get immediate finality by converting into a DB and back into an event stream.
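
To make the finality point concrete, a sketch (with a hypothetical block type and settlement depth) of a fork-aware consumer that buffers the most recent K blocks and only emits events once a block is K deep, so a fork switch can discard volatile blocks before anything downstream sees them:

```python
from collections import deque
from dataclasses import dataclass

K = 3  # settlement depth: how many blocks behind the tip we treat as stable

@dataclass
class Block:
    slot: int        # hypothetical minimal block record
    payload: object

class SettledEmitter:
    def __init__(self, emit):
        self.volatile = deque()  # newest block last
        self.emit = emit         # downstream callback, supplied by the caller

    def on_block(self, block: Block):
        self.volatile.append(block)
        # Anything more than K blocks behind the tip is settled: emit it.
        while len(self.volatile) > K:
            self.emit(self.volatile.popleft())

    def on_rollback(self, slot: int):
        # A fork switch invalidates volatile blocks after the rollback
        # point; settled blocks were already K deep and are unaffected.
        while self.volatile and self.volatile[-1].slot > slot:
            self.volatile.pop()
```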

jbax commented 4 years ago

Thanks @erikd. I'm just wondering what will happen when we have more transactions on chain: blocks won't always arrive only every 20 seconds, right? For now I can code assuming I only need to poll the database for blocks added in the last X seconds, since there is no point trying to do this in real time when blocks arrive at a known pace. So that will work for the time being, as 20 seconds is plenty of time.

@dcoutts thank you very much for the insight about finality. So what would be a "safe" approach to ensure that a transaction is final when querying the database?

erikd commented 4 years ago

Blocks won't be arriving every 20 seconds right?

There are no current or concrete plans to have blocks arrive more often than that, but obviously that is likely to change at some point. A Postgres trigger that fires only once every 20 seconds is very likely to be reliable.

As for finality, the current tip block is not final due to the possibility of orphaned blocks, but the further back in time you go the higher the probability that a block is final (and beyond a certain depth it is guaranteed by the protocol).

This is why I suggested N - 2 or N - 3 (with N as the tip block number). How far back you wait depends on your risk tolerance: for $1 purchases you might decide that 2 blocks is enough, whereas for $1 million purchases you might want to wait for 20 blocks.
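
A hedged sketch of the corresponding query, assuming the db-sync tables `tx`, `tx_out`, and `block` and a `tx.block_id` foreign key (older schema versions named this column differently, so check yours):

```python
import psycopg2

SETTLED_DEPTH = 3  # pick this from your risk tolerance, as discussed above

# Find outputs paying a given address in the block that has just become
# "settled" (tip minus SETTLED_DEPTH).
QUERY = """
SELECT tx.hash, tx_out.value
FROM tx_out
JOIN tx    ON tx.id = tx_out.tx_id
JOIN block ON block.id = tx.block_id
WHERE tx_out.address = %s
  AND block.block_no = (SELECT MAX(block_no) FROM block) - %s;
"""

def settled_payments(conn, address):
    with conn.cursor() as cur:
        cur.execute(QUERY, (address, SETTLED_DEPTH))
        return cur.fetchall()

# Usage sketch: run this from the NOTIFY handler each time a block arrives.
with psycopg2.connect("dbname=cexplorer") as conn:
    for tx_hash, lovelace in settled_payments(conn, "addr1..."):
        print(tx_hash, lovelace)
```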

jbax commented 4 years ago

Thanks guys, I guess I can close this now.

erikd commented 4 years ago

Please feel free to ask more questions or for more advice.

CyberCyclone commented 2 years ago

@erikd I was thinking about this, but with support for "any" DB. It would obviously be a lot of work, but a "connection" layer would be interesting, so anyone can hook up their own DB.

E.g. if someone prefers a NoSQL DB (MongoDB, RavenDB, Elasticsearch, etc.), there could be a middleware driver or plug-in-style extension they connect in order to populate their own DB.

erikd commented 2 years ago

@CyberCyclone You have no idea how difficult this is, even for a standard SQL database like MySQL. For another SQL database:

CyberCyclone commented 2 years ago

Thanks for that @erikd, I suspected that was the case and I definitely didn't expect there to be support for other DBs.

It seems what I was thinking is more along the lines of what cardano-ogmios does, in that it extracts data from cardano-node into a common format. From there a user can do their own database integration.