MariaDB C++ connector to modify the DB directly from siteupdate

jteresco commented 6 months ago

Issue #620 proposes speeding up what is now the vast majority of the site update time (loading the DBs from the siteupdate-generated .sql files) by finding ways to read those files faster. Another option we might pursue is to use the MariaDB C++ connector to make the database updates right from the C++ siteupdate program.
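To make the idea concrete: whether statements go to a file or straight to the server, siteupdate still has to turn its in-memory rows into SQL. The sketch below is illustrative only and does not reflect siteupdate's actual code. It assumes a hypothetical `execute` callback standing in for whatever connector call would send one statement, and it builds multi-row INSERTs so each round trip carries a batch of rows. With the actual connector, prepared statements would likely replace the hand-rolled `quote` helper.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for "send one statement to the server"; with the
// real connector this would wrap a statement-execution call.
using Execute = std::function<void(const std::string&)>;

// Escape a value as a single-quoted MariaDB string literal (assumes the
// default sql_mode, where backslash is an escape character).
std::string quote(const std::string& value) {
    std::string out = "'";
    for (char c : value) {
        if (c == '\\' || c == '\'') out += '\\';
        out += c;
    }
    out += '\'';
    return out;
}

// Send `rows` to `table` as multi-row INSERTs of at most `batch_size` rows
// each, instead of appending them to a .sql file.
void insert_rows(const std::string& table,
                 const std::vector<std::vector<std::string>>& rows,
                 std::size_t batch_size, const Execute& execute) {
    if (batch_size == 0) batch_size = 1;  // avoid an endless loop
    for (std::size_t start = 0; start < rows.size(); start += batch_size) {
        std::size_t end = std::min(rows.size(), start + batch_size);
        std::string sql = "INSERT INTO " + table + " VALUES ";
        for (std::size_t r = start; r < end; ++r) {
            sql += (r == start ? "(" : ",(");
            for (std::size_t c = 0; c < rows[r].size(); ++c) {
                if (c > 0) sql += ',';
                sql += quote(rows[r][c]);
            }
            sql += ')';
        }
        sql += ';';
        execute(sql);
    }
}
```

The batch size is a tuning knob: larger batches mean fewer round trips, at the cost of larger individual statements.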

This seems to have a lot of potential to speed things up:

We might not need to write the large .sql files at all.
This might be faster than having the mysql command-line client process the files.
The DB could be updated as data becomes ready at various phases of the siteupdate process, rather than all at the end.
We could take advantage of threads to utilize more cores in this phase of the process.
We might not need to write two databases on each site update. Instead, we might create a brand new database each time and keep the last few around to play the role of TravelMappingCopy, which is now used to keep the site functional while TravelMapping is updated by importing the .sql. (This might make sense even if we keep using the .sql files.)
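The incremental and multi-threaded items above could look roughly like this producer/consumer sketch: each siteupdate phase pushes statements into a queue as soon as its data is ready, while a few writer threads drain it. `StatementQueue`, `start_writers`, and the `execute(thread_index, sql)` callback are hypothetical; with a real connector, each writer thread would presumably hold its own connection, since sharing one connection across threads is generally not safe. Requires C++17 for `std::optional`.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <optional>
#include <queue>
#include <string>
#include <thread>
#include <vector>

// A closable queue of SQL statements shared by producers and writers.
class StatementQueue {
public:
    void push(std::string sql) {
        {
            std::lock_guard<std::mutex> lock(m_);
            q_.push(std::move(sql));
        }
        cv_.notify_one();
    }
    // Signals that no more statements will be pushed; wakes all writers.
    void close() {
        {
            std::lock_guard<std::mutex> lock(m_);
            closed_ = true;
        }
        cv_.notify_all();
    }
    // Blocks until a statement is available; returns nullopt once the
    // queue is both closed and empty.
    std::optional<std::string> pop() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return closed_ || !q_.empty(); });
        if (q_.empty()) return std::nullopt;
        std::string sql = std::move(q_.front());
        q_.pop();
        return sql;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::string> q_;
    bool closed_ = false;
};

// Start `n` writer threads that run statements until the queue closes.
std::vector<std::thread> start_writers(
        StatementQueue& queue, unsigned n,
        std::function<void(unsigned, const std::string&)> execute) {
    std::vector<std::thread> writers;
    for (unsigned i = 0; i < n; ++i)
        writers.emplace_back([&queue, execute, i] {
            while (auto sql = queue.pop()) execute(i, *sql);
        });
    return writers;
}
```

Statements from different writers run in no particular order, so anything with dependencies (for example, creating a table before inserting into it) would have to finish before the dependent statements are queued.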

Potential downsides:

It's kind of nice to have the .sql files around so we can just recreate our DB at any time by reloading one of those.
A failed site update could leave behind a partially created DB that would need to be properly removed.
We would have to code this up.
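For the partially-created-DB downside, one common C++ pattern is an RAII guard: create the new database up front and drop it in the destructor unless the update reaches the end and calls `commit()`. `NewDatabaseGuard` and its `execute` callback are hypothetical names for this sketch. Database names are pasted into SQL unescaped, on the assumption that siteupdate generates them itself.

```cpp
#include <functional>
#include <string>
#include <utility>

// Drops a freshly created database unless commit() is called, so an
// exception part-way through an update does not leave a half-built DB.
class NewDatabaseGuard {
public:
    using Execute = std::function<void(const std::string&)>;

    // If CREATE throws, no guard exists and there is nothing to drop.
    NewDatabaseGuard(std::string name, Execute execute)
        : name_(std::move(name)), execute_(std::move(execute)) {
        execute_("CREATE DATABASE " + name_ + ";");
    }
    NewDatabaseGuard(const NewDatabaseGuard&) = delete;
    NewDatabaseGuard& operator=(const NewDatabaseGuard&) = delete;

    // Call once every table has been populated successfully.
    void commit() { committed_ = true; }

    ~NewDatabaseGuard() {
        if (committed_) return;
        try {
            execute_("DROP DATABASE IF EXISTS " + name_ + ";");
        } catch (...) {
            // Destructors must not throw; a real version would log this.
        }
    }

private:
    std::string name_;
    Execute execute_;
    bool committed_ = false;
};
```

A process that is killed outright never runs destructors, so leftovers from hard crashes would still need a cleanup pass on the next run.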

yakra commented 5 months ago

It's kind of nice to have the .sql files around so we can just recreate our DB at any time by reloading one of those.

Another advantage for me: During development, I run the existing siteupdate followed by the in-development version, and then run compare_all on the results, which runs compare_sql (despite its known issues).

If it reports diffs, I (probably) messed up somewhere. True, most diffs are captured earlier in the process when diffing stats CSVs, userlogs, or graphs, but it's nice to have that extra level of security.

No need to view this as a downside though. We can just keep the existing code around as an option, disabled by default.
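Keeping the existing output as an opt-in could be as simple as routing every statement through a common interface and attaching the .sql file writer only when asked. In this sketch, `SqlSink`, `StreamSink`, `TeeSink`, `build_outputs`, and the `write_sql_file` option are all hypothetical; the database-side sink would wrap a connector connection and is left abstract here.

```cpp
#include <ostream>
#include <string>
#include <vector>

// Anything that consumes the SQL siteupdate generates.
struct SqlSink {
    virtual ~SqlSink() = default;
    virtual void statement(const std::string& sql) = 0;
};

// The current behavior: append each statement to a .sql file (any stream).
struct StreamSink : SqlSink {
    explicit StreamSink(std::ostream& out) : out_(out) {}
    void statement(const std::string& sql) override { out_ << sql << '\n'; }
    std::ostream& out_;
};

// Forwards every statement to each attached sink.
struct TeeSink : SqlSink {
    void add(SqlSink& s) { sinks_.push_back(&s); }
    void statement(const std::string& sql) override {
        for (SqlSink* s : sinks_) s->statement(sql);
    }
    std::vector<SqlSink*> sinks_;
};

// Off by default, so normal site updates skip writing the file.
struct Options {
    bool write_sql_file = false;
};

// Always feed the database; add the .sql writer only if requested.
void build_outputs(TeeSink& tee, SqlSink& database, SqlSink& sql_file,
                   const Options& opts) {
    tee.add(database);
    if (opts.write_sql_file) tee.add(sql_file);
}
```

A development run could switch the option on to keep producing .sql files for compare_sql, while production updates leave it off.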

We might not need to write two databases on each site update. Instead, we might create a brand new database each time and keep the last few around to play the role of TravelMappingCopy, which is now used to keep the site functional while TravelMapping is updated by importing the .sql. (This might make sense even if we keep using the .sql files.)

That last sentence says what I was thinking. Either way, nothing immediately comes to mind on how to implement it. Do you have any ideas?
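One possible shape for the rotation idea, offered as a sketch under assumptions rather than a plan: give each update's database a timestamped name, and after a successful update drop all but the newest few. The `TravelMapping_YYYYMMDD_HHMMSS` naming scheme and both function names are hypothetical. Because the timestamp digits are fixed-width, sorting the names as strings also sorts them by age.

```cpp
#include <algorithm>
#include <cstddef>
#include <ctime>
#include <string>
#include <vector>

// Hypothetical naming scheme: prefix plus a UTC timestamp.
std::string new_database_name(const std::string& prefix, std::time_t now) {
    char stamp[32];
    std::strftime(stamp, sizeof stamp, "%Y%m%d_%H%M%S", std::gmtime(&now));
    return prefix + stamp;
}

// Given all database names on the server, return the ones to drop so that
// only the newest `keep` databases carrying `prefix` remain.
std::vector<std::string> databases_to_drop(
        const std::vector<std::string>& existing,
        const std::string& prefix, std::size_t keep) {
    std::vector<std::string> ours;
    for (const std::string& name : existing)
        if (name.compare(0, prefix.size(), prefix) == 0) ours.push_back(name);
    if (ours.size() <= keep) return {};
    std::sort(ours.begin(), ours.end());  // oldest first
    ours.resize(ours.size() - keep);      // everything but the newest `keep`
    return ours;
}
```

The list of existing databases could come from `SHOW DATABASES`. The web front end would still need some way to learn which database is current (for example, a configuration value updated at the end of a successful run); that part is not sketched here.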

jteresco commented 5 months ago

We'd be doing some experimenting for sure if we want to try this. Looks like we'd need to compile and link against the appropriate library and modify the DB(s) from C++ as the site update progresses.

I think one remaining sanity check for diffs would be to populate a test DB and then dump it with mysqldump. The dump should be diffable.
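A dump-and-diff check would need to ignore lines that legitimately change between runs. This sketch assumes such noise appears only in SQL comment lines (mysqldump writes its header and, by default, a "Dump completed on" date as `--` comments) and compares everything else line by line. The function name `dumps_match` is hypothetical.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Read the next line that is not an SQL "--" comment; false at end of input.
bool next_data_line(std::istream& in, std::string& line) {
    while (std::getline(in, line))
        if (line.rfind("--", 0) != 0) return true;
    return false;
}

// True if both dumps contain the same non-comment lines in the same order.
bool dumps_match(std::istream& a, std::istream& b) {
    std::string la, lb;
    while (true) {
        bool ha = next_data_line(a, la);
        bool hb = next_data_line(b, lb);
        if (ha != hb) return false;  // one dump has extra lines
        if (!ha) return true;        // both exhausted with no differences
        if (la != lb) return false;
    }
}

// Convenience overload for dumps already held in memory.
bool dumps_match(const std::string& a, const std::string& b) {
    std::istringstream sa(a), sb(b);
    return dumps_match(sa, sb);
}
```

For dumps on disk, two `std::ifstream` objects can be passed to the stream overload.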

TravelMapping / DataProcessing

MariaDB C++ connector to modify the DB directly from siteupdate #630