TravelMapping / DataProcessing

Data Processing Scripts and Programs for Travel Mapping Project

MariaDB C++ connector to modify the DB directly from siteupdate #630

Open jteresco opened 6 months ago

jteresco commented 6 months ago

Issue #620 proposes to cut what is now the vast majority of the site update time (loading the DBs from the siteupdate-generated .sql files) by finding ways to read those files faster. Another option we might pursue is to use the MariaDB C++ connector to make the database updates right from the C++ siteupdate program.

This seems to have a lot of potential to speed things up:

Potential downsides:

yakra commented 5 months ago
  • It's kind of nice to have the .sql files around so we can just recreate our DB at any time by reloading one of those.

Another advantage for me: During development, I'll run the existing siteupdate followed by the in-dev version, and then compare_all the results, which runs compare_sql (even if it has known issues).

If it reports diffs, I (probably) messed up somewhere. True, most diffs are captured earlier in the process when diffing stats CSVs, userlogs, or graphs, but it's nice to have that extra level of security.

No need to view this as a downside though. We can just keep the existing code around as an option, disabled by default.
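One way to keep the .sql output around as an option is to hide the destination behind a small interface, so the code that produces rows does not care where they go. The sketch below is only an illustration of that idea: RowSink, SqlFileSink and TeeSink are hypothetical names, not anything in siteupdate, and values are assumed to arrive already quoted and escaped. A connector-backed sink would implement the same interface; it is left out here because it depends on the connector library.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>   // only needed when writing to an in-memory stream
#include <string>
#include <vector>

// Hypothetical interface: anything that can accept one table row.
struct RowSink {
    virtual ~RowSink() = default;
    virtual void write_row(const std::string& table,
                           const std::vector<std::string>& values) = 0;
};

// Writes one INSERT statement per row to any std::ostream
// (a .sql file, a string stream, ...).
// Assumption: each value is already quoted/escaped by the caller.
class SqlFileSink : public RowSink {
    std::ostream& out_;
public:
    explicit SqlFileSink(std::ostream& out) : out_(out) {}

    void write_row(const std::string& table,
                   const std::vector<std::string>& values) override {
        assert(!table.empty());
        out_ << "INSERT INTO " << table << " VALUES (";
        for (std::size_t i = 0; i < values.size(); ++i) {
            if (i) out_ << ",";
            out_ << values[i];
        }
        out_ << ");\n";
    }
};

// Forwards each row to every registered sink, so a direct-to-DB sink
// and the .sql sink could run side by side when the option is enabled.
class TeeSink : public RowSink {
    std::vector<RowSink*> sinks_;
public:
    void add(RowSink* sink) { sinks_.push_back(sink); }

    void write_row(const std::string& table,
                   const std::vector<std::string>& values) override {
        for (RowSink* sink : sinks_) sink->write_row(table, values);
    }
};
```

With this shape, "disabled by default" just means the SqlFileSink is not added to the TeeSink unless a command-line option asks for it.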

We might not need to write two databases on each site update. Instead, we might create a brand new database each time and keep the last few around to play the role of TravelMappingCopy, which is now used to keep the site functional while TravelMapping is updated by importing the .sql. (This might make sense even if we keep using the .sql files.)

That last sentence said what I was thinking. In either case, nothing immediately comes to mind on how to implement it. Do you have any ideas?
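One possible starting point for the "new database each time, keep the last few" idea: give every database a name with a fixed-width timestamp suffix, and after a successful update drop everything but the newest few. The sketch below covers only the selection step. The naming scheme (a prefix such as TravelMapping_ followed by a timestamp) and the function name databases_to_drop are assumptions for illustration, and how the site gets pointed at the newest database is left open.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Given every database name on the server, return the ones that start
// with `prefix` and should be dropped so that only the `keep` newest remain.
// Assumption: names are prefix + fixed-width timestamp, so sorting the
// strings also sorts them chronologically.
std::vector<std::string> databases_to_drop(const std::vector<std::string>& all_names,
                                           const std::string& prefix,
                                           std::size_t keep) {
    assert(!prefix.empty());

    // Collect only the databases that follow our naming scheme.
    std::vector<std::string> ours;
    for (const std::string& name : all_names) {
        if (name.size() > prefix.size() &&
            name.compare(0, prefix.size(), prefix) == 0) {
            ours.push_back(name);
        }
    }

    std::sort(ours.begin(), ours.end());   // oldest first
    if (ours.size() <= keep) return {};    // nothing to drop yet

    ours.resize(ours.size() - keep);       // keep only the oldest entries
    return ours;
}
```

Because the prefix ends in an underscore, a name like TravelMappingCopy is never selected, so the existing databases would be left alone during a transition.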

jteresco commented 5 months ago

We'd be doing some experimenting for sure if we want to try this. Looks like we'd need to compile and link with the appropriate library and modify the DB(s) from C++ as the site update progresses.

I think one remaining sanity check we could run on diffs is to populate a test DB and then dump it with mysqldump. The result should be diff'able.
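A rough sketch of what that comparison could look like, assuming two dumps are already on disk. Dump tools such as mysqldump write "--" comment lines (host, database name, completion time) that differ between runs even when the data is identical, so this version skips comments and blank lines before comparing. The function names are made up for illustration, and it assumes both dumps list tables and rows in the same order.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>   // only needed when comparing in-memory strings
#include <string>

// Read the next line that is neither blank nor an SQL "--" comment.
// Returns false once the input is exhausted.
bool next_significant_line(std::istream& in, std::string& line) {
    while (std::getline(in, line)) {
        if (line.empty()) continue;
        if (line.compare(0, 2, "--") == 0) continue;
        return true;
    }
    return false;
}

// Compare two dumps, ignoring comments and blank lines.
// Returns 0 if they match, otherwise the 1-based index of the first
// significant line where they differ (or where one dump ends early).
std::size_t first_dump_difference(std::istream& a, std::istream& b) {
    std::string line_a, line_b;
    std::size_t n = 0;
    for (;;) {
        const bool has_a = next_significant_line(a, line_a);
        const bool has_b = next_significant_line(b, line_b);
        ++n;
        if (!has_a && !has_b) return 0;                  // both ended together
        if (has_a != has_b || line_a != line_b) return n;
    }
}
```

Opening the two dump files as std::ifstream and passing them in would give a quick pass/fail before reaching for a full diff.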