New Database Layer - Githubissues

matty0ung commented 3 years ago

This is the new database layer, which separates out:

mapping objects to/from database storage
dealing with the specifics of an individual database (credentials, backups, different ways of doing things in SQLite/PostgreSQL, etc)

It also implements export to BeerXML using the same new data structures etc we created to improve reading from BeerXML.

A few other things have come along for the ride because they were minor tidy-ups I'd implemented in Brewken and it would have been more effort to exclude them than to include them.

All database stuff is now in the database subfolder of src. Key files:

Database.cpp, Database.h handle abstraction of SQLite / PostgreSQL (as before) but no longer do any object mapping
DatabaseSchemaHelper.cpp, DatabaseSchemaHelper.h handle schema upgrades and migrations, plus default data population (as before). I've disentangled this from the old Schema files that are no longer used. In particular schema changes are now hard-coded. (If we've got the SQL that, say, upgrades from v7 to v8 of the DB, we don't want this to be tied to "current" mapping as you still want to be able to do that exact same v7 to v8 step (and v8 to v9 etc etc) at any later date, including when the "current" schema has evolved and changed from whatever v7 was.) I have also moved "default data" to BeerXML and stopped using the bt tables. (Now that we decided not to try to edit the user's existing data, we only needed the bt tables for duplicate detection - ie not to import the default Recipe/Style/Hop/etc objects that are already in the user's database. But we have such logic already in the BeerXML import code. So we can just reuse that and it will automatically pull only "new" objects out of data/DefaultData.xml. Of course, once we support BeerJSON, we'll switch to that.)
ObjectStore.cpp, ObjectStore.h Does all the SQL to create/read/update/delete objects in the database. Normally you won't access this directly, but instead via templated functions in the ObjectStoreWrapper namespace
ObjectStoreTyped.h, ObjectStoreWrapper.h add templated wrappers around ObjectStore to save callers doing lots of tedious casting
ObjectStoreTyped.cpp contains all the data required to map objects to and from database tables
DbTransaction.cpp, DbTransaction.h just provide an RAII wrapper around DB transactions

A fair bit of class-specific logic that used to be inside the Database class is now inside the individual subclasses of NamedEntity (in the src/model directory). Eg, more of the Recipe versioning logic now lives inside the Recipe class. I've tidied up a few things with helper functions so that we have less copy-and-paste code on setter member functions.

Generally, you'll see that:

Dealing with storing and retrieving objects is now much more C++ than SQL. Eg, if you want "all Recipes that match such-and-such condition" then that condition is going to be a lambda that you pass to ObjectStoreWrapper::findAllMatching<Recipe>() rather than a WHERE clause that you tack onto a SQL statement in a new Database member function. Similarly, default data for a newly-created object is always in that object's constructor and not in the database definition.
The mappings from objects to database tables are more concise and use fewer global constants.
We have better encapsulation as fewer classes need to be friends with each other, and classes in the src/model directory do not need to have any knowledge of SQL or XML.
Inside the object model, we do more lazy-loading (eg a Recipe object holds the IDs of the Hops it uses and then turns those IDs into Hop objects if and when needed), but this should be largely invisible to the UI code.
It should now be less work in future to add support for another SQL database (eg MariaDB) or even a NoSQL (eg CouchDB or Cassandra).

Additionally, I have:

Tidied up logging a bit, including now logging a thread ID (in case we do any multi-threaded stuff in future) and ensuring that errors about logging itself are always logged to the console even if we have disabled other console logging
Enhanced the GitHub actions for the Mac build so that you can see a stack trace if the automated tests crash. (This is what happens when you don't own a Mac!)
Separated out persistent settings into a namespace called PersistentSettings
Created a new class BtStringConst for wrapping compile-time-constant strings. (See class comment in utils/BtStringConst.h for part of the motivation for this but, amongst other things, this made the Windows and Mac builds happy.)
Combined the above two steps to use proper constants for all access of persistent settings (which along the way uncovered a few minor bugs where, eg, we were using one key to store a value and another one to attempt to read it)

This all turned out to be a bit more work than I imagined back in January, but I'm happy with the result. There's scope to refine things a bit further (eg I think some or all of ObjectStoreTyped could be merged into ObjectStoreWrapper) but I don't see the need to do another mega-change like this one.

As ever, questions and comments most welcome.

mikfire commented 3 years ago

First, a disclaimer. This is a part of the code I've never really used, understood or particularly liked. I have long felt mashing in brewtarget is the hardest part to understand, and the part most in need of a significant redesign. Which I will do, just as soon as I get the brilliant insight as to what it should look like.

I believe the idea was that if you had a mash profile you always used -- like all my beers are usually single infusion, mashed at 66C for 60 minutes and initial temp around 20C -- you could create a named mash and just reuse that instead of having to build a new profile every time.

There is no separate window in which to create the mash profile. The save button was how we got around that problem. You could build the mash profile in the main window and then save it. You could also load that named mash, make changes as required and then save it and the next use of that profile would have the changes.

mikfire commented 3 years ago

Having a look at this. Looks like, with the Mash Designer, we're creating a Mash OK but then inserting a MashStep in the DB without having set its Mash ID.

Similarly, I see what you mean about MashStepEditor::saveAndClose too. It calls Brewtarget::mainWindow()->addMashStepToMash(), which calls Mash::addMashStep() but that doesn't save the MashStep in the DB. Will have a think about where best to do that. As you say, we don't want to try to insert things twice as it causes other problems. (We could bodge round them by calling insertOrUpdate() rather than insert(), but it would be nicer to just call insert once at the right time.)

I will point out almost every other editor window manually sets the cacheOnly flag after the insert call. EquipmentEditor doesn't, but I suspect that is not intentional.

What if we included a parameter to insert that would set the object's cacheOnly. I am thinking something like a boolean that defaults to false. I would expect the standard use case is that we want cacheOnly() set to false after the insert, so using that as the default would mean the least typing.

mikfire commented 3 years ago

The issue in the MashWizard is that we are not calling either Brewtarget::mainWindow()->addMashStepToMash() or mashStep->setMash() when required. Assuming we want undo, we need the first call. If we don't care, then the second call. It's still a little broken with the sorting after adding new steps, but at least they show up and get sorted properly on restart.

The mash designer is supposed to clear the entire mash profile before it starts, which it doesn't seem to be doing. I will point out that the "Total Wort Collected" isn't working properly in HEAD on develop. So we will fix that as a separate problem.

matty0ung commented 3 years ago

I must admit, I was a bit puzzled about the cacheOnly flag. I see the effect but I wasn't sure about the purpose. Originally I figured it was to mark an object as "this may not end up getting saved to the database" (eg we're creating a new thing and the user could either click save or cancel). But then I thought we could infer that from the object's key being (or not being) valid, so I guessed there must be another use case for it, but I never quite followed through all the code paths to work it out. If there's some simple rule about when it should and shouldn't be set, perhaps we could manage that somewhere centrally as you suggest.

mikfire commented 3 years ago

I am dead against "inferring" anything. Inference like that will always break, as people forget or new people come on board. If we need a thing, we declare it and use it for that purpose. In general, I would rather things be easy to understand than to save even a few thousand lines of code. Code is cheap, my time is not.

Allow me to state that again. I hate inferring status. If you need to know if a thing is this or that, use a specific and well-named flag to indicate if the thing is this or that. In this day and age, I do not really care if my in-memory foot print is now 1 byte bigger because I used a boolean where I could have inferred. Yes, there are problem domains (embedded systems, for example) where that really does matter. I am not coding brewtarget for those problem domains, and I am going to optimize my ability to read and understand the code over memory.

By the way. Have I mentioned how much I hate inferring status? It is almost always clever, and there are few things more dangerous in IT than people being "clever". It usually means you have made it difficult to impossible to ever change your approach, because there are all these side effects that simply are forgotten, unknown or otherwise not fixable. Writing code it intentionally rely on a side effect is just wrong.

The intent of that flag was mostly for speed. It provided a really fast way to load an object that did not require making one query to the database for each attribute for every object. As I recall, it required something like 10,000 - 20,000 queries just to load my database. It also had any number of weird side effects, because we would do things like recipe->recalcall, which would then rewrite attributes that we were going to read next. I could turn all that off by setting the cacheOnly flag. It literally dropped the number of queries by an order of magnitude.

It also became useful for the editors, so that we could create something in memory and not write it until the user clicks "save". Instead of writing each attribute as a separate statement, it allowed me to do one massive write that inserted everything. The fewer trips we make to the database, the faster performance.

This had the unexpected but really pleasing side effect of not actually writing a thing to the database until the user said "Ok". It is difficult to describe how annoying it was to start adding a new hop, hit "cancel" and still have to delete the entry from the tree. This was honestly not planned, but was such a relief to me that it made the entire effort worth it.

The final reason is that there had been a previous attempt at some kind of caching mechanism. It was only ever applied to one of the base tables (fermentables, as I recall). When I was trying to make the networked database more performant, I had to deal with it. In studying the problem, I felt the first attempt was optimizing the wrong way. For the most part, we want one field flushed to the database immediately. Only in rare circumstances (like the editors) do we want to cache things and then do one massive flush at the end. This was my solution.

I believe I changed how I was doing things half way through and stopped using the set*() methods on load. The idea still worked for those weird, calculated fields and it was so useful for the editors, so I kept it. I still think it is useful and am really set against changing it. No. Really. Do not break this part of the code unless you have a really solid replacement that does not infer status, because I really do not like inferring status.

matty0ung commented 3 years ago

I think I get your drift. :smile:

I'm pretty sure we violently agree that we should not add complexity to code in order to make some miniscule saving of memory footprint or run speed. And I too don't like it when you have to jump through a lot of hoops to work out what state an object is in.

In this instance however, it seemed to me that, whatever the original history of the flag, there's now a rather direct relationship between cacheOnly and "is stored in the database". Keeping two separate ways of tracking the same thing risks creating complexity rather than simplicity. Specifically, if we look at all possible of combinations of these two flags/states, we see the following:

New Object state: (key < 0 && cacheOnly == true). This is a new object that isn't yet stored in the database (key < 0) and for which we don't want setters to try to update the database (cacheOnly == true) as that wouldn't make sense.
Stored Object state: (key > 0 && cacheOnly == false). This is the state for most objects most of the time. The object is stored in the database (key > 0) and we want calls to its setters to also update the corresponding data in the database (cacheOnly == false). This means the in-memory object and its database record stay in sync.
Bad state: (key < 0 && cacheOnly == false). This state wouldn't make any sense. Having cacheOnly == false means we want setters to update the database, but they can't because key < 0 means there isn't a database record to update. If a setter were ever called on an object in this state I'd be expecting to log an error and fire an assert.
Dangerous state: (key > 0 && cacheOnly == true). This is where the object has a database record (key > 0) but we are asking the in-memory copy not to update the database when its setters are called (cacheOnly == true). This might be OK if the user of the object is calling a lot of setters in succession and wants to optimise performance. As long as they always call ObjectStoreWrapper::update() at the end (and set cacheOnly to true again), then things will work. BUT, if that ever doesn't happen then the in-memory object will be out of sync with its database record and the user risks data loss. It seems like this state creates complexity that is only of benefit if we want some optimisation that is probably of only small benefit...

If we were to eliminate the cacheOnly flag, we'd reduce these four states down to two: New Object and Stored Object, which seems like a simplification to me. In particular your points above are still addressed:

we could create something in memory and not write it until the user clicks "save"

This would still be the case. Calling setters on a newly-created object doesn't try to talk to the database. Only once the user clicks "save" do we insert the object in the DB, give it a key, and thereby activate the behaviour where setters update the DB record.

The intent of that flag was mostly for speed. It provided a really fast way to load an object that did not require making one query to the database for each attribute for every object.

The ObjectStore code already does what we want here, without needing to reference cacheOnly. When loading objects from the database, it does one query per table, reads an entire row at a time from from result set and uses that data to construct an object without calling its setters. (This is the purpose of model/NamedParameterBundle.) I don't know what performance is with PostgreSQL, but it certainly seemed fast enough with SQLite on my machine.

I'm not saying we need to eliminate the cacheOnly flag as part of this commit, but, if we agree it's past its sell-by date then it would make fixing some of the other bits of the code simpler.

matty0ung commented 3 years ago

Some fixes coming too, but will probably be tomorrow as need to do some casting or similar to get it to compile.

matty0ung commented 3 years ago

OK, Mash and MashStep should be behaving a lot better now. (I had missed finishing off the reworking of how the reference each other.) Seems to work better on my local testing, but I'm not a guru of the Mash Wizard Designer things, so please shout if you see any oddities.

mikfire commented 3 years ago

Sorry for the hiatus. There are issues with the mash steps and all the editors still.

The mash designer still does nothing. Saying "next" should add the step to the mashes, and it doesn't. It doesn't seem to add anything, but I haven't dug hard enough into the database to determine just what is or isn't happening.

Trying to run the mash wizard on a recipe that already has mashsteps dumps core like this:

#5  0x00005555557d9683 in NamedEntity::name (this=0x555556c0b8c0) at /home/mik/brewtarget/matty0ung/src/model/NamedEntity.cpp:225
#6  0x0000555555784f10 in MashStepTableModel::data (this=0x555556a52c30, index=..., role=0) at /home/mik/brewtarget/matty0ung/src/MashStepTableModel.cpp:244
#7  0x00007ffff73ae508 in ?? () from /usr/lib64/libQt5Widgets.so.5
#8  0x00007ffff73af465 in QItemDelegate::sizeHint(QStyleOptionViewItem const&, QModelIndex const&) const () from /usr/lib64/libQt5Widgets.so.5
#9  0x00007ffff73e2215 in ?? () from /usr/lib64/libQt5Widgets.so.5
#10 0x00007ffff73e242d in QTableView::sizeHintForColumn(int) const () from /usr/lib64/libQt5Widgets.so.5
#11 0x00007ffff739e25a in ?? () from /usr/lib64/libQt5Widgets.so.5
#12 0x0000555555784c76 in MashStepTableModel::mashStepChanged (this=0x555556a52c30, prop=..., val=...) at /home/mik/brewtarget/matty0ung/src/MashStepTableModel.cpp:203
#13 0x00005555555c76c2 in MashStepTableModel::qt_static_metacall (_o=0x555556a52c30, _c=QMetaObject::InvokeMetaMethod, _id=5, _a=0x7fffffffcbc0)
    at /home/mik/brewtarget/matty0ung/build/src/btobjlib_autogen/EWIEGA46WW/moc_MashStepTableModel.cpp:106
#14 0x00007ffff66002de in ?? () from /usr/lib64/libQt5Core.so.5
#15 0x00005555555d31c5 in NamedEntity::changed (this=0x555556cb53d0, _t1=..., _t2=...) at /home/mik/brewtarget/matty0ung/build/src/btobjlib_autogen/NLOY5YBGEN/moc_NamedEntity.cpp:249
#16 0x00005555557d9f39 in NamedEntity::propagatePropertyChange (this=0x555556cb53d0, propertyName=..., notify=true) at /home/mik/brewtarget/matty0ung/src/model/NamedEntity.cpp:353
#17 0x00005555557d41c2 in NamedEntity::setAndNotify<int> (this=0x555556cb53d0, propertyName=..., memberVariable=@0x555556cb5440: 1, newValue=1)
    at /home/mik/brewtarget/matty0ung/src/model/NamedEntity.h:373
#18 0x00005555557d32da in MashStep::setStepNumber (this=0x555556cb53d0, stepNumber=1) at /home/mik/brewtarget/matty0ung/src/model/MashStep.cpp:140
#19 0x00005555557cfba9 in Mash::impl::setCanonicalMashStepNumbers (this=0x555556906020) at /home/mik/brewtarget/matty0ung/src/model/Mash.cpp:57
#20 0x00005555557cb43a in Mash::removeMashStep (this=0x555556905ed0, mashStep=0x555556c0b8c0) at /home/mik/brewtarget/matty0ung/src/model/Mash.cpp:436
#21 0x000055555578d9b9 in MashWizard::wizardry (this=0x555556429460) at /home/mik/brewtarget/matty0ung/src/MashWizard.cpp:200
#22 0x000055555575d3d9 in QtPrivate::FunctorCall<QtPrivate::IndexesList<>, QtPrivate::List<>, void, void (MashWizard::*)()>::call(void (MashWizard::*)(), MashWizard*, void**) (
    f=(void (MashWizard::*)(MashWizard * const)) 0x55555578d66c <MashWizard::wizardry()>, o=0x555556429460, arg=0x7fffffffd0d0) at /usr/include/qt5/QtCore/qobjectdefs_impl.h:152

I've edited that a little, because the first 4 entries really weren't interesting.

I am testing with a simple infusion and two batch sparges. If there are no sparge steps, two steps get created and the math looks roughly correct. Rerunning the mash wizard at that point dumps core. If I restart brewtarget there is one batch sparge step and the infusion step. Running the mash wizard dumps core again, in the same spot. Running brewtarget again shows just the infusion and no sparges. Since there are no sparge steps, two steps get created. And so on.

Attempting to remove a mash step by clicking the "-" button causes the same core dump, so I am thinking removing a mash step is causing something grief. It doesn't seem to happen with any of the other tables. Based on the core dump, I suspect you may be trying to get a name that doesn't exist any longer?

It would also be nice to fix the sort. The infusion step should be before the sparge steps, but I don't know if that is a related issue.

matty0ung commented 3 years ago

Thanks. This is all good stuff. The fixes are all simple but one needs to touch a fair bit of code, so I'll work on it over the next couple of days.

The one that involves lots of small changes is to do with object ownership. Essentially when we really really delete something - eg a MashStep that has no meaningful existence once it has been removed from a Mash - we should really be using shared pointers properly to make sure we can the remove the object from the DB and ObjectStore but still have it in the undo/redo stack in case the user wants to undo the deletion. The shared pointers are all already there in the ObjectStore, I just need to get a few more bits of the code to use them. (Mostly where we come back out of the database layer into the existing code, we just reach inside the shared pointer to use the raw pointer, because everything was originally done with raw pointers (and which is still correct when you're creating a Qt object that the framework owns by some chain of parenthood etc). Anyway, this use of raw pointers works fine when we're not deleting things or are only soft deleting things or can recreate the things that are deleted, which is ~99% of the code. But I need to make more things pass the shared pointer around to ensure the right things happen in the other ~1% of cases. A side benefit of this should be that we clear up a few potential memory leaks in the existing code.)

matty0ung commented 3 years ago

OK, things should be somewhat more robust now. I'm still uncertain about some of the finer points of the Mash Wizards and Designers (eg exactly when and how it should complain about your mash being too thick for it to be able to proceed) but I don't think this is related to the database layer!

matty0ung commented 3 years ago

Conflicts above are from the Print Preview work by @mattiasmaahl that I just merged into trunk. Will resolve merge conflicts with this branch once @mikfire happy with everything else.

mikfire commented 3 years ago

The mash designer still isn't behaving properly. I think this should be an easy fix and I am willing to sign off on this merge as it exists, as long as we put a problem ticket in so we don't forget to fix this.

On starting the mash designer, it is supposed to delete all the existing mashsteps. Currently, it doesn't. The behavior is very strange after that, as it seems the designer is modifying the existing steps instead of creating new ones?

If I manually remove the existing steps, it behaves more as I would expect. The total collected wort is wrong. That will be a signal, I am quite certain. I've broken this screen enough times I could almost tell you what signal it is.

matty0ung commented 3 years ago

Thanks Mik, as ever, much appreciated.

I have raised https://github.com/Brewtarget/brewtarget/issues/620 for the mash designer problems and have merged in the new print preview stuff from @mattiasmaahl (with a small extra tweak to make it behave a bit nicer on HighDPI displays).

Brewtarget / brewtarget

New Database Layer #617