LibriVox / librivox-catalog

LibriVox catalog and reader workflow application
https://librivox.org
MIT License
36 stars 17 forks source link

Display keywords on book page (as per issue #210) #212

Closed peterjdann closed 1 month ago

peterjdann commented 3 months ago

The immediate purpose of this proposed change is to allow users of the Librivox catalog to see what keywords have been applied to an audiobook, to see how many audiobooks in all have that keyword, and, where one or more other audiobooks have that same keyword, to list them by clicking the keyword concerned.

In the longer term, if/when we can improve the functioning of the Advanced Search capability, a better understanding of what keywords are commonly applied to our audiobooks (facilitated by this change) should enable users to conduct more sophisticates searches (for example, to find audiobooks that have both "love" and "French Revolution" as keywords, to take a purely hypothetical example).

Included in this change is code which I propose should be run as a cron job (to update keyword usage statistics) and a call to a quick-and-dirty update function that will allow the keyword usage stats for a newly catalogued project to show in whatever interval may exist before the next cron job will run,.

peterjdann commented 3 months ago

I am an almost complete newbie at using GitHub fork-and-clone, and must admit I have not the following idea what, if anything, I should be doing about the "All checks have failed" message I'm currently seeing for this pull request. Please advise!

redrun45 commented 3 months ago

The problem

I believe the problem is: while your new code is in the pull request, the modifications you made to the database are not.

Looking at the log ('Details' link on our "Deploy localdev (PR)" check task), it seems that when GitHub makes a test copy of the web site using your new code, that test copy returns a '500' internal server error message. I pulled up a copy myself, and here's what I see when I access a given page:

An uncaught Exception was encountered Type: mysqli_sql_exception Message: Unknown column 'keywords.instances' in 'field list' Filename: /librivox/www/librivox.org/catalog/system/database/drivers/mysqli/mysqli_driver.php Line Number: 307

As for what to do about it, this is new to me too. I've never had to update the database schema in tandem with a code change.

What I'd suggest, is to first take a snapshot of your current state, then re-run Ansible to reset everything (including the database) to the shared version we're working off of. Then see what it takes to get things working from there.

Housekeeping

You'll notice that our 'librivox/master' version of the code has been updated since you started, so you can go to your "fork" page and you'll have an option to merge those changes into your fork. The line that should have this button for you, currently shows me:

This branch is 14 commits ahead of, 6 commits behind LibriVox/librivox-catalog:master.

You'll want to merge those 6 commits into your fork, and then go back to your dev machine to git pull the resulting, updated branch.

Next steps (Updated per Artom's post)

Now, with the database reset and your code branch up to date, you'll be seeing the error code that I'm seeing. From there, you can create a database migration file, with all the steps necessary to get the database working with your code.

notartom commented 3 months ago

https://github.com/LibriVox/librivox-catalog/pull/213 is the correct "industry standard" way to do these things. If that passes the checks, I'll merge it, then propose a patch to librivox-ansible to run the DB migrate command on every deployment, and from then on, PRs will be able to add their schema changes inline.

notartom commented 3 months ago

OK, @peterjdann you should be good to create our first migration under application/migrations, see https://codeigniter.com/userguide3/libraries/migration.html#create-a-migration for an example. That should not only make our CI pass, but also create a record in-code in-tree of the schema changes required for your PR.

peterjdann commented 3 months ago

Since we last corresponded, I have rebased the development branch in my local environment so that it now includes, I believe, all the latest changes present in the master at this site. This involved resolving a couple of conflicts flagged in the rebase operation. I have also had a crack at creating the kind of configuration settings and code which I believe are required automatically to migrate a database schema change in this environment, working off the documentation at https://codeigniter.com/userguide3/libraries/migration.html#create-a-migration and https://codeigniter.com/userguide3/database/forge.html#adding-a-column-to-a-table.

notartom commented 3 months ago

Playing with this PR locally, I kept getting this error:

The migration class "Migration_Add_keywords_instances_field" is missing an "up" method.

Looks like there's a bug with CodeIgniter itself: https://stackoverflow.com/questions/65405139/no-working-up-method-in-migrate-codeigniter-3-with-php8

In system/libraries/Migration.php, the bit where it checks for the presence of the up() method looks like this:

elseif ( ! is_callable(array($class, $method)))
{
    $this->_error_string = sprintf($this->lang->line('migration_missing_'.$method.'_method'), $class);
    return FALSE;
}

However, per https://www.php.net/manual/en/function.is-callable.php, is_callable()'s first argument is an object, not a string name of a class, so we should be doing something like:

$instance = new $class();
<snip>
elseif ( ! is_callable(array($instance, $method)))
{
    $this->_error_string = sprintf($this->lang->line('migration_missing_'.$method.'_method'), $class);
    return FALSE;
}
peterjdann commented 3 months ago

I had just found that Stackoverflow post when you posted this and was looking at the code concerned.

I am not confident I understand where or how to implement the line you describe as “$instance - new $class();

Would you be willing to put that through as a change in the master which I can then rebase from?

Edit: All good. It's pretty obvious how to patch this. I've done it now as you suggested.

Peter

On 2 Apr 2024, at 3:36 pm, Artom Lifshitz @.***> wrote:

Playing with this PR locally, I kept getting this error:

The migration class "Migration_Add_keywords_instances_field" is missing an "up" method. Looks like there's a bug with CodeIgniter itself: https://stackoverflow.com/questions/65405139/no-working-up-method-in-migrate-codeigniter-3-with-php8

In system/libraries/Migration.php, the bit where it checks for the presence of the up() method looks like this:

elseif ( ! is_callable(array($class, $method))) { $this->_error_string = sprintf($this->lang->line('migrationmissing'.$method.'_method'), $class); return FALSE; } However, per https://www.php.net/manual/en/function.is-callable.php, is_callable()'s first argument is an object, not a string name of a class, so we should be doing something like:

$instance = new $class();

elseif ( ! is_callable(array($instance, $method))) { $this->_error_string = sprintf($this->lang->line('migration_missing_'.$method.'_method'), $class); return FALSE; } — Reply to this email directly, view it on GitHub , or unsubscribe . You are receiving this because you were mentioned.
redrun45 commented 3 months ago

The silence is not due to a lack of enthusiasm on my part - I'm afraid we've got outages in my neck of the woods. I'll have a bit of a backlog when my PC is back online. 😅

peterjdann commented 3 months ago

Not a problem in the least. I’m not expecting any instantaneous response at all, and am only sorry you’re having to wade through such difficulties. All in good time.

On 4 Apr 2024, at 11:22 AM, redrun45 @.***> wrote:

The silence is not due to a lack of enthusiasm on my part - I'm afraid we've got outages in my neck of the woods. I'll have a bit of a backlog when my PC is back online. 😅

— Reply to this email directly, view it on GitHub https://github.com/LibriVox/librivox-catalog/pull/212#issuecomment-2035845714, or unsubscribe https://github.com/notifications/unsubscribe-auth/APUJ53B5YH6C6GN6ZTEBNCLY3SMMDAVCNFSM6AAAAABFREOGHOVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDAMZVHA2DKNZRGQ. You are receiving this because you were mentioned.

notartom commented 3 months ago

This is a really big PR (naturally), so reviewing it as a whole is difficult. In cases like this, I normally fall back on reviewing one commit at a time (really, at my day job, one commit at a time is the only way we review code). The assumption is, every commit can stand on its own, adding small bits of the overall feature; and every commit builds on the previous one, until the last one in the PR ties it all together. Splitting a feature into such commits is an art/science in and of itself.

I don't think it's fair to arbitrarily hold volunteer LibriVox contributors to the same standards as my day job colleagues, but I do think that it's valuable to promote (insist on?) quality maintainable and readable code, with a clean git history.

I think I can guess how you got to the commit history in this PR: you started with a POC of your feature, then kept adding/changing things until you beat it into shape, and ended up with the final result you had initially designed. The problem that this introduces is that subsequent commits end up undoing much of what previous commits had done. That makes it hard to review, and muddies the git log history.

Do you think you'd be able to take the final result, and starting from this end state, split it into commits that build the feature from the ground up? The usual way of doing these things is to start with the database/model changes at the bottom, then the controllers, then the views - at least, that's what I would do, keeping in mind I'm not an expert in web stuff.

Ideally, if you could also add unit tests where appropriate (newly added functions that stand on their own, with clear input -> output correctness that can be verified by tests), I would be over the moon! :)

peterjdann commented 3 months ago

Artom, you’re quite right in your assumptions about my method of development here.

I imagine the easiest (ha!) way for me to do what you are suggesting is:

Is that correct?

I’d much rather you be over the moon than under it, so I’m certainly willing to have a go at what you’re suggesting. I do want to make sure I understand what you’re suggesting, though.

On 4 Apr 2024, at 12:03 PM, Artom Lifshitz @.***> wrote:

This is a really big PR (naturally), so reviewing it as a whole is difficult. In cases like this, I normally fall back on reviewing one commit at a time (really, at my day job, one commit at a time is the only way we review code). The assumption is, every commit can stand on its own, adding small bits of the overall feature; and every commit builds on the previous one, until the last one in the PR ties it all together. Splitting a feature into such commits is an art/science in and of itself.

I don't think it's fair to arbitrarily hold volunteer LibriVox contributors to the same standards as my day job colleagues, but I do think that it's valuable to promote (insist on?) quality maintainable and readable code, with a clean git history.

I think I can guess how you got to the commit history in this PR: you started with a POC of your feature, then kept adding/changing things until you beat it into shape, and ended up with the final result you had initially designed. The problem that this introduces is that subsequent commits end up undoing much of what previous commits had done. That makes it hard to review, and muddies the git log history.

Do you think you'd be able to take the final result, and starting from this end state, split it into commits that build the feature from the ground up? The usual way of doing these things is to start with the database/model changes at the bottom, then the controllers, then the views - at least, that's what I would do, keeping in mind I'm not an expert in web stuff.

Ideally, if you could also add unit tests where appropriate (newly added functions that stand on their own, with clear input -> output correctness that can be verified by tests), I would be over the moon! :)

— Reply to this email directly, view it on GitHub https://github.com/LibriVox/librivox-catalog/pull/212#issuecomment-2035894754, or unsubscribe https://github.com/notifications/unsubscribe-auth/APUJ53GRZJHVK4NNUIXRMRLY3SRHNAVCNFSM6AAAAABFREOGHOVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDAMZVHA4TINZVGQ. You are receiving this because you were mentioned.

notartom commented 3 months ago

That's pretty spot on - git has various commands that can help, so let me know if you need pointers.

peterjdann commented 3 months ago

I am concerned that I am pursuing an ever-receding mirage here. No sooner than I have got past one set of difficulties than I find myself faced with still further challenges, not one of which I had the faintest inkling of at the outset.

I should say, by way of background, that I worked for many years as the support manager of a company that developed and sold an SAP change management solution to many major companies around the world, including, by the time I finished up, Shell, CocaCola, Apple, IBM and pharmaceutical GSK. We tested every release of our product against a defined set of agreed high-criticality and moderate-criticality capabilities, but we never performed a single 'unit test', and only ever tested the complete product, even though much of the code underlying an SAP system can readily be envisaged as a series of 'units' with defined inputs and outputs.

I am now being asked to create a set of tests for models, controllers and views in an environment where, as far as I can see, with only one possible exception, only the most trivial set of model tests currently exists. (If I'm wrong, I apologise, and do please correct me, but that appears to me to be the situation at the moment.) I am also very concerned that it may prove difficult to provide meaningful, non-trivial tests of the functions I have developed without having a known, frozen database to work off — and it's not clear to me at all that the anonymised database I am currently working off is, in any sense, 'frozen'. For all I know, the keywords data in different instances of this development database may be in a more or less constant state of flux, and may be quite different by the time these tests come to be run on a machine other than my own. In short, I wonder what the possible lifespan of any tests I do develop may be, and if they will even still be valid by the time I get all finally created (which, incidentally, is looking like the middle of 2025 at the moment, from where I sit).

I can see, from the test documentation I have managed to peruse so far, that there are all sorts of way of mocking up data sources for 'unit testing' purposes, but the approaches I have read about (eg, writing pretend DB data into an array), frankly strike me as idiotic in the context in which we're working here, where what really counts (no pun intended) is whether the results of SQL queries look realistic.

If you do insist on my developing a full set of unit tests developed to go with a new, git-sanitised version of this pull request, can you please point me to example code for model, controller and view unit tests developed under the framework(s) we're using here that work with realistic-looking data? I don't see any under either PHPUnit documentation or ci-phpunit-test documentation that strike me as fitting that bill.

I am willing to give this a go if I can find some reasonable guidance, but right now nothing that I'm looking at by way of examples strikes me as any more appropriate a way of testing the change I'm proposing than simply installing it, playing with it for twenty or so minutes, then doing your darnedest to try to break it.

redrun45 commented 3 months ago

@peterjdann, Thanks for your efforts so far. Sounds like this has gotten to feel like a rabbit hole, but I think a big chunk of that is a mismatch of expectations.

We do want to look at your final code in a more modular fashion, rather than either tracking the history of decisions across the versions, or looking at all of the changes at once.

We do not expect, or even ask, that you mock up controllers and models and a fake database for testing everything. It would be pretty cool if we had a complete set of models for unit tests, but we only recently started using any! You're not on the hook to create the whole set of mock models from scratch. Far from it.

The actual request here is much more pragmatic:

Ideally, if you could also add unit tests where appropriate (newly added functions that stand on their own, with clear input -> output correctness that can be verified by tests), I would be over the moon! :)

If you happen to add some code that is purely "functional" in the sense that its output naturally depends entirely on its input, not on any database queries, then it would be nice if you added a few test-cases for it. Stuff like this "helper" program, which could be tested without too much fuss: https://github.com/LibriVox/librivox-catalog/pull/189/files https://github.com/LibriVox/librivox-catalog/pull/192/files

Absolutely no database or model mockups required, requested or expected!

notartom commented 3 months ago

Yep, very much what @redrun45 said. Looking at the code in the PR again, honestly nothing jumps out as being super easy to write a unit test for. Anything that does SQL stuff is out of the question. Some of the other get_<foo>-type methods could be candidates, but it looks like mocking/stubbing a bunch of model stuff would be necessary for those.

Really, the unit tests are a bonus. I'd like to see us move in that direction, but given how much of a mess the existing code base is, it's hard not to feel like a hypocrite when asking for unit tests for new stuff.

The more important part is a cleaner sequence of commits. I just don't want to have a commit that adds method X, only for the subsequent commit to remove it and replace it with Y and Z.

peterjdann commented 3 months ago

Thank you both. I've clearly succumbed to a bad case of not reading a suggestion that's actually laid out with perfect clarity, and for that I must apologise. And I can understand completely that what you're asking for makes sense. As it happens, right now I've got some other work on my hands of an urgent nature that is probably going to take me around 5 weeks to complete. Once I've got that done, I'll be in a good position to give this my undivided attention.

redrun45 commented 1 month ago

This one continues as a new PR, #225. 🎉