WordPoints / hooks-api

A basic API for hooking into user actions https://github.com/WordPoints/wordpoints/issues/321
GNU General Public License v2.0
0 stars 0 forks source link

Listen for all CRUD MySQL queries #120

Open JDGrimes opened 8 years ago

JDGrimes commented 8 years ago

In #118 we've considered doing this using the action API. However, I also wanted to explore doing this at a much lower level instead. So this ticket is to explore that. Ultimately what we have in mind is #119.

JDGrimes commented 8 years ago

MySQL Triggers

What if instead we listened for all changes to entities at the lowest level possible: MySQL itself, via triggers. This is something that we hadn't considered before, probably because it seems to desperate and extreme. However, it might not be entirely without merit, so let's at the very least explore it a bit.

At the very least, we can say that this wouldn't require us to parse queries. We could set up AFTER INSERT ROW and AFTER UPDATE ROW triggers, for example, and have these save the information to our own custom database table, which would basically act as a queue. Then we'd have to check the table for changes every page load or something, and delete the data after processing.

The biggest issue, beside the fact that plugins generally don't do things at that low level, is whether this would really work across all sites...

After considering the docs, I've discovered that it MySQL 5.0.10 is required for triggers to be able to access and modify other tables beside the table that the trigger is on. So we'd need at least that. Fortunately, WordPress requires MySQL 5.0+, but that doesn't guarantee that everyone will be using the latest version of MySQL 5.0, I guess (although this page says 5.0.15 is required for WordPress 3.2+). I'm guessing that we'd likely be able to squeak by on that, because currently only 5.1% (and dwindling) of WordPress sites run on MySQL 5.0 anyway.

The bigger issue is that it turns out that MySQL 5.7 is required before multiple triggers can be created for the same action on a single table. So if someone is using some custom triggers of their own (not likely, but who knows) we'd be unable to add our triggers at all on that particular table. MySQL 5.7 currenlty enjoys a whopping 0.2% support among WP sites, so I don't think that plugins and triggers are really going to mix well any time soon.

JDGrimes commented 8 years ago

Parsing $wpdb Queries

That's why we had (briefly, and admittedly not at all seriously) entertained the idea of parsing the MySQL queries at a higher level instead, like the query filter of $wpdb.

The above discussion had me thinking back on that idea, and I realized upon further thought that it actually wouldn't be much trouble to parse most update/delete/insert queries. They're generally relatively simple, and don't have some of the complexity offered by SELECT and other queries. So we might not need a full-scale SQL parser at all, just build our own basic parser for the types of queries that we need to parse.

Assuming that isn't just wishful thinking, the biggest problem would actually be that the only hook we'd have for that would be the query filter, which is called before the query is made, meaning that we wouldn't know whether or not the query actually succeed and we needed to do anything about it.

Searching for any past requests to have other hooks added to $wpdb, I found this ticket, which has been closed wontfix. There is a lot of good related information there, about transactional hooks, hacking $wpdb, considerations in regard to db drop-ins, etc. I had wondered what plugins like VersionPress do. It was mentioned in that ticket, and I took a look at the source code. Turns out VersionPress hacks the wp-db.php file, actually modifying wpdb itself. Obviously, we're definitely not doing that. However, the ticket does remind us that there are ways that we could replace $wpdb with our own wrapper function, though the fact of db drop-ins means that there are a few special considerations that must be taken. All things considered (which I guess actually we've yet to do, but...) it might not be too difficult, and it would in theory give us almost 100% support across entities and plugins.

The biggest problem with wrapping $wpdb is that then we'll end up triggering the API and running a bunch of queries before the result of that query is returned to whatever code was making the query, and processed. As a result, the state of the $wpdb object will have changed in the interim, because of the queries we've run. So, for example, the insert_id property will be reset.

There would be ways to work around this, like backing up those properties prior to running any of our queries, and then restoring them. The problem is that we don't know for sure which properties will even be there, because of db drop-ins which might have entirely different internals than wpdb. And also, I'm not sure that all of the properties that might be changed are public and can be backed up and restored anyway (or at least its unlikely that they would be in most drop-ins).

Possibly we could clone the object after making the query, and restore the untouched clone to be the global value after our queries were finished. But then I'm not sure that that is the best idea either, because then we'd have to keep a copy of every clone around, just in case the __destruct() method is defined and does something when the object is destroyed, something that we might not want it to do in this case. (And also, we don't even know for sure whether cloning the object would even work properly, or would have other unintended side-effects.)

But let's take stock of what we really would break here if we couldn't back everything up quite properly. The first thing that we might do is inadvertently overwrite any error info. However, that's not a problem because we wouldn't run any of our own queries when there is an error. (Though I guess it might be an issue if any of our queries errored.) Then there are some variables that persist from one get_*() query to the next, I think, but that isn't an issue either since we are ignoring all queries of that sort. Another possible issue would be overriding insert_id, as i said above. However, that's something that we can backup. It has to be public across drop-ins because it is accessed by code outside of wpdb.

All in all then, we can say that modifying state might not actually be much of a concern, because all of the wpdb/drop-in properties will fall into two classes:

  1. Those that are public. These must be public across drop-ins, because they are accessed by code outside of wpdb. They are also ones that we can backup and restore. Well, we can probably back them up, but if magic getters and setters are used we may not be able to restore them. Even wpdb has some protected properties that it makes publicly available for back-compat, but using a magic getter. But we don't need to worry about that too much, it would just mean that we might have to have a list of props to back-up instead of just using get_object_vars() alone. The issue of not also having a magic setter though, would mean that we couldn't restore at all, but I'm not sure it is really a deal-breaker, unless this is a common pattern in db drop-ins.
  2. Those that are internal. These cannot be backed up or restored, however I think that we can determine whether they would cause any problems in drop-ins by checking whether there are any that would cause problems in wpdb itself. Drop-ins cannot depend upon any cross-method-call behavior that wpdb itself does not depend upon, because otherwise it wouldn't be compatible across plugins.

But this still leaves us with one problem, and that is that the MySQL resource/state itself, and the result property, will be overwritten. The result property is protected, but can conceivably be accessed by code via the magic getter. We could back up and restore the property, I think, but we'd need to set it to null after backing it up to prevent flush() from freeing it. The issue is that even then, would that actually preserve the underlying result resource itself, or would new queries cause things to get crossed up? I guess that is something that we'd have to test.

It helps to just check the docs. myslqi_query() just returns true/false for the queries that we are interested in, so it is fine if we need to override result, as we'd be able to back up and restore it; its not going to be a resource. It's possible that drop-ins will handle the result differently, but that really isn't our problem since it would likely break any code that is cheating and accessing that property externally anyhow.

One thing in regard the result, we would still need to maybe call flush() or something, to free up our last result from the query, instead of just overwriting it on restore, because otherwise the final query that we made will never get freed and will just sit there cluttering up memory, and who knows what else. I guess it is possible that calling flush() would have unintended side affects in some drop-ins, but really that's probably less of a concern, and I think very unlikely at that.

However, just because the result itself can be backed up, we still will be changing the state of the dbh. But again, that's protected and technically folks shouldn't be using it (which means that they probably are in the most insane way). I'm not sure that there is much that they could actually do with it anyway, since there isn't really any info to get from it after an insert/update/delete query, since pretty much all of the info is already provided by $wpdb. And also, any code that accesses it is going to need to have abundant sanity checks anyway (if it would even be cross-compatible enough to be of value at all), because there's no guarantee that drop-ins will offer dbh at all, much less that it will be something that they an actually use for anything, or would know how to get info from it. And note that, even if such a thing did exist, it would have to affect code that modifies entities that we're specifically listening for, so it isn't something that is likely sneak up on us without our knowledge.

JDGrimes commented 8 years ago

Magic setters

So the main issue would be if missing magic setters are a common pattern, preventing us from restoring backed-up $wpdb properties. I've done a quick servey of some common db.php drop-ins and find out. Here's what I found from this very related blog post about VersionPress as well as a search for "wordpress db drop in" on GitHub, and other searches:

All of these extend wpdb, and none of them implement magic getters or setters that would cause problems—in fact, only WP DB Driver implements magic getters or setters at all, and those are just copies of wpdb's anyway.

I would note that WP DB Driver makes dbh and result private in some drivers, so I guess that we need to take care to check that result is actually available before we attempt to back it up (and really all of the properties, just in case).

In short, it doesn't seem like restoring the backed-up properties will be a problem at all.

JDGrimes commented 8 years ago

Queries using MySQL functions

The only real issue that I see at this point is that for those queries that we have to parse (which is only those that don't use $wpdb->(insert|update|delete|replace)), is that they might contain calls to MySQL functions like CURRENT_TIME() and such. This manly becomes a problem when these are part of the WHERE part of an UPDATE or DELETE query, because in other cases we can get the "after" values directly from the entity object. It is only for those queries that we wouldn't be able to get the "before" values for the affected entity(ies). Although this does have the potential to make parsing far more complex in any case.

Assuming that the added parsing complexity wouldn't be a problem, the main issue then is just that in those specific cases we'd not have the "before" values. Unless we find a work-around (like running SELECT $func, but that could probably be exploited), or else just skip queries that we can't parse to simple column => raw value arrays. I think the latter would be permissible, since this will probably be a rare case, and we can always fill in for it in by hooking into other actions where needed.

Or perhaps we'll just be reconstructing the WHERE clause into a SELECT query anyway, so that we can get the before values. In that case I suppose this point is mostly moot.

JDGrimes commented 8 years ago

Survey of queries that require parsing

How many queries would we really have to parse anyway? Perhaps we could safely skip parsing any of the MySQL queries entirely. I guess a survey of WordPress core would at least give us an idea.

Excluding schema.php and upgrade.php, we have:

INSERT

WP_Upgrader::create_lock():

$lock_result = $wpdb->query( $wpdb->prepare( "INSERT IGNORE INTO `$wpdb->options` ( `option_name`, `option_value`, `autoload` ) VALUES (%s, %s, 'no') /* LOCK */", $lock_option, time() ) );

add_option():

    $result = $wpdb->query( $wpdb->prepare( "INSERT INTO `$wpdb->options` (`option_name`, `option_value`, `autoload`) VALUES (%s, %s, %s) ON DUPLICATE KEY UPDATE `option_name` = VALUES(`option_name`), `option_value` = VALUES(`option_value`), `autoload` = VALUES(`autoload`)", $option, $serialized_value, $autoload ) );

_wp_upgrade_revisions_of_post():

    $result = $wpdb->query( $wpdb->prepare( "INSERT IGNORE INTO `$wpdb->options` (`option_name`, `option_value`, `autoload`) VALUES (%s, %s, 'no') /* LOCK */", $lock, $now ) );

wp_set_object_terms():

            if ( false === $wpdb->query( "INSERT INTO $wpdb->term_relationships (object_id, term_taxonomy_id, term_order) VALUES " . join( ',', $values ) . " ON DUPLICATE KEY UPDATE term_order = VALUES(term_order)" ) )

_wp_batch_split_terms():

    $lock_result = $wpdb->query( $wpdb->prepare( "INSERT IGNORE INTO `$wpdb->options` ( `option_name`, `option_value`, `autoload` ) VALUES (%s, %s, 'no') /* LOCK */", $lock_name, time() ) );

We don't actually have to parse the INSERT queries at all, since we can get the "after" data with a simple query, and there is no "before" data. Most of these are on the options table anyway, which we don't care about.

However, the INSERT query in wp_set_object_terms() includes an ON DUPLICATE KEY UPDATE clause. This means that it isn't always an insert query, sometimes it becomes an update. That is an added complexity that we hadn't considered above, and it means that we do need to get the "before" value from the query (at least we do in those cases when there will be a duplicate key, but it would likely be faster just to always assume this than to run an extra query to find out). This means that we do need to parse INSERT queries when they contain ON DUPLICATE KEY UPDATE. In the end though, I'm not sure that this particular query is actually that important to us, and at the very least it wouldn't be important until we introduce the term/taxonomy entities.

JDGrimes commented 8 years ago

Cont.

UPDATE

do_trackbacks():

$wpdb->query( $wpdb->prepare("UPDATE $wpdb->posts SET to_ping = TRIM(REPLACE(to_ping, %s, '')) WHERE ID = %d", $tb_ping, $post_id) );

trackback():

    $wpdb->query( $wpdb->prepare("UPDATE $wpdb->posts SET pinged = CONCAT(pinged, '\n', %s) WHERE ID = %d", $trackback_url, $ID) );
    return $wpdb->query( $wpdb->prepare("UPDATE $wpdb->posts SET to_ping = TRIM(REPLACE(to_ping, %s, '')) WHERE ID = %d", $trackback_url, $ID) );

wp_media_attach_action():

if ( 'attach' === $action ) {
    $result = $wpdb->query( $wpdb->prepare( "UPDATE $wpdb->posts SET post_parent = %d WHERE post_type = 'attachment' AND ID IN ( $ids_string )", $parent_id ) );
} else {
    $result = $wpdb->query( "UPDATE $wpdb->posts SET post_parent = 0 WHERE post_type = 'attachment' AND ID IN ( $ids_string )" );
}

remove_user_from_blog():

        if ( ! empty( $post_ids ) ) {
            $wpdb->query( $wpdb->prepare( "UPDATE $wpdb->posts SET post_author = %d WHERE post_author = %d", $reassign, $user_id ) );
            array_walk( $post_ids, 'clean_post_cache' );
        }

        if ( ! empty( $link_ids ) ) {
            $wpdb->query( $wpdb->prepare( "UPDATE $wpdb->links SET link_owner = %d WHERE link_owner = %d", $reassign, $user_id ) );
            array_walk( $link_ids, 'clean_bookmark_cache' );
        }

wp_untrash_post_comments():

        $wpdb->query( $wpdb->prepare( "UPDATE $wpdb->comments SET comment_approved = %s WHERE comment_ID IN ($comments_in)", $status ) );

user-edit.php:

            $wpdb->query( $wpdb->prepare( "UPDATE {$wpdb->signups} SET user_email = %s WHERE user_login = %s", $user->user_email, $current_user->user_login ) );

/* ... */

        $wpdb->query( $wpdb->prepare( "UPDATE {$wpdb->signups} SET user_email = %s WHERE user_login = %s", $_POST[ 'email' ], $user_login ) );

The trackback-related queries aren't important in my opinion, and I think we could safely ignore them, at least for now. We could always introduce support for trackbacks in the future if we wanted to, but at present I think it is a waste of time, especially since they are likely to be replace by webmentions eventually.

I'm not sure that the two queries affecting the signups table are important either, since that table is just an intermediary when a user is requesting a blog but hasn't been granted one yet.

The other three queries aren't necessarily vital, I guess, but they would be candidates for parsing. wp_untrash_post_comments() has an action that we could use instead of parsing the query, but remove_user_from_blog() and wp_media_attach_action() do not.

JDGrimes commented 8 years ago

Cont.

DELETE

delete_usermeta() (deprecated):

    if ( ! empty($meta_value) )
        $wpdb->query( $wpdb->prepare("DELETE FROM $wpdb->usermeta WHERE user_id = %d AND meta_key = %s AND meta_value = %s", $user_id, $meta_key, $meta_value) );
    else
        $wpdb->query( $wpdb->prepare("DELETE FROM $wpdb->usermeta WHERE user_id = %d AND meta_key = %s", $user_id, $meta_key) );

delete_metadata():

$query = "DELETE FROM $table WHERE $id_column IN( " . implode( ',', $meta_ids ) . " )";

wp_remove_object_terms():

        $deleted = $wpdb->query( $wpdb->prepare( "DELETE FROM $wpdb->term_relationships WHERE object_id = %d AND term_taxonomy_id IN ($in_tt_ids)", $object_id ) );

Because delete_usermeta() is deprecated, I think we could safely ignore it. These other two cases might want parsing, although neither would be necessary at the moment. wp_remove_object_terms() fires the delete_term_relationships hook just before running the query, meaning that we could hook in there rather than parsing the query if we chose. The delete_metadata() function likewise fires the delete_{$meta_type}_meta hook before running its query.

JDGrimes commented 8 years ago

Cont.

WordPoints

In WordPoints itself we also have (excluding the un/installers):

wordpoints_alter_points():

                    UPDATE {$wpdb->usermeta}
                    SET `meta_value` = GREATEST(`meta_value` + %d, %d)
                    WHERE `meta_key` = %s
                        AND `user_ID` = %d

We don't currently have an action that we could use to avoid trying to parse this, although of course we control the code so we could add one. Also, parsing the meta_value part actually isn't a problem because we can always just get an "after" picture of an update query by querying the database. It is only the WHERE clause that we'd really have to worry about parsing. In the end of course, this query isn't currently important for us to know about anyway.

JDGrimes commented 8 years ago

Cont.

Summary

Based on this survey we can conclude that there will occasionally be times that there are CRUD queries that use the $wpdb->query() method directly. Some of these will be simple, and have no true reason for not using the $wpdb CRUD methods instead. However, others will be doing this precisely because they are too complex to use these methods, which may make parsing tricky. Fortunately, some of these cases will be low-level, and not important for us. As for the remainder, in some cases actions may be available that we could utilize rather than query parsing, although in others that will not be the case. As far as core itself goes, these basically come down to remove_user_from_blog() and wp_media_attach_action(), although I have a feeling that this is going to be more prevalent in some plugins.

I think that we could actually safely not do parsing as far as core is concerned, since those functions aren't absolutely vital. However, we'll probably want to introduce parsing because of plugins not very far in the future, so it might make the most sense to just go ahead and introduce it now anyway.

Another thing to consider is that if we mix listening for actions with query parsing, we have to have some way of knowing which queries to parse and which ones will be handled elsewhere. I don't know that there will be any simple way to do that, so perhaps the two can't really be mixed at all. Mixing isn't a problem if our parser can't even handle the queries concerned anyway, but if we expand the capabilities of our parser in the future we'd still have this problem.

So again, it seems like if we are going to do parsing, it might be easiest to go all-in right from the start.

JDGrimes commented 8 years ago

REPLACE Queries

I've only ever had a foggy notion of exactly what a REPLACE query is. Those docs indicate that it is basically just an INSERT that deletes the previous row when there is a match on a primary/unique key. So basically it is an insert with a conditional delete. I'm not sure whether it makes most sense for us to handle this similar to an update, or just stick with what it actually is, and treat it as an insert + delete operation. Of course, when there is no existing row, we'd just handle it as an insert in any case. I'm not sure how we'd determine exactly whether a deletion was going to occur though, without knowing all of the unique/primary keys of the table. This all seems rather complex, and I wonder whether we should even pursue it at this time, since the replace() method is unused in core.

I did a search of GitHub to see whether any plugins use $wpdb->replace(). There are about 1300 code results, and they seem to all be pretty accurate. So in other words, there are literally hundreds of uses of the method. At first this seems rather damning, however, upon closer inspection it turns out that many of the results are from repositories that include the wysija-newsletters plugin, which uses replace() in the core/model.php file. However, I'm not actually sure that the particular method in question is ever used. This accounts for over 160 results. However, even when we exclude that and forks, we end up with more than 300 results (probably quite a bit more, but it doesn't tell you the true total when the search is that complex). So I'm thinking that this is surely going to be something that we'll want to implement for plugins, sooner or later. In many cases uses of replace will be lower-level code, however, I think that there may be times that we do need to listen for replace().

So while it may not be vital to ship the initial version with support for replace, we'll at least need to make sure that we build things in a manner that will allow us to add this in the future.

Of course, we could also just attempt to fill in for replace with actions, assuming that actions were available, although this would have the same issues as discussed above.

JDGrimes commented 8 years ago

Cont.

As far as how to handle REPLACE queries, as an insert + delete or just an update, I think that maybe we'll want to determine that on a case-by-case basis (though I'm not sure how we'd do that, exactly). Anyway, if we parse INSERT ... ON DUPLICATE KEY UPDATE queries in the future, as suggested above, we'd likely need to determine the table keys for that. So we'll probably end up having to have some code that does that, one way or another.

JDGrimes commented 8 years ago

Tokenizing Queries

If/when we do attempt to parse queries, I'm guessing that we'll likely need to tokenize them in some way. Searching for a PHP tokenizing library, I came across the strtok() function, which tokenizes a string by splitting on one or more characters. This is the sort of thing that I originally had in mind, but I was searching to see if there might be a more elegant solution out there already for tokenizing strings in PHP. If we do use strtok(), note that we should heed this note about cleaning up memory afterwards.

I also found several tokenizers built on PREG:

I also found this blog post that suggests just using token_get_all(). This actually might be a viable basic solution for tokenizing SQL, although of course much post-processing will be needed to convert the PHP-tokenized query into a set of tokens for SQL. But that might provide a useful starting point at least.

Of course, we could always use/build off of one of the existing SQL parsers. I was just thinking of creating my own simple tokenizer, since we only need to parser a specific sub-set of MySQL, not just any sort of query.

JDGrimes commented 8 years ago

Transactions and Rollbacks

What about when we are in a transaction, and the changes are later rolled back and not committed? Perhaps then we should wait to trigger the entity change API until those changes are actually committed. On the other hand, if a rollback occurs, any actions performed by an entity change listener API will be rolled back as well. In some ways that might actually be more robust, in the case that actions performed by the entity change listeners are taken into account for later things that occur during the transaction. The only thing would be that if anything outside the database was modified by the entity change listener API, it wouldn't automatically be rolled back if the commit is aborted. At first that doesn't sound like a problem, but you never know what we might want to do with entity change listeners in the future, like maybe even hitting external APIs or something. So perhaps waiting until the transaction is committed would be the best course of action. Either that or I suppose that we'd have to build transactions and rollbacks into the API, but that seems overkill for the frequency with which rollbacks are likely to be used.

And how frequently are ROLLBACK queries likely to be used anyway? A search of GitHub seems to indicate that this is not entirely uncommon (several thousand results), but probably somewhere around 70%+ of the results are duplicates or for test code, etc. A search of just the wp-plugins mirror yields only about 10-30 results, which seems more likely. I do knot that WooCommerce does (or at least did in the recent past) use transactions when creating an order.

Of course many of the uses of transactions are likely to be lower-level, but it seems that we'll inevitably be dealing with them in some plugin sooner or later. So we need to decide up front how to handle that, either just doing things as we normally would regardless of whether we are in a transaction, or else holding all of the queries over until the end.

Or, perhaps we could make this optional on a per-listener basis, though that would likely also add complexity. But it does make some sense, since if a listener is hitting a remote API, it will probably not be affecting later stuff that might get modified in the transaction anyway, so there is no reason not to wait until commit to trigger it.

But how would we even know whether we were in a transaction anyway? I guess we'd have to listen for START TRANSACTION, COMMIT, and ROLLBACK queries, and keep track. Actually, it is more complex than that, since there are a raft of modifiers and alternate syntax. (Would we also have to listen for statements that cause an implicit commit, or is that and edge case?)

But a problem, if we just run happily along ignoring that a transaction is happening, is that MyISAM, which is the default storage engine used by WordPress, doesn't support transactions. So, many changes that we made actually wouldn't be rolled back if the transaction was, since they'd be for data in tables that don't support transactions. (I'm assuming that modifying data in tables that don't support transactions is possible and doesn't affect the transaction, though I haven't been able to find any docs on this.)

Actually, WordPress doesn't explicitly set the storage engine for the tables that it creates. The storage engine used for its tables is therefore determined by the default storage engine of the server. So, unless a plugin creates tables with an explicit engine set, we don't know for sure what storage engine any particular table will use. So what entities will be subject to transactions will depend on the particular site.

So it seems that the sanest path forward will be to only trigger the entity change API once the transaction was actually committed.

Or, we could take the path of listening for the transactions and just ignoring any changes that occurred during them, leaving it to actions to fill in, although of course that has the same old issues if we expand our capabilities to includes transaction queries in the future. Also, there might be queries that occur during a transaction that normally we would parse, but just happen not to in this case, and so there would be no way to register actions for them, or the handling from the actions would have to be conditional on whether a transaction was active or not. So it would make more sense just to handle the transaction queries one way or another ourselves. Then again, generally speaking it is unlikely for there to be many "regular" changes during transactions, because most other entities won't be guaranteed to be subject to rollbacks, since they aren't stored in tables that explicitly support transactions. So if such entities were being modified during transactions, this would likely be a bug in the code that was managing the transactions.

So, if we are going to listen for transactions, it makes sense that we'd go ahead and handle queries that took place during them within our API, by holding them over until commit. Though due to the complexity of this, I wonder whether it is worth pursuing for the initial version, or whether we should wait until we really need it.

JDGrimes commented 8 years ago

Modification of entity relationships after the "before" snapshot?

The basic idea here is that we get a "before" snapshot of an entity before the query is run, and an "after" snapshot after the query is run. For insert queries of course the "before" is empty, and the "after" is empty when an entity is deleted.

A concern that I had is that during an update or delete query, the "before" snapshot wouldn't include all of the relationships and their data, only the attributes of the entity object itself. The fear was that as a result, sometimes the other data not on the main entity object would have been modified, but being uncached would produce the new values even for the "before" snapshot.

However, this concern is not warranted, because only values on the main entity object could have been modified by the query. Modifying any of the other values would be done by separate queries, and handled separately. And of course, it would be impossible for any such queries to occur between taking our before and after snapshots, since all of that takes place within the $wpdb method wrapper.

JDGrimes commented 8 years ago

Changes to Entity Relationships

This does give us pause to consider how exactly we should handle relationships. Usually entities will be related via an attribute on one of the entities, in which case the entity itself will be modified to modify the relationship. And in that case, our handling for entity attribute changes will work perfectly.

However, relationships do not have to be defined on attributes, they can be stored anywhere, like metadata or custom tables. In that case, how do we identify the change in a non-entity table as a modification of an entity relationship?

For the entity tables, the idea I had was that we'd attempt to determine the entity slug from the table name, though that probably wouldn't always work. So perhaps what we need is a map of table names to entity/entity child slugs. But I'm not sure that it is as simple as mapping the non-attribute entity children to table names, because we also have to know what column holds the entity relationship ID. And as far as metadata goes, we need not only the table name but the key column value as well to identify the relationship affected.

So basically what we need is to know how each entity and external entity child is stored, and ideally have this information in a format that is queryable by table name.

This is information that we had thought to store on the entities themselves for our query API, but had decided not to add yet. However, perhaps we should go ahead and do that, in essence adding a storage info API to the entity API. Probably we'd store this information on each entity object, though perhaps that wouldn't actually be necessary. Even if we did need to do that, we could likely cache the table map that we need, and only rebuild it when new entities are registered.

JDGrimes commented 8 years ago

Multi-row INSERT queries

I happened to think that it is possible for INSERT queries to insert multiple values at once. This isn't possible with $wpdb->insert(), of course, but for insert queries performed directly through $wpdb->query() it could be done. I'm not sure if we should try to handle this or not, but if nothing else it makes parsing more complex.

I'm not sure that we really could handle it robustly though, because usually we get the ID of the inserted entity via $wpdb->insert_id. However, when multiple entities are being inserted, only the ID of the first inserted row is supplied by MySQL.

We could just auto-increment the ID values ourselves, assuming that we know which column/attribute needs to be auto-incremented. Although we might want also to take into account the edge-case when the values for that column are actually supplied in the query, in which case the whole point is moot.

Usually these would tend to be low-level queries though, since any code relating to them would tend to have the same issue of not being able to get the IDs, except in the case when they could be inferred because of a newly created table (which again in itself indicates a lower-level query, and something we'd probably do better to handle retroactively anyway).

Perhaps it would make sense to hold of on supporting these then, until we think that we really need it. Though of course that brings us back to the old issue of trouble when we introduce this in the future if we have been using actions as a workaround. Although in this case, as noted above, there are unlikely to be any actions that we could use since the IDs would be unknown to the related code, so they'd be unable to fire actions with them.

JDGrimes commented 8 years ago

Related: https://github.com/WordPoints/hooks-api/issues/124#issuecomment-210901432