I've got the async stuff running on a production site and have hit a problem.
The async routine works like this:
a. During each batch, send digests for as many users as possible before running out of memory/time
b. This is done by querying, one at a time, for get_users_with_pending_digest( $type, $timestamp ). $timestamp ensures that we are only looking for items that were posted before the digest routine began.
c. When a user's digest is processed, the processed items are deleted from the queued_items table. This way, the next time get_users_with_pending_digest() is run, it won't find the user, and will instead find the next one with pending items.
If queued-item deletion fails for some reason in step c, then the next time the system goes back to fetch a pending user (step b) it will find the same user again, and again, and again, and again, and again, and you get the picture.
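To make the failure mode concrete, here's a toy, self-contained simulation (not BPGES code; the queue is a plain array standing in for the queued_items table) of the a/b/c loop above, where deletion fails for one user:

```php
<?php
// Toy simulation of the batch loop: fetch the next user with pending
// items (step b), send, then delete their items (step c). If deletion
// fails, step b finds the same user on every subsequent iteration.

// Hypothetical in-memory stand-in for the queued_items table.
$queued_items = array(
	101 => array( 'item-a' ),
	102 => array( 'item-b', 'item-c' ),
);

// Step b: return the first user who still has pending items.
function get_user_with_pending_digest( array $queue ) {
	foreach ( $queue as $user_id => $items ) {
		if ( $items ) {
			return $user_id;
		}
	}
	return null;
}

$sent       = array();
$iterations = 0;

while ( null !== ( $user_id = get_user_with_pending_digest( $queued_items ) ) ) {
	if ( ++$iterations > 10 ) {
		break; // Safety valve: without it, user 101 would spin forever.
	}

	$sent[] = $user_id;

	// Step c: delete processed items -- simulate a failure for user 101.
	$deletion_succeeded = ( 101 !== $user_id );
	if ( $deletion_succeeded ) {
		$queued_items[ $user_id ] = array();
	}
}
```

User 101 is fetched ten times in a row, and user 102 is never reached at all, which is exactly the starvation problem described above.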
Queued items are deleted from the queued_items table when they're included in a successful digest. There are several reasons why this might not happen; see bpges_generate_digest():
1. The activity query doesn't find an item corresponding to the activity ID. This, in turn, can happen for a few reasons:
i. Bugs. See #144, which has been fixed, but there could be others like it.
ii. Filters on BP queries (inside bp_activity_get() etc)
iii. The activity item has been deleted since the item was added to the digest queue.
2. The activity item was deemed "not valid for digest". See bp_ges_activity_is_valid_for_digest(). By default, this happens only when the activity item is too old, but it can also happen by filter.
The specifics of these situations are different, but BPGES must be more failure-tolerant in each of them. There are a couple of strategies.
One is to be more aggressive about pruning items from the queued_items table. But this runs afoul of #129, as well as of installations that intentionally suppress certain activity items from certain digest runs but don't want those items deleted, because they plan to include them in a later digest.
We could add a filter to get_user_with_pending_digest() or something - that is, step b above - and make it incumbent on plugin authors to filter out invalid "pending" users at the time of query, rather than at the time of digest-assembly. But this could mean double processing (you might have to pull up all queued items to see whether the user should get a digest on a specific occasion), and it's not backward-compatible.
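As a rough sketch of what that filter strategy might look like (the hook name and signature here are hypothetical, not existing BPGES API, and this assumes a loaded WordPress environment):

```php
// Hypothetical: let plugins veto a "pending" user at query time.
$user_id = BPGES_Queued_Item_Query::get_user_with_pending_digest( $type, $timestamp );

// A plugin could return 0/false here to skip this user for this run -
// but to decide, it may need to re-query the user's queued items,
// which is the double-processing cost noted above.
$user_id = apply_filters( 'bpges_user_with_pending_digest', $user_id, $type, $timestamp );
```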
We could have some sort of flag that says "this user has been processed for this digest run", and use it to ensure that we don't get duplicates during a given run. The flag would be set regardless of whether a digest was actually sent for the user. So, something like this:
// get_users_with_pending_digest()
// can't do subquery or join because of situations where global table is on different host
$meta_key = "bpges_digest_processed_{$type}_{$timestamp}";
$received_user_ids_raw = $wpdb->get_col( $wpdb->prepare( "SELECT user_id FROM {$wpdb->usermeta} WHERE meta_key = %s", $meta_key ) );
// Guard against an empty NOT IN () clause, which is invalid SQL.
$received_user_ids = $received_user_ids_raw ? implode( ',', array_map( 'intval', $received_user_ids_raw ) ) : '0';
$user_ids = $wpdb->get_col( $wpdb->prepare( "SELECT DISTINCT user_id FROM {$table_name} WHERE type = %s AND date_recorded < %s AND user_id NOT IN ({$received_user_ids}) LIMIT %d", $type, $timestamp, $count ) );
// handle_digest_queue()
$user_id = BPGES_Queued_Item_Query::get_user_with_pending_digest( $type, $timestamp );
$meta_key = "bpges_digest_processed_{$type}_{$timestamp}";
bp_update_user_meta( $user_id, $meta_key, 1 );
// Then at the end of the run I could clear out all the usermeta.
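For that end-of-run cleanup, one option is a sketch like this (assumes a loaded WordPress environment; delete_metadata() with its fifth argument set to true removes the key for all users):

```php
// Clear the per-run "processed" flags once the digest run completes.
// $type and $timestamp must match the values used when flags were set.
$meta_key = "bpges_digest_processed_{$type}_{$timestamp}";
delete_metadata( 'user', 0, $meta_key, '', true );
```

Note that this operates directly on usermeta, so whether it's the right call in a multinetwork setup depends on the same question raised below about bp_update_user_meta().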
I think that this last solution is the most viable one, though I'd welcome feedback, especially from @modelm and others who deal with complex situations like this. For example, is it best to use bp_update_user_meta() here, given that you might want to query this information in a multinetwork environment?