Closed boonebgorges closed 6 years ago
That would be a great addition. Thanks to your client for funding it. :)
On Tue, Dec 5, 2017 at 12:30 PM, Boone Gorges notifications@github.com wrote:
I've got to build this for a client, and it seems like an obvious fit for Docs, so let's roll it in.
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/boonebgorges/buddypress-docs/issues/592, or mute the thread https://github.com/notifications/unsubscribe-auth/ABU9ev9vpgjwwDTrN6ZxAN4-pQRIxOJ8ks5s9YvagaJpZM4Q2wQP .
@dcavins This ended up being trickier than I'd thought, but I think I have the logic right. If you feel like taking a look at the changeset, the sanity check wouldn't hurt.
Also, FYI, I've started a 2.1.x branch for major work :-D
For anyone who cares, here's an mu-plugin that will enable for older Docs: https://gist.github.com/boonebgorges/547b49e87acbe9d2af18812a1b6dcf04
This was interesting to look into, because the _filter_query_attachments_filename
filter was new to me. I'm not sure it's needed in this case, though. Here are the queries I'm seeing on the docs main directory:
## With filter and search (returns expected results):
(This is the archive's main query that I think we steamroll later.)
Q1: SELECT SQL_CALC_FOUND_ROWS wp_posts.ID
FROM wp_posts
WHERE 1=1
AND (((wp_posts.post_title LIKE '%anchor%')
OR (wp_posts.post_excerpt LIKE '%anchor%')
OR (wp_posts.post_content LIKE '%anchor%')))
AND wp_posts.post_type = 'bp_doc'
AND (wp_posts.post_status = 'publish'
OR wp_posts.post_status = 'private')
ORDER BY wp_posts.post_title LIKE '%anchor%' DESC, wp_posts.post_date DESC
LIMIT 0, 10
Q2: SELECT wp_posts.ID
FROM wp_posts
WHERE 1=1
AND (((wp_posts.post_title LIKE '%anchor%')
OR (wp_posts.post_excerpt LIKE '%anchor%')
OR (wp_posts.post_content LIKE '%anchor%')))
AND wp_posts.post_type = 'attachment'
AND ((wp_posts.post_status = 'inherit'))
ORDER BY wp_posts.post_date DESC
LIMIT 0, 5
Q3: SELECT SQL_CALC_FOUND_ROWS wp_posts.ID
FROM wp_posts
LEFT JOIN wp_postmeta AS sq1
ON ( wp_posts.ID = sq1.post_id
AND sq1.meta_key = '_wp_attached_file' )
WHERE 1=1
AND ( wp_posts.ID IN (25,1)
OR (((wp_posts.post_title LIKE '%anchor%')
OR (wp_posts.post_excerpt LIKE '%anchor%')
OR (wp_posts.post_content LIKE '%anchor%')
OR ( sq1.meta_value LIKE '%anchor%' ))) )
AND wp_posts.post_type = 'bp_doc'
AND (wp_posts.post_status = 'publish'
OR wp_posts.post_status = 'private')
GROUP BY wp_posts.ID
ORDER BY wp_posts.post_modified DESC
LIMIT 0, 10
## Without add_filter( 'posts_clauses', '_filter_query_attachment_filenames' ) filter (also returns expected results on the directory, but phpunit test fails for some reason):
(Matches Q1 above)
SELECT SQL_CALC_FOUND_ROWS wp_posts.ID
FROM wp_posts
WHERE 1=1
AND (((wp_posts.post_title LIKE '%anchor%')
OR (wp_posts.post_excerpt LIKE '%anchor%')
OR (wp_posts.post_content LIKE '%anchor%')))
AND wp_posts.post_type = 'bp_doc'
AND (wp_posts.post_status = 'publish'
OR wp_posts.post_status = 'private')
ORDER BY wp_posts.post_title LIKE '%anchor%' DESC, wp_posts.post_date DESC
LIMIT 0, 10
(Matches Q2 above)
SELECT wp_posts.ID
FROM wp_posts
WHERE 1=1
AND (((wp_posts.post_title LIKE '%anchor%')
OR (wp_posts.post_excerpt LIKE '%anchor%')
OR (wp_posts.post_content LIKE '%anchor%')))
AND wp_posts.post_type = 'attachment'
AND ((wp_posts.post_status = 'inherit'))
ORDER BY wp_posts.post_date DESC
LIMIT 0, 5
(Removes the postmeta LEFT JOIN from above)
SELECT SQL_CALC_FOUND_ROWS wp_posts.ID
FROM wp_posts
WHERE 1=1
AND ( wp_posts.ID IN (25,1)
OR (((wp_posts.post_title LIKE '%anchor%')
OR (wp_posts.post_excerpt LIKE '%anchor%')
OR (wp_posts.post_content LIKE '%anchor%'))) )
AND wp_posts.post_type = 'bp_doc'
AND (wp_posts.post_status = 'publish'
OR wp_posts.post_status = 'private')
ORDER BY wp_posts.post_modified DESC
LIMIT 0, 10
## With filter but no pre-search (doesn't find docs with matching attachments):
SELECT SQL_CALC_FOUND_ROWS wp_posts.ID
FROM wp_posts
WHERE 1=1
AND (((wp_posts.post_title LIKE '%anchor%')
OR (wp_posts.post_excerpt LIKE '%anchor%')
OR (wp_posts.post_content LIKE '%anchor%')))
AND wp_posts.post_type = 'bp_doc'
AND (wp_posts.post_status = 'publish'
OR wp_posts.post_status = 'private')
ORDER BY wp_posts.post_title LIKE '%anchor%' DESC, wp_posts.post_date DESC
LIMIT 0, 10
SELECT SQL_CALC_FOUND_ROWS wp_posts.ID
FROM wp_posts
LEFT JOIN wp_postmeta AS sq1
ON ( wp_posts.ID = sq1.post_id
AND sq1.meta_key = '_wp_attached_file' )
WHERE 1=1
AND (((wp_posts.post_title LIKE '%anchor%')
OR (wp_posts.post_excerpt LIKE '%anchor%')
OR (wp_posts.post_content LIKE '%anchor%')
OR ( sq1.meta_value LIKE '%anchor%' )))
AND wp_posts.post_type = 'bp_doc'
AND (wp_posts.post_status = 'publish'
OR wp_posts.post_status = 'private')
GROUP BY wp_posts.ID
ORDER BY wp_posts.post_modified DESC
LIMIT 0, 10
Looking at the pieces, the _filter_query_attachments_filename
added bit is attempting to add the actual attachment IDs to the results (but we're specifying the post_type, so it will always be nullified). When I remove that filter, the directory search works as expected, but the phpunit test fails. Any idea what's going on?
Also, I think that the query limit will have to be overridden in the new get_posts()
pre-fetch call by adding a 'numberposts' => -1
argument.
This is a really nice improvement to the search results. Now, if only we had a relevance engine to use... :)
Here's to 2.1!
_filter_query_attachments_filename
allows us to run a LIKE
query on postmeta. This otherwise is not possible; WP_Meta_Query
doesn't support it.
When you disable that filter, the query succeeds because of a default behavior of WP: putting the basename
of the attachment file in the post_title
field of the 'attachment' post item. So wp_posts.post_title LIKE '%anchor%'
results in a hit. The test is written in such a way that the filename is only put into _wp_attachment_filename
postmeta, which is why it fails.
If we rely solely on what's in post_title
, we'll miss search queries like foo.pdf
.
Looking at the pieces, the _filter_query_attachments_filename added bit is attempting to add the actual attachment IDs to the results (but we're specifying the post_type, so it will always be nullified).
No - we're grabbing the post_parent
, which should be a bp_doc
. In cases where it's not, it'll be excluded by the post_type
clause.
Also, I think that the query limit will have to be overridden in the new get_posts() pre-fetch call by adding a 'numberposts' => -1 argument.
Yes - I'll make that change.
Looking more closely, I think I also introduced a small bug because of the default suppress_filters
value of get_posts()
. I'll correct, and then you can have another look if you'd like.
I should note that I am concerned about performance when we do the following: https://github.com/boonebgorges/buddypress-docs/blob/5ba8c282814dfa5da6ba706bc4f3bca68be6c25a/includes/query-builder.php#L226 Fetching the entire post objects is not going to be pretty at scale. This might be a case where we want to either switch to a direct query, or do a fields=ids
query and then do a direct query to get post_parent
values. Do you have thoughts about this?
Hi Boone-
Thanks for the explanation, but I'm still not getting it. I'm thinking that Q2 above is the new get_posts()
query to find attachments that match the search terms. Then the post_parent
(the doc or post that the media was uploaded to) is extracted from the results set so it can be added to the final search query.
I was thinking that Q3 is the final search query, where you're adding the wp_posts.ID IN (25,1) OR
piece to include the parents of matching attachments in the results set along with the docs found by the normal WP search. When I run just the postmeta part of the third query,
SELECT SQL_CALC_FOUND_ROWS wp_posts.ID
FROM wp_posts
LEFT JOIN wp_postmeta AS sq1
ON ( wp_posts.ID = sq1.post_id
AND sq1.meta_key = '_wp_attached_file' )
WHERE ( sq1.meta_value LIKE '%anchor%' )
GROUP BY wp_posts.ID
ORDER BY wp_posts.post_modified DESC
LIMIT 0, 10
the returned posts are the attachment posts themselves, which would be perfect for Q2 (and help to find attachments by filename). In Q3 though, those posts will be filtered out by the following clauses:
AND wp_posts.post_type = 'bp_doc'
AND (wp_posts.post_status = 'publish'
OR wp_posts.post_status = 'private')
Maybe I'm not seeing the complete queries somehow? Sorry I'm not getting it yet. :)
Ah, OK. If I replace that get_posts()
with a new WP_Query
, like this:
add_filter( 'posts_clauses', '_filter_query_attachment_filenames' );
$attachment_posts = new WP_Query( array(
'post_type' => 'attachment',
'post_status' => 'inherit',
's' => $this->query_args['search_terms'],
) );
remove_filter( 'posts_clauses', '_filter_query_attachment_filenames' );
Then the attachment search query looks like this:
SELECT SQL_CALC_FOUND_ROWS wp_posts.ID
FROM wp_posts
LEFT JOIN wp_postmeta AS sq1
ON ( wp_posts.ID = sq1.post_id
AND sq1.meta_key = '_wp_attached_file' )
WHERE 1=1
AND (((wp_posts.post_title LIKE '%anchors-aweigh-1.jpg%')
OR (wp_posts.post_excerpt LIKE '%anchors-aweigh-1.jpg%')
OR (wp_posts.post_content LIKE '%anchors-aweigh-1.jpg%')
OR ( sq1.meta_value LIKE '%anchors-aweigh-1.jpg%' )))
AND wp_posts.post_type = 'attachment'
AND ((wp_posts.post_status = 'inherit'))
GROUP BY wp_posts.ID
ORDER BY wp_posts.post_title LIKE '%anchors-aweigh-1.jpg%' DESC, wp_posts.post_date DESC
LIMIT 0, 10
and now I understand what you were saying about suppress_filters.
Right - 5ba8c28 fixed a previous bug related to this. But it also broke the tests, and it's not obvious why.
I'm going to spend some time later converting the attachment LIKE
query to a direct SQL query, and throw some caching around it.
Re:
Fetching the entire post objects is not going to be pretty at scale. This might be a case where we want to either switch to a direct query, or do a fields=ids query and then do a direct query to get post_parent values. Do you have thoughts about this?
Hm. If using a WP_Query-based search to find the ids then getting the post_parents in a direct query, would we get some caching for free and also avoid a future where the WP postmeta structure changes a little and breaks a direct query?
Although, one direct query seems simpler, but totally manual.
I love both options. <3
Hm. If using a WP_Query-based search to find the ids then getting the post_parents in a direct query, would we get some caching for free and also avoid a future where the WP postmeta structure changes a little and breaks a direct query?
I simplified things by going with a direct query. This way, we can add our own caching that is 100% what we need, and 0% what we don't need. It's not the most beautiful thing in the world, but I think the logic is now correct.
Boone, is there more to do here, or can we close this issue? This is a great improvement, by the way.
Looks good to me!
I've got to build this for a client, and it seems like an obvious fit for Docs, so let's roll it in.