Closed peterjohnhunt closed 5 years ago
Hi @peterjohnhunt , thanks for raising the issue!
I see you have the debug bar installed, which will be very helpful here. Could you please paste the Query Body that went to Elasticsearch and the response?
I see you are not getting the product with ElasticPress; an error message from ES would be a great starting point for me. Hope we can solve this quickly :)
Thanks!
Hey @oscarssanchez! Thanks for the quick response! Yes, here is the output from the debug bar:
Also a quick note from looking at the elasticpress.io endpoint mapping, it looks like it has the field "isbn" that we are trying to search for listed as a boolean.
Query Args:
/wp-content/plugins/debug-bar-elasticpress/classes/class-ep-debug-bar-elasticpress.php:124:
array (size=69)
'post_type' => string 'product' (length=7)
's' => string '9781462753673' (length=13)
'meta_query' =>
array (size=1)
0 =>
array (size=3)
'key' => string 'isbn' (length=4)
'value' => string '9781462753673' (length=13)
'compare' => string '=' (length=1)
'error' => string '' (length=0)
'm' => string '' (length=0)
'p' => int 0
'post_parent' => string '' (length=0)
'subpost' => string '' (length=0)
'subpost_id' => string '' (length=0)
'attachment' => string '' (length=0)
'attachment_id' => int 0
'name' => string '' (length=0)
'static' => string '' (length=0)
'pagename' => string '' (length=0)
'page_id' => int 0
'second' => string '' (length=0)
'minute' => string '' (length=0)
'hour' => string '' (length=0)
'day' => int 0
'monthnum' => int 0
'year' => int 0
'w' => int 0
'category_name' => string '' (length=0)
'tag' => string '' (length=0)
'cat' => string '' (length=0)
'tag_id' => string '' (length=0)
'author' => string '' (length=0)
'author_name' => string '' (length=0)
'feed' => string '' (length=0)
'tb' => string '' (length=0)
'paged' => int 0
'meta_key' => string '' (length=0)
'meta_value' => string '' (length=0)
'preview' => string '' (length=0)
'sentence' => string '' (length=0)
'title' => string '' (length=0)
'fields' => string '' (length=0)
'menu_order' => string '' (length=0)
'embed' => string '' (length=0)
'category__in' =>
array (size=0)
empty
'category__not_in' =>
array (size=0)
empty
'category__and' =>
array (size=0)
empty
'post__in' =>
array (size=0)
empty
'post__not_in' =>
array (size=0)
empty
'post_name__in' =>
array (size=0)
empty
'tag__in' =>
array (size=0)
empty
'tag__not_in' =>
array (size=0)
empty
'tag__and' =>
array (size=0)
empty
'tag_slug__in' =>
array (size=0)
empty
'tag_slug__and' =>
array (size=0)
empty
'post_parent__in' =>
array (size=0)
empty
'post_parent__not_in' =>
array (size=0)
empty
'author__in' =>
array (size=0)
empty
'author__not_in' =>
array (size=0)
empty
'cache_results' => boolean false
'search_fields' =>
array (size=5)
0 => string 'post_title' (length=10)
1 => string 'post_content' (length=12)
2 => string 'post_excerpt' (length=12)
3 => string 'author_name' (length=11)
'taxonomies' =>
array (size=2)
0 => string 'post_tag' (length=8)
1 => string 'category' (length=8)
'ignore_sticky_posts' => boolean false
'suppress_filters' => boolean false
'update_post_term_cache' => boolean true
'lazy_load_term_meta' => boolean true
'update_post_meta_cache' => boolean true
'posts_per_page' => int 8
'nopaging' => boolean false
'comments_per_page' => string '50' (length=2)
'no_found_rows' => boolean false
'search_terms_count' => int 1
'search_terms' =>
array (size=1)
0 => string '9781462753673' (length=13)
'search_orderby_title' =>
array (size=1)
0 => string 'wp_posts.post_title LIKE '{670d77268ebd6acb288729650b0d7031760254d818d793dfb08978fd28726896}9781462753673{670d77268ebd6acb288729650b0d7031760254d818d793dfb08978fd28726896}'' (length=172)
'order' => string 'DESC' (length=4)
Query Body:
{
"from": 0,
"size": 8,
"sort": [
{
"_score": {
"order": "desc"
}
}
],
"query": {
"function_score": {
"query": {
"bool": {
"should": [
{
"multi_match": {
"query": "9781462753673",
"type": "phrase",
"fields": [
"post_title",
"post_content",
"post_excerpt",
"terms.post_tag.name",
"terms.category.name",
"post_author.login"
],
"boost": 4
}
},
{
"multi_match": {
"query": "9781462753673",
"fields": [
"post_title",
"post_content",
"post_excerpt",
"terms.post_tag.name",
"terms.category.name",
"post_author.login"
],
"boost": 2,
"fuzziness": 0,
"operator": "and"
}
},
{
"multi_match": {
"fields": [
"post_title",
"post_content",
"post_excerpt",
"terms.post_tag.name",
"terms.category.name",
"post_author.login"
],
"query": "9781462753673",
"fuzziness": 1
}
}
]
}
},
"exp": {
"post_date_gmt": {
"scale": "14d",
"decay": 0.25,
"offset": "7d"
}
},
"score_mode": "avg",
"boost_mode": "sum"
}
},
"post_filter": {
"bool": {
"must": [
{
"bool": {
"must": [
{
"terms": {
"meta.isbn.raw": [
"9781462753673"
]
}
}
]
}
},
{
"term": {
"post_type.raw": "product"
}
},
{
"terms": {
"post_status": [
"publish",
"acf-disabled"
]
}
}
]
}
}
}
Query Response Code: 200
Query Response:
{
"took": 9,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 0,
"max_score": null,
"hits": []
}
}
Thanks for the information @peterjohnhunt ,
I think this would be an issue with mappings, as ES would search for an integer or string but found a boolean field instead.
May I ask how the ISBN meta is generated, so I can reproduce the problem?
Hey @oscarssanchez, so this was set up as an ACF text field. It may not be filled in on all posts, but it is working for all other situations (including a search without EP).
Let me know what else would be helpful to help you recreate it. I'm also happy to get you delegated access to the staging site in question if that helps speed things along!
The only other thing I can think of is that a lot of these posts get auto-updated via a programmatic wp_update_post call, with the meta passed as part of the 'meta_input' array.
Hey @peterjohnhunt, thanks! I can confirm the same result with a text field from ACF. I will look into it and update once I find the problem.
Hi @peterjohnhunt ,
Turns out this was not a bug and mappings were just fine, sorry about that!
The solution for this is to set isbn as a search field for WooCommerce queries. You can use the 'shop_order_search_fields' filter to add it: https://github.com/10up/ElasticPress/blob/develop/features/woocommerce/woocommerce.php#L431
After that, your initial query, with ep_integrate set to true, should work flawlessly:
$args = array(
    'ep_integrate' => true,
    'post_type'    => 'product',
    'meta_query'   => array(
        array(
            'key'     => 'isbn',
            'value'   => '9781462753673',
            'compare' => '=',
        ),
    ),
);
$query = new WP_Query( $args );
Whenever you add a new search field for your products, remember to add it in case you want to search it from a search box or when setting 's' explicitly.
Hope the query works out now! Please let us know whether the issue persists :)
Hey @oscarssanchez! I'm actually not using woocommerce though i do have a custom post type named product. Do i need to do something else since i'm not using woocommerce?
Sure @peterjohnhunt, there's also the ep_search_fields filter:
https://github.com/10up/ElasticPress/blob/develop/classes/class-ep-api.php#L1467
Hope this works for your case!
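For reference, here is a minimal sketch of how the ep_search_fields filter might be used to add meta keys to a search. It assumes the filter receives the field list already in Elasticsearch notation (for example "meta.isbn.value", the form that appears in the query bodies in this thread); the function name my_add_meta_search_fields is hypothetical.

```php
<?php
/**
 * Append meta keys to the list of fields ElasticPress searches.
 * Assumption: fields use Elasticsearch notation such as "meta.isbn.value".
 * The function name is hypothetical and not part of ElasticPress.
 */
function my_add_meta_search_fields( array $search_fields, array $meta_keys ) {
    foreach ( $meta_keys as $key ) {
        $field = 'meta.' . $key . '.value';
        // Avoid adding the same field twice.
        if ( ! in_array( $field, $search_fields, true ) ) {
            $search_fields[] = $field;
        }
    }
    return $search_fields;
}

// Register only when WordPress is loaded, so the file can also run standalone.
if ( function_exists( 'add_filter' ) ) {
    add_filter(
        'ep_search_fields',
        function ( $search_fields ) {
            return my_add_meta_search_fields( (array) $search_fields, array( 'isbn', 'lin' ) );
        }
    );
}
```

With this in place, a plain 's' search should include the isbn and lin meta values without passing search_fields in every query, provided the assumption about the field notation holds for your ElasticPress version.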
Ok, just to make sure I'm tracking: you're saying the meta fields were indexed fine and are in the elasticpress.io server ok, but I need to enable them on my side to be used via the ep_search_fields filter? Or that I need that to re-index stuff correctly?
My end goal is actually to have this as my query:
$args = array(
    's'             => '9781462753673',
    'post_type'     => 'product',
    'search_fields' => array(
        'post_title',
        'post_content',
        'post_excerpt',
        'taxonomies' => array( 'product_contributor', 'product_category', 'product_tag', 'product_format', 'product_feature', 'product_color' ),
        'meta'       => array( 'isbn', 'lin' ),
    ),
);
$query = new WP_Query( $args );
and have a search return the product with that ISBN. However, while testing with Thorsten Ott, he suggested I use the meta_query method for testing, as it makes the underlying issue easier to pin down.
Does filling in search_fields do the same thing as using the ep_search_fields filter? If so, I can post the query results for that, as I believe that in Thorsten's and my testing it still returned no results!
Thanks again @oscarssanchez for all the help, and the quick turn around too! Hopefully i can get this wrapped up and will be all good!
@peterjohnhunt That should be right!
I noticed I did not have issues with post_type: product, since this is a post_type created by WooCommerce and hence it is indexed.
However, when I tried to reproduce the same query without WooCommerce, it did not return any results. That makes sense, since there's no product post_type to search once WooCommerce is disabled! Could you check this please?
You can easily check whether there's a post_type product in your elasticpress.io server. Please do the following:
Hit this URL http://youresserver/_cat/indices?v (youresserver = your elasticpress.io endpoint; please do not share this info here) with a GET request and copy your index name. For this, you will need to enter your username and token (password) as basic authorization.
After this, hit this URL http://youresserver/indexname/_search?pretty=true with this POST request:
{
    "stored_fields" : ["post_type"],
    "query" : {
        "term" : { "post_type" : "product" }
    }
}
If the post type product is indexed, you should get some results. If not, you would need to add them. If you don't get any results, can you paste the code so we can check how you are creating the post_type?
Thanks!
I did get some results following those steps! The search has been working fine for us for title, content, and excerpt, it would seem, as we are already using that successfully in production! It only appears not to be working for meta keys/values, whether we do it via meta_query or search_fields.
For reference, here is the result of that POST request for post_type product:
{
"took": 6,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 6747,
"max_score": 2.046869,
"hits": [
{
"_index": "myserverurl",
"_type": "post",
"_id": "95059",
"_score": 2.046869
},
{
"_index": "myserverurl",
"_type": "post",
"_id": "95055",
"_score": 2.046869
},
{
"_index": "myserverurl",
"_type": "post",
"_id": "95050",
"_score": 2.046869
},
{
"_index": "myserverurl",
"_type": "post",
"_id": "95005",
"_score": 2.046869
},
{
"_index": "myserverurl",
"_type": "post",
"_id": "95002",
"_score": 2.046869
},
{
"_index": "myserverurl",
"_type": "post",
"_id": "94938",
"_score": 2.046869
},
{
"_index": "myserverurl",
"_type": "post",
"_id": "94894",
"_score": 2.046869
},
{
"_index": "myserverurl",
"_type": "post",
"_id": "94890",
"_score": 2.046869
},
{
"_index": "myserverurl",
"_type": "post",
"_id": "94529",
"_score": 2.046869
},
{
"_index": "myserverurl",
"_type": "post",
"_id": "94515",
"_score": 2.046869
}
]
}
}
here is the full query body going to the elastic search server when i use the search_fields:
{
"from": 0,
"size": 8,
"sort": [
{
"_score": {
"order": "desc"
}
}
],
"query": {
"function_score": {
"query": {
"bool": {
"should": [
{
"multi_match": {
"query": "9781462753673",
"type": "phrase",
"fields": [
"post_title",
"post_content",
"post_excerpt",
"terms.product_contributor.name",
"terms.product_category.name",
"terms.product_tag.name",
"terms.product_format.name",
"terms.product_feature.name",
"terms.product_color.name",
"meta.isbn.value",
"meta.lin.value"
],
"boost": 4
}
},
{
"multi_match": {
"query": "9781462753673",
"fields": [
"post_title",
"post_content",
"post_excerpt",
"terms.product_contributor.name",
"terms.product_category.name",
"terms.product_tag.name",
"terms.product_format.name",
"terms.product_feature.name",
"terms.product_color.name",
"meta.isbn.value",
"meta.lin.value"
],
"boost": 2,
"fuzziness": 0,
"operator": "and"
}
},
{
"multi_match": {
"fields": [
"post_title",
"post_content",
"post_excerpt",
"terms.product_contributor.name",
"terms.product_category.name",
"terms.product_tag.name",
"terms.product_format.name",
"terms.product_feature.name",
"terms.product_color.name",
"meta.isbn.value",
"meta.lin.value"
],
"query": "9781462753673",
"fuzziness": 1
}
}
]
}
},
"exp": {
"post_date_gmt": {
"scale": "14d",
"decay": 0.25,
"offset": "7d"
}
},
"score_mode": "avg",
"boost_mode": "sum"
}
},
"post_filter": {
"bool": {
"must": [
{
"term": {
"post_type.raw": "product"
}
},
{
"terms": {
"post_status": [
"publish",
"acf-disabled"
]
}
}
]
}
}
}
Hey @peterjohnhunt, I've been poking around with this using a custom post_type product with some ISBN meta, but this seems unusual, because setting search_fields with the meta key like this works for me:
$args = array(
    's'             => '9781462753673',
    'post_type'     => 'product',
    'search_fields' => array(
        'meta' => array( 'isbn' ),
    ),
);
$query = new WP_Query( $args );
Can you please do a POST request again with this query and check if you get no results?
{
"from": 0,
"size": 10,
"sort": [
{
"_score": {
"order": "desc"
}
}
],
"query": {
"function_score": {
"query": {
"bool": {
"should": [
{
"multi_match": {
"query": "9781462753673",
"type": "phrase",
"fields": [
"meta.isbn.value"
],
"boost": 4
}
},
{
"multi_match": {
"query": "9781462753673",
"fields": [
"meta.isbn.value"
],
"boost": 2,
"fuzziness": 0,
"operator": "and"
}
},
{
"multi_match": {
"fields": [
"meta.isbn.value"
],
"query": "9781462753673",
"fuzziness": 1
}
}
]
}
},
"exp": {
"post_date_gmt": {
"scale": "14d",
"decay": 0.25,
"offset": "7d"
}
},
"score_mode": "avg",
"boost_mode": "sum"
}
},
"post_filter": {
"bool": {
"must": [
{
"term": {
"post_type.raw": "product"
}
},
{
"terms": {
"post_status": [
"publish",
"acf-disabled"
]
}
}
]
}
}
}
This is how my mappings look for isbn so we can also compare :
"meta": {
"properties": {
"isbn": {
"properties": {
"boolean": {
"type": "boolean"
},
"date": {
"type": "date",
"format": "yyyy-MM-dd"
},
"datetime": {
"type": "date",
"format": "yyyy-MM-dd HH:mm:ss"
},
"double": {
"type": "double"
},
"long": {
"type": "long"
},
"raw": {
"type": "keyword",
"ignore_above": 10922
},
"time": {
"type": "date",
"format": "HH:mm:ss"
},
"value": {
"type": "text",
"fields": {
"raw": {
"type": "keyword",
"ignore_above": 10922
},
"sortable": {
"type": "keyword",
"ignore_above": 10922,
"normalizer": "lowerasciinormalizer"
}
}
}
}
}
}
},
Thanks!
Hey @oscarssanchez! Thanks for continuing to dig on this! Still no results when posting with that specific query.
Here is my mapping:
"isbn":{
"properties":{
"boolean":{
"type":"boolean"
},
"date":{
"type":"date",
"format":"yyyy-MM-dd"
},
"datetime":{
"type":"date",
"format":"yyyy-MM-dd HH:mm:ss"
},
"double":{
"type":"double"
},
"long":{
"type":"long"
},
"raw":{
"type":"keyword",
"ignore_above":10922
},
"time":{
"type":"date",
"format":"HH:mm:ss"
},
"value":{
"type":"text",
"fields":{
"raw":{
"type":"keyword",
"ignore_above":10922
},
"sortable":{
"type":"keyword",
"ignore_above":10922,
"normalizer":"lowerasciinormalizer"
}
}
}
}
}
Our mappings look identical @peterjohnhunt ,
Let's take a look at the data of the product with isbn 9781462753673. Could you search for it by its title and paste how it looks? This is how it looks for me:
{
"_index": "localhost-1",
"_type": "post",
"_id": "7",
"_score": 3.0137746,
"_source": {
"post_id": 7,
"ID": 7,
"post_author": {
"raw": "",
"login": "",
"display_name": "",
"id": ""
},
"post_date": "2018-12-20 00:49:33",
"post_date_gmt": "2018-12-20 00:49:33",
"post_title": "test1",
"post_excerpt": "",
"post_content_filtered": "",
"post_content": "",
"post_status": "publish",
"post_name": "test1",
"post_modified": "2018-12-20 00:49:40",
"post_modified_gmt": "2018-12-20 00:49:40",
"post_parent": 0,
"post_type": "product",
"post_mime_type": "",
"permalink": "http://localhost/product/test1/",
"terms": [],
"meta": {
"isbn": [
{
"value": "9781462753673",
"raw": "9781462753673",
"long": 9781462753673,
"double": 9781462753673,
"boolean": false,
"date": "1971-01-01",
"datetime": "1971-01-01 00:00:01",
"time": "00:00:01"
}
]
},
"date_terms": {
"year": 2018,
"month": 12,
"week": 51,
"dayofyear": 353,
"day": 20,
"dayofweek": 4,
"dayofweek_iso": 4,
"hour": 0,
"minute": 49,
"second": 33,
"m": 201812
},
"comment_count": 0,
"comment_status": "closed",
"ping_status": "closed",
"menu_order": 0,
"guid": "http://localhost/?post_type=product&p=7"
}
}
Hey @oscarssanchez, very helpful! So doing additional digging here, i'm actually noticing that though search is working and returning results when various keywords are used, it almost seems as if there are a number of titles that may not be indexed? I was also able to find one that is for sure indexed, and the ISBN is working when searching for that. So it appears it's just related to some posts.
But for instance when i search for the exact post title of some specific posts, i'd expect that post to be first in the results, but it's not showing in the top 8 at least. I have quite a number of posts though so it's hard to know if it's for sure not indexed, or just not showing in the first couple pages... though the lack of it showing up by ISBN makes me think it's not indexed to begin with.
During initial testing I re-ran wp elasticpress index --setup to make sure everything is indexed. Is there a better way I can confirm whether a specific post is indexed by post ID, or see why some might not be getting indexed?
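One possible way to check a single post, sketched here under assumptions: the sample document earlier in this thread shows "_id": "7" for post ID 7, which suggests the document _id matches the WordPress post ID. If that holds, an Elasticsearch ids query sent to the same _search endpoint used above would return the document only when it is indexed. The function name my_build_post_lookup_body is hypothetical.

```php
<?php
/**
 * Build a search body that matches a single document by its _id.
 * Assumption: ElasticPress uses the WordPress post ID as the document _id,
 * as the sample document in this thread suggests.
 */
function my_build_post_lookup_body( $post_id ) {
    return array(
        'query'   => array(
            'ids' => array(
                'values' => array( (string) $post_id ),
            ),
        ),
        // Only return a few fields to keep the response small.
        '_source' => array( 'post_title', 'post_type', 'meta.isbn' ),
    );
}

// Print the JSON body to POST to http://youresserver/indexname/_search
echo json_encode( my_build_post_lookup_body( 95059 ), JSON_PRETTY_PRINT ), PHP_EOL;
```

If the response contains no hits for that ID, the post was probably never indexed, which would point toward an indexing error rather than a search problem.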
Great, thanks for the information @peterjohnhunt ,
Another person recently had a similar issue, and it was because of some meta remnants generated by ACF. Could you please index your content with this wp-cli command: wp elasticpress index --setup --show-bulk-errors
Let's see what errors it throws :)
started this a while ago and will update you once it's complete! no errors half way through as of yet!
aha! It appears that my error is this:
type: illegal_argument_exception
reason: Limit of total fields [5000] in index [myserver] has been exceeded
I've disabled protected content for now and am doing a resync, as I have quite a bit of admin-only data that I don't have to sync. Thanks for all the help @oscarssanchez! It would be nice if there was an alert or something in the plugin or elasticpress.io account page that showed I was over my quota!
I'm re-running an index now to confirm this was the issue! That being said, i don't actually have WP-CLI access on my production server. is there an easy way i can delete the current index and re-index on there?
Hi @peterjohnhunt ,
This is actually related to the total number of fields in your Elasticsearch index mapping (the index.mapping.total_fields.limit setting), not to the number of documents you can index. For the essential plan the document limit is 20,000, far more than 5,000.
This issue is probably happening because of the meta you have right now. You probably have too much meta data in your database due to poorly coded plugins. We had this exact same issue recently.
There are potentially two options to solve this:
1.- You can try raising Elasticsearch's total fields limit with the ep_total_field_limit filter (this won't work if there's a lot of meta; I think Elasticsearch would crash if you go higher than 10,000).
2.- You can clean up some of that meta in your database, or exclude it from getting indexed (I highly recommend cleaning, though). I would look for duplicated or similar values that are not in use anymore. After this, everything should get indexed :)
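As a rough sketch of the first option, the ep_total_field_limit filter could be hooked as shown here. The target of 7500 is an arbitrary illustration, not a recommended value, and the function name is hypothetical. The new limit presumably only takes effect once the index is recreated (for example by a full re-index), since it is part of the index settings.

```php
<?php
/**
 * Return a raised field limit, never lowering the value that is passed in.
 * The default target of 7500 is only an example; the function name is hypothetical.
 */
function my_raise_total_field_limit( $limit, $target = 7500 ) {
    return max( (int) $limit, (int) $target );
}

// Register only when WordPress is loaded, so the file can also run standalone.
if ( function_exists( 'add_filter' ) ) {
    add_filter( 'ep_total_field_limit', 'my_raise_total_field_limit' );
}
```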
As for your question on re-indexing: every time you run a re-index from the dashboard, your previous index is deleted. wp-cli is good for debugging, as well as for customizing the number of posts to index per cycle.
Thanks Oscar! I'll look to clean up some old meta. I do have quite a bit of post meta that's used purely for admin related posts. What would the best way for me to exclude those from indexing be?
Also, just to clarify, ideally you're saying i should have less than 5000 unique meta_key that are being indexed, correct?
That's right @peterjohnhunt, 5,000 seems like a lot of meta_keys to me. If there are some meta_keys that you would like to keep but not get indexed into Elasticsearch, you can use this filter: ep_prepare_meta_excluded_public_keys. Private meta is not indexed by default.
Thanks Oscar! I excluded the offending meta by filtering is_protected_meta, which made it easier to use a wildcard check, since the meta that is bloating the index has a consistent meta key prefix (similar to the _ prefix on protected meta). I resynced, and everything appears to be working now. It would be cool if there was an additional filter within ElasticPress for $allow_index that is passed the meta key, so it could be checked dynamically instead of having to be fully specified in ep_prepare_meta_excluded_public_keys. I.e.:
foreach ( $meta as $key => $value ) {
    $allow_index = false;
    if ( is_protected_meta( $key ) ) {
        if ( true === $allowed_protected_keys || in_array( $key, $allowed_protected_keys ) ) {
            $allow_index = true;
        }
    } else {
        if ( true !== $excluded_public_keys && ! in_array( $key, $excluded_public_keys ) ) {
            $allow_index = true;
        }
    }
    if ( true === apply_filters( 'ep_prepare_meta_allow_index', $allow_index, $key, $post ) || apply_filters( 'ep_prepare_meta_whitelist_key', false, $key, $post ) ) {
        $prepared_meta[ $key ] = maybe_unserialize( $value );
    }
}
Would the best option be for me to submit a PR for that?
Otherwise, my issue is resolved and i greatly appreciate your help!
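The is_protected_meta approach described above might look roughly like the sketch below. The prefix "myplugin_admin_" and the function name are hypothetical, chosen only for illustration. Note that marking a key as protected also affects WordPress itself (for example, protected keys are hidden from the Custom Fields meta box), so this is a trade-off rather than a pure indexing switch.

```php
<?php
/**
 * Decide whether a meta key belongs to the bloating group by its prefix.
 * "myplugin_admin_" is a hypothetical prefix used only for illustration.
 */
function my_is_bloat_meta_key( $meta_key, array $prefixes = array( 'myplugin_admin_' ) ) {
    foreach ( $prefixes as $prefix ) {
        // strpos() returns 0 only when the key starts with the prefix.
        if ( 0 === strpos( $meta_key, $prefix ) ) {
            return true;
        }
    }
    return false;
}

// Register only when WordPress is loaded, so the file can also run standalone.
if ( function_exists( 'add_filter' ) ) {
    add_filter(
        'is_protected_meta',
        function ( $protected, $meta_key ) {
            // Keep WordPress's own decision and additionally protect prefixed keys.
            return $protected || my_is_bloat_meta_key( (string) $meta_key );
        },
        10,
        2
    );
}
```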
Hi @peterjohnhunt, PRs are always appreciated.
Have a great day.
For folks wondering how to use ep_prepare_meta_excluded_public_keys, here's an example solution. In functions.php of your theme (or wherever you put code specific to particular plugins):
<?php
// Exclude every meta key on the post that matches one of the patterns below.
add_filter( 'ep_prepare_meta_excluded_public_keys', function ( $arr, $post ) {
    $metaKeys = array_keys( get_post_meta( $post->ID ) );
    $excluded = array_filter( $metaKeys, function ( $key ) {
        $excludedKeyPatterns = [
            '/^regex_patterns_of_keys/',
            '/^that_you_want/',
            '/^to_exclude/',
        ];
        foreach ( $excludedKeyPatterns as $pattern ) {
            // preg_match() returns 1 on a match, 0 on no match, and false on error.
            if ( preg_match( $pattern, $key ) === 1 ) {
                return true;
            }
        }
        return false;
    } );
    return $excluded;
}, 1, 2 );
This goes through each meta key and excludes it if it matches a regex pattern defined in $excludedKeyPatterns.
@twhid thanks for sharing this! Would you mind if we added this example to our docs?
I don't mind.
I was just thinking that $arr and $excluded should perhaps be merged before returning, but it appears in your source code that an empty array is always passed to it (unless you apply that filter in places other than ElasticPress\Indexable\Post\Post).
Hi @twhid, thanks so much for your example, it helps a lot. I have tried adding it to my website, like this:
add_filter( "ep_prepare_meta_excluded_public_keys", function ($arr, $post) {
    $metaKeys = array_keys(get_post_meta($post->ID));
    $excluded = array_filter($metaKeys, function($key) {
        $excludedKeyPatterns = [
            '_backorders',
            '_crosssell_ids',
            '_downloadable',
            '_featured',
            '_height',
            '_length',
            '_manage_stock',
            '_price',
            '_product_image_gallery',
            '_product_version',
            '_purchase_note',
            '_regular_price',
            '_sale_price',
            '_sale_price_dates_from',
            '_sale_price_dates_to',
            '_sku',
            '_sold_individually',
            '_stock',
            ........ (many more)
        ];
    });
    return $excluded;
}, 1, 2 );
I have removed the regex part because I don't think it applies to my meta keys.
I am not sure I have typed that correctly since it doesn't seem to work. Could you please point me in the right direction?
Many thanks
@jeffceriello if you have a hard-coded list of keys, you can just return that array from the function (there's no reason to filter the meta keys from the post). In my case there were many very similar keys being created by some plugins that I didn't need indexed (and more could be added in the future), so I filtered all the keys on every post by those regexes to generate the exclude list.
Hope this helps!
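A minimal sketch of the hard-coded variant, assuming the filter passes the current exclusion list as its first argument. The keys 'legacy_import_id' and 'admin_notes' and the function name are hypothetical placeholders. Also note that, going by the loop quoted earlier in this thread, this list is only consulted for keys that is_protected_meta() does not flag; keys starting with an underscore take the other branch and are governed by the allowed protected keys instead.

```php
<?php
/**
 * Merge a fixed list of public meta keys into the exclusion list.
 * The keys below are hypothetical placeholders; replace them with your own.
 */
function my_exclude_fixed_meta_keys( $excluded_keys ) {
    $fixed = array( 'legacy_import_id', 'admin_notes' );
    // Merge rather than overwrite, in case another callback already added keys.
    return array_values( array_unique( array_merge( (array) $excluded_keys, $fixed ) ) );
}

// Register only when WordPress is loaded, so the file can also run standalone.
if ( function_exists( 'add_filter' ) ) {
    add_filter( 'ep_prepare_meta_excluded_public_keys', 'my_exclude_fixed_meta_keys' );
}
```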
Have you searched for similar issues before submitting this one? Yes
Is this a bug, question or feature request? Bug
Describe the issue you encountered:
Current WordPress version: 5.0.1
Current ElasticPress version: 2.7.0
Current Elasticsearch version: not sure
Where do you host your Elasticsearch server: elasticpress.io
Other plugins installed (WooCommerce, Simple Redirect Manager, etc..):
Steps to reproduce:
I chatted with support (Thorsten Ott) for a bit to determine why searching by meta field wasn't working for my site. He walked me through a bunch of debugging including re-indexing etc and recommended i submit a bug report here. Here are the details
Using a default WP_Query of:
Returns this:
Versus an ElasticPress query of:
which returns no results with: