alleyinteractive / searchpress

Elasticsearch integration for WordPress.

Error indexing posts: Limit of total fields in index has been exceeded #101

Open danielbachhuber opened 6 years ago

danielbachhuber commented 6 years ago

When I run:

$ wp searchpress index --flush --put-mapping

I eventually see this error:

Warning: Error indexing post 5021340; HTTP response code: 400; Data: {"index":{"_index":"bgr.test","_type":"post","_id":"5021340","status":400,"error":{"type":"illegal_argument_exception","reason":"Limit of total fields [1000] in index [bgr.test] has been exceeded"}}}

The stats end up as:

Completed page 1/1 (29.36s / 101.48M current / 157.01M max), 7s remaining
Success: Index Complete!
1481    posts processed
1081    posts indexed
400 errors/warnings
Replacing the default search with SearchPress...
Success: Successfully activated SearchPress!

Any ideas why there's a cap?

mboynes commented 6 years ago

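Yes. Elasticsearch 5.0 introduced a cap on the total number of fields in an index's mapping, controlled by the index.mapping.total_fields.limit index setting (default 1000), mainly to prevent oversized mappings from hurting performance. We haven't decided how exactly SearchPress should address that in the long term, but here's a fix in the interim: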

/**
 * Bump up field limit.
 *
 * @param array $mapping ES mapping.
 * @return array Updated mapping.
 */
add_filter( 'sp_config_mapping', function( $mapping ) {
    $mapping['settings']['index']['mapping']['total_fields']['limit'] = 5000;
    return $mapping;
} );
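The nested array keys in that filter correspond to Elasticsearch's index.mapping.total_fields.limit setting. Elasticsearch accepts settings either as nested JSON objects or as dotted keys, so both forms mean the same thing. The sketch below is illustrative only. It assumes SearchPress serializes the settings portion of the mapping to JSON when it creates the index, presumably when --put-mapping runs. The $mapping array here is a bare-bones stand-in, not the real SearchPress mapping.

```php
<?php
// Illustrative stand-in for the array passed to the sp_config_mapping filter.
// The real SearchPress mapping contains many more keys.
$mapping = array( 'settings' => array( 'index' => array() ) );

// Same assignment as the filter above; PHP creates the intermediate arrays.
$mapping['settings']['index']['mapping']['total_fields']['limit'] = 5000;

// Serializes to the nested form of "index.mapping.total_fields.limit".
echo json_encode( $mapping['settings'] ), "\n";
// {"index":{"mapping":{"total_fields":{"limit":5000}}}}
```

Because the limit is an index setting, it only takes effect on an index created (or updated) with it. That is why the filter pairs naturally with the --put-mapping flag from the original command.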

Ultimately, this happens because of the way SearchPress indexes post meta. Each unique meta key adds upwards of 8 fields to the Elasticsearch mapping. We do this to maintain parity with WP_Query, since ES doesn't (natively) do runtime type casting. The only time we've seen performance issues from this was on a site that set dynamic meta keys (e.g. related-post-{$post_id}) and didn't filter them out before indexing; its ES mapping inflated to hundreds of thousands of fields before it caused problems.
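To see why dynamic keys are the dangerous case, a rough field budget helps. The following is a minimal sketch and makes several assumptions. It takes exactly 8 fields per unique meta key, although the comment above says "upwards of 8" and the real count depends on your SearchPress version's mapping. It also treats any key ending in "-<digits>" as dynamic. Both helper functions are hypothetical and are not part of SearchPress. Wiring the filter into SearchPress would go through whatever post-meta filter your version provides, which isn't shown here.

```php
<?php
// Assumption: about 8 mapped fields per unique meta key (see comment above).
const SP_EXAMPLE_FIELDS_PER_META_KEY = 8;

/**
 * Hypothetical helper: drop meta keys that embed an ID, such as
 * "related-post-123", so each ID doesn't add a new set of fields.
 *
 * @param string[] $keys Meta keys found on posts.
 * @return string[] Keys with dynamic ones removed, reindexed from 0.
 */
function sp_example_drop_dynamic_keys( array $keys ): array {
    return array_values( array_filter(
        $keys,
        function ( string $key ): bool {
            // Keep the key unless it ends in "-<digits>". Adjust the pattern to your site.
            return 1 !== preg_match( '/-\d+$/', $key );
        }
    ) );
}

/**
 * Hypothetical helper: rough number of mapping fields a set of meta keys needs.
 *
 * @param string[] $keys Meta keys.
 * @return int Estimated field count.
 */
function sp_example_estimate_meta_fields( array $keys ): int {
    return count( array_unique( $keys ) ) * SP_EXAMPLE_FIELDS_PER_META_KEY;
}

$keys = array( '_thumbnail_id', 'subtitle', 'related-post-101', 'related-post-102', 'related-post-103' );
$kept = sp_example_drop_dynamic_keys( $keys );

echo sp_example_estimate_meta_fields( $keys ), "\n"; // 40
echo sp_example_estimate_meta_fields( $kept ), "\n"; // 16
```

At 8 fields per key, the default limit of 1000 is reached at about 125 unique meta keys, and fewer once the core post fields are counted. Every dynamic key grows that number with the content, so filtering them out before indexing matters more than raising the limit.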

We have a few ideas for long-term fixes, but we need to do some benchmarking first. I'll leave this open until we merge a long-term fix.

danielbachhuber commented 6 years ago


Thanks! The filter works for my needs.