basemkhirat / elasticsearch

The missing elasticsearch ORM for Laravel, Lumen and Native php applications
MIT License

Accents #13

Closed MarcosBL closed 7 years ago

MarcosBL commented 7 years ago

Hi !

I'm trying to search using your code through Scout, as described in the docs. What I want is a query that ignores accents.

Right now, when I search with accents I only get the accented words, and when I search without accents I only get the non-accented words. I have used Elasticquent in the past, where I could define:

    protected $indexSettings = [
        'analysis' => [
            'analyzer' => [
                'folding' => [
                    'tokenizer' => 'standard',
                    'filter' => [ 'standard', 'lowercase', 'asciifolding', 'spanish_stemmer', 'spanish_snowball' ],
                ],
            ],
            'filter' => [
                'spanish_stemmer' => [
                    "type" => "stemmer",
                    "name" => "spanish"
                ],
                'spanish_snowball' => [
                    "type" => "snowball",
                    "language" => "Spanish"
                ]
            ]
        ],
    ];
    protected $mappingProperties = [
        'titulo' => [ 'type' => 'string', 'analyzer' => 'folding' ],
        'texto' => [ 'type' => 'string', 'analyzer' => 'folding' ],
    ];

With that, I could search the "titulo" and "texto" fields, and both "Málaga" and "Malaga" would appear for a search for "malaga" or "málaga". However, that's old code and I'm not even sure it would work with current Elasticsearch. So, is there any way I can set the analyzer in ES to "ignore accents"?

basemkhirat commented 7 years ago

Hello @MarcosBL,

In the es.php configuration file you can define index settings, mappings and aliases:

    'indices' => [
        'my_index_1' => [
            'settings' => [
                "number_of_shards" => 1,
                "number_of_replicas" => 0,
                'analysis' => [
                    'analyzer' => [
                        'folding' => [
                            'tokenizer' => 'standard',
                            'filter' => ['standard', 'lowercase', 'asciifolding', 'spanish_stemmer', 'spanish_snowball'],
                        ],
                    ],
                    'filter' => [
                        'spanish_stemmer' => [
                            "type" => "stemmer",
                            "name" => "spanish"
                        ],
                        'spanish_snowball' => [
                            "type" => "snowball",
                            "language" => "Spanish"
                        ]
                    ]
                ]
            ]
        ]
    ]

Then, run

$ php artisan es:indices:update my_index_1
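
The settings above only define the folding analyzer; fields still have to reference it (or, as in the settings output later in this thread, the analyzer can be registered under the name default). A minimal sketch of a field mapping in the same index entry, assuming the package reads a mappings key shaped as type => properties, that the cluster is a 2.x one where the string type still exists, and using a hypothetical type name my_type with the titulo and texto fields from the question:

```php
'indices' => [
    'my_index_1' => [
        // ... 'settings' => [...] with the 'folding' analyzer, as shown above ...
        'mappings' => [
            // Hypothetical type name; with Scout it would need to match
            // the value returned by the model's searchableAs().
            'my_type' => [
                'properties' => [
                    'titulo' => ['type' => 'string', 'analyzer' => 'folding'],
                    'texto'  => ['type' => 'string', 'analyzer' => 'folding'],
                ],
            ],
        ],
    ],
],
```
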

MarcosBL commented 7 years ago

Hi @basemkhirat and thank you !

It worked fine, and I can see the server config using Postman. My mistake was creating the indices with scout:import, thinking it would respect the driver and apply the ES settings in the background. It doesn't.

It worked fine after running artisan es:indices:create:

{
    "acedis": {
        "settings": {
            "index": {
                "creation_date": "1489513467400",
                "analysis": {
                    "filter": {
                        "spanish_snowball": {
                            "type": "snowball",
                            "language": "Spanish"
                        },
                        "spanish_stemmer": {
                            "name": "spanish",
                            "type": "stemmer"
                        }
                    },
                    "analyzer": {
                        "default": {
                            "filter": [
                                "standard",
                                "lowercase",
                                "asciifolding",
                                "spanish_stemmer",
                                "spanish_snowball"
                            ],
                            "tokenizer": "standard"
                        }
                    }
                },
                "number_of_shards": "1",
                "number_of_replicas": "0",
                "uuid": "XXXXX",
                "version": {
                    "created": "2040499"
                }
            }
        }
    }
}

If I search using ElasticHQ ( http://www.elastichq.org/ ), I can now see that the results are the same for 'Málaga' and 'Malaga'. However, MyModel::search($q)->paginate(50); keeps giving bad results. Curiously enough, they are the same results I get when I search for *malaga* (with wildcards) in ElasticHQ. It seems ES + Scout is using * in the query. Is that possible? And can it be overridden using Scout?

basemkhirat commented 7 years ago

Hi @MarcosBL,

Of course, you can extend the Scout engine class with these steps:

  1. In app/Providers/AppServiceProvider.php, register the new engine:

<?php

namespace App\Providers;

use Illuminate\Support\ServiceProvider;

use Laravel\Scout\EngineManager;
use Elasticsearch\ClientBuilder;
use App\NewScoutEngine;

class AppServiceProvider extends ServiceProvider
{
    /**
     * Bootstrap any application services.
     *
     * @return void
     */
    public function boot()
    {
        $this->app->make(EngineManager::class)->extend('new_es', function () {

            $config = config('es.connections.' . config('scout.es.connection'));

            return new NewScoutEngine(
                ClientBuilder::create()->setHosts($config["servers"])->build(),
                $config["index"]
            );

        });
    }

    /**
     * Register any application services.
     *
     * @return void
     */
    public function register()
    {

    }
}

Then define the App\NewScoutEngine class with an overridden performSearch method (the * wildcards around the query are removed):

<?php

namespace App;

use Basemkhirat\Elasticsearch\ScoutEngine;
use Laravel\Scout\Builder;

class NewScoutEngine extends ScoutEngine
{

    /**
     * Perform the given search on the engine.
     *
     * @param  Builder  $builder
     * @param  array  $options
     * @return mixed
     */
    protected function performSearch(Builder $builder, array $options = [])
    {
        $params = [
            'index' => $this->index,
            'type' => $builder->model->searchableAs(),
            'body' => [
                'query' => [
                    'bool' => [
                        'must' => [['query_string' => [ 'query' => "{$builder->query}"]]]
                    ]
                ]
            ]
        ];

        if (isset($options['from'])) {
            $params['body']['from'] = $options['from'];
        }

        if (isset($options['size'])) {
            $params['body']['size'] = $options['size'];
        }

        if (isset($options['numericFilters']) && count($options['numericFilters'])) {
            $params['body']['query']['bool']['must'] = array_merge($params['body']['query']['bool']['must'],
                $options['numericFilters']);
        }

        return $this->elastic->search($params);
    }

}

Finally, change the Laravel Scout driver in scout.php to new_es.
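
As a rough sketch of that last step, assuming the stock Laravel Scout config layout in which the driver is read from a SCOUT_DRIVER environment variable (if that variable is set in .env, it takes precedence over the default given here):

```php
// config/scout.php (fragment; the rest of the file stays as it is)
return [
    // 'new_es' is the name passed to EngineManager::extend() in the provider above.
    'driver' => env('SCOUT_DRIVER', 'new_es'),

    // The provider above reads config('scout.es.connection'),
    // so the existing 'es' => ['connection' => ...] entry must remain in place.
];
```
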

Try it and check your search results.

MarcosBL commented 7 years ago

OMG, it worked great on the first try! I learned a lot from your example, thank you very much for your help!

As an improvement, a wildcard true/false option in the ES config, propagated here https://github.com/basemkhirat/elasticsearch/blob/18af52d348051e2c51c3a9a4194122539c6bf810/src/ScoutEngine.php#L135, would be a small change and might make it easier for others.
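
A minimal sketch of what such a toggle could look like, in plain PHP so it runs on its own. The function name buildSearchBody and the $wildcard flag are hypothetical and not part of the package. The likely reason the flag matters for accents is that wildcard terms in a query_string query are generally not run through the analyzer, so the asciifolding filter never sees them:

```php
<?php

/**
 * Build the body of a query_string search.
 *
 * Hypothetical helper for illustration only: $wildcard stands in for a
 * config option that does not exist in the package as discussed here.
 */
function buildSearchBody(string $term, bool $wildcard = false): array
{
    // With the flag on, wrap the term in * on both sides;
    // with it off, pass the term through so the analyzer can fold accents.
    $queryString = $wildcard ? "*{$term}*" : $term;

    return [
        'query' => [
            'bool' => [
                'must' => [
                    ['query_string' => ['query' => $queryString]],
                ],
            ],
        ],
    ];
}
```

Inside a custom engine, the returned array would take the place of the hard-coded 'body' in performSearch, with the flag read from whatever config key is chosen.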

Once again, THANKS ! :+1:

basemkhirat commented 7 years ago

You are welcome @MarcosBL. Good Luck :)