babenkoivan / elastic-migrations

Elasticsearch migrations for Laravel
MIT License

Refreshing synonyms #46

Closed appsol closed 1 year ago

appsol commented 1 year ago

I've recently upgraded a Laravel app from 5.8 to 9.0 and as part of this I have migrated from your previous package to your new suite of packages. I am currently using:

laravel/framework: 9.46.0
laravel/scout: 9.7.2
babenkoivan/elastic-adapter: 2.4.0
babenkoivan/elastic-migrations: 2.0.1
babenkoivan/elastic-scout-driver: 2.0.0
babenkoivan/elastic-scout-driver-plus: 3.5.0

I have successfully moved all the functionality over, with one final issue: updating the synonyms in the index settings.

Migrations are created using:

    /**
     * Run the migration.
     */
    public function up(): void
    {
        $settings = (new ServicesIndexSettings())->getSettings();
        Index::dropIfExists('services');
        Index::createRaw('services', $this->mapping, $settings);
    }

Settings are created using:

    /**
     * @return array
     */
    public function getSettings(): array
    {
        return [
            'analysis' => [
                'analyzer' => [
                    'default' => [
                        'tokenizer' => 'standard',
                        'filter' => ['lowercase', 'synonym', 'stopwords'],
                    ],
                ],
                'filter' => [
                    'synonym' => [
                        'type' => 'synonym',
                        'synonyms' => $this->getThesaurus()
                    ],
                    'stopwords' => [
                        'type' => 'stop',
                        'stopwords' => $this->getStopWords()
                    ],
                ],
            ],
        ];
    }

This works when the index is first set up. However, I also have an API endpoint that allows a new thesaurus to be uploaded; it stores the updated thesaurus in a .csv file, which is read by the ServicesIndexSettings::getThesaurus method. The API controller stores the new synonyms in the file, then calls a re-index command: Artisan::call(ReindexElasticsearchCommand::class).
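For readers following along, here is a minimal sketch of how a CSV thesaurus could be turned into the list that the `synonyms` setting expects. The function name `thesaurusToSynonyms` and the one-group-per-line layout are assumptions for illustration; this is not the actual `getThesaurus` implementation from the app.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of any package): converts CSV text with one
 * synonym group per line into Solr-style rules such as "autism, autistic, asd".
 *
 * @return string[]
 */
function thesaurusToSynonyms(string $csv): array
{
    $rules = [];

    foreach (preg_split('/\R/', $csv) ?: [] as $line) {
        // Skip blank lines so str_getcsv never sees an empty row.
        if (trim($line) === '') {
            continue;
        }

        // Parse one CSV row, then drop empty cells (e.g. from a trailing comma).
        $terms = array_filter(
            array_map('trim', str_getcsv($line, ',', '"', '\\')),
            static fn (string $term): bool => $term !== ''
        );

        // A group needs at least two terms to form a synonym rule.
        if (count($terms) >= 2) {
            $rules[] = implode(', ', $terms);
        }
    }

    return $rules;
}

print_r(thesaurusToSynonyms("autism,autistic,asd\nnot drinking,dehydration,"));
```

Dropping empty cells matters for rows that end in a trailing comma, which would otherwise produce an empty term in the rule.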

ReindexElasticsearchCommand:

    /**
     * Execute the console command.
     */
    public function handle()
    {
        if (Config::get('scout.driver') !== 'elastic') {
            $this->warn('Did not reindex due to not using the [elastic] Scout driver.');

            return;
        }

        $this->line('Drop all elastic search indices and re-run migrations');
        $this->call('scout:delete-all-indexes');
        $this->call('elastic:migrate:fresh', ['--force' => true]);
        $this->call('scout:sync-index-settings');

        if (Schema::hasTable((new Service())->getTable())) {
            $this->import(Service::class);
        }

        if (Schema::hasTable((new OrganisationEvent())->getTable())) {
            $this->import(OrganisationEvent::class);
        }

        if (Schema::hasTable((new Page())->getTable())) {
            $this->import(Page::class);
        }
    }

    protected function import(string $model): void
    {
        $this->line("Importing documents for [{$model}]...");
        $this->call('ck:scout-import', ['model' => $model]);
    }

This successfully re-indexes with any changes to the migrations or mappings, but it does not seem to update the index settings, or at least not the synonyms, as the relevant test fails:

    public function test_thesaurus_works_with_search()
    {
        $service = Service::factory()->create([
            'name' => 'Helping People',
        ]);
        $user = User::factory()->create()->makeGlobalAdmin();

        Passport::actingAs($user);
        $updateResponse = $this->json('PUT', '/core/v1/thesaurus', [
            'synonyms' => [
                ['persons', 'people'],
            ],
        ]);

        $updateResponse->assertStatus(Response::HTTP_OK);

        sleep(1);

        $searchResponse = $this->json('POST', '/core/v1/search', [
            'query' => 'persons',
        ]);
        $searchResponse->assertJsonFragment([
            'id' => $service->id,
        ]);
    }

Do you have any insights into how I should go about updating the synonyms for an index? Thanks

babenkoivan commented 1 year ago

Hey, @appsol, I see that you recreate the index, so synonyms should be added. I can't tell why it's not working just by looking at your code; you probably need to do some debugging. First, check if a regular search works:

$searchResponse = $this->json('POST', '/core/v1/search', [
    'query' => 'people',
]);

Does it return anything? If not, perhaps you need to set refresh_documents=true in your tests (see more details here) or disable queues in Scout settings.

If this is not the problem, I recommend you make direct queries to your index in order to validate that synonyms are part of the index settings and that the documents are successfully indexed.
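As one way to do that check, here is a minimal sketch that reads the synonym list out of a decoded `GET /{index}/_settings` response. The helper name `synonymsFromSettings`, the filter name `synonym`, and the sample response are assumptions for illustration; the sample is hand-written, not output from a real cluster.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: extracts the synonym list for one token filter from a
 * decoded `GET /{index}/_settings` response. Returns [] when nothing is found.
 *
 * @param array<string, mixed> $settingsResponse
 * @return string[]
 */
function synonymsFromSettings(array $settingsResponse, string $index, string $filter = 'synonym'): array
{
    // Index-level settings are nested under {index}.settings.index in the response.
    $synonyms = $settingsResponse[$index]['settings']['index']['analysis']['filter'][$filter]['synonyms'] ?? [];

    return is_array($synonyms) ? array_values($synonyms) : [];
}

// Hand-written sample; a real response could be fetched with something like
// json_decode(file_get_contents('http://<host>:9200/<index>/_settings'), true).
$sample = [
    'testing_services' => [
        'settings' => [
            'index' => [
                'analysis' => [
                    'filter' => [
                        'synonym' => ['type' => 'synonym', 'synonyms' => ['persons, people']],
                    ],
                ],
            ],
        ],
    ],
];

print_r(synonymsFromSettings($sample, 'testing_services'));
```

An empty result for the index name the application actually queries suggests the settings were applied to a different index, or not applied at all. A similar request to `GET /{index}/_count` would show whether any documents were indexed.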

appsol commented 1 year ago

Hello @babenkoivan, thanks for the response. I've been going through your suggestions. I've discovered that $this->call('scout:delete-all-indexes'); is not supported, so it achieves nothing; using Index::dropIfExists instead works better. I've also discovered that some of the confusion came from mixing up indexes, as I had an index prefix for testing that I had forgotten about. Having reached a stage where the index is recreated with the synonyms in place, I can now refresh the synonyms using this method. However, I have now found that the synonyms are not being applied to searches. With the synonyms:

autism,autistic,asd
not drinking,dehydration,
dehydration,thirsty,drought

I can test the index with:

http://elasticsearch:9200/testing_services/_analyze
[
    'field' => 'name',
    'text' => 'Helping asd'
]

The correct synonyms are returned:

"tokens" => array:4 [
    0 => array:5 [
      "token" => "helping"
      "start_offset" => 0
      "end_offset" => 7
      "type" => "<ALPHANUM>"
      "position" => 0
    ]
    1 => array:5 [
      "token" => "asd"
      "start_offset" => 8
      "end_offset" => 11
      "type" => "<ALPHANUM>"
      "position" => 1
    ]
    2 => array:5 [
      "token" => "autism"
      "start_offset" => 8
      "end_offset" => 11
      "type" => "SYNONYM"
      "position" => 1
    ]
    3 => array:5 [
      "token" => "autistic"
      "start_offset" => 8
      "end_offset" => 11
      "type" => "SYNONYM"
      "position" => 1
    ]
  ]

But if I then add a document:

"id" => "d59ddde4-4f3a-40e0-8a36-392d2198acb1"
  "name" => "Helping asd"
  "intro" => "Id et est ut rerum id vel cupiditate"
  "description" => "Ipsam ea et qui voluptatem quia excepturi"
  "wait_time" => null
  "is_free" => true
  "status" => "active"
  "score" => 1
  "organisation_name" => "Stevens Ltd dolorum 37063"
  "taxonomy_categories" => []
  "collection_categories" => []
  "collection_personas" => []
  "service_locations" => []
  "service_eligibilities" => array:7 [
    0 => "Age Group All"
    1 => "Disability All"
    2 => "Gender All"
    3 => "Income All"
    4 => "Language All"
    5 => "Ethnicity All"
    6 => "Housing All"
  ]

and run a search:

"body" => array:3 [
    "query" => array:1 [
      "function_score" => array:2 [
        "query" => array:1 [
          "bool" => array:4 [
            "must" => []
            "should" => array:5 [
              0 => array:1 [
                "match" => array:1 [
                  "name" => array:3 [
                    "query" => "autism"
                    "boost" => 3
                    "fuzziness" => "AUTO"
                  ]
                ]
              ]
              1 => array:1 [
                "match" => array:1 [
                  "organisation_name" => array:3 [
                    "query" => "autism"
                    "boost" => 3
                    "fuzziness" => "AUTO"
                  ]
                ]
              ]
              2 => array:1 [
                "match" => array:1 [
                  "intro" => array:3 [
                    "query" => "autism"
                    "boost" => 2
                    "fuzziness" => "AUTO"
                  ]
                ]
              ]
              3 => array:1 [
                "match" => array:1 [
                  "description" => array:3 [
                    "query" => "autism"
                    "boost" => 1.5
                    "fuzziness" => "AUTO"
                  ]
                ]
              ]
              4 => array:1 [
                "match" => array:1 [
                  "taxonomy_categories" => array:3 [
                    "query" => "autism"
                    "boost" => 1
                    "fuzziness" => "AUTO"
                  ]
                ]
              ]
            ]
            "filter" => array:1 [
              0 => array:1 [
                "term" => array:1 [
                  "status" => "active"
                ]
              ]
            ]
            "minimum_should_match" => 1
          ]
        ]
        "functions" => array:1 [
          0 => array:1 [
            "field_value_factor" => array:3 [
              "field" => "score"
              "missing" => 1
              "modifier" => "ln1p"
            ]
          ]
        ]
      ]
    ]
    "from" => 0
    "size" => 25
  ]

I get no results, but changing the query value to asd returns the document. So it looks as though the index is created correctly, with the correct synonyms, but they are not applied to a search. If you would rather have this in a new issue, just let me know and I'll move it. Thanks

appsol commented 1 year ago

@babenkoivan I've managed to get this to work now. My fault. I had set up the model incorrectly. I had added:

    public function searchableAs()
    {
        return 'services';
    }

instead of:

    public function searchableAs()
    {
        return config('scout.prefix') . 'services';
    }

With the following defined in phpunit.xml:

<env name="SCOUT_PREFIX" value="testing_"/>

the model was registered on the 'services' index rather than the 'testing_services' index, so the document was not found or returned in the search. Thanks so much for looking into this for me.
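To make the mismatch concrete, here is a small plain-PHP sketch. The function `indexNameFor` is a hypothetical stand-in for `config('scout.prefix') . 'services'` and is not part of Scout or of this package.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical stand-in for config('scout.prefix') . 'services':
 * builds the index name a model would read from and write to.
 */
function indexNameFor(string $baseName, string $prefix = ''): string
{
    return $prefix . $baseName;
}

// Index name used in the test environment, where the prefix is "testing_".
$prefixedIndex = indexNameFor('services', 'testing_');

// Index name that a hard-coded searchableAs() points the model at.
$hardCodedIndex = 'services';

// The names differ, so the documents and the synonym settings
// end up in two different indexes.
var_dump($prefixedIndex === $hardCodedIndex);
```

Deriving the name from the prefix in one place keeps the model and the index it searches in agreement across environments.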

babenkoivan commented 1 year ago

Glad to hear you solved all issues!