askorama / orama

🌌 Fast, dependency-free, full-text and vector search engine with typo tolerance, filters, facets, stemming, and more. Works with any JavaScript runtime, browser, server, service!
https://docs.askorama.ai

Different search result after persist and restore database index #695

Open gdeak-monguz opened 2 months ago

gdeak-monguz commented 2 months ago

Describe the bug

I created a database and persisted it with the @orama/plugin-data-persistence plugin. After restoring the index from the JSON string, the search result was different from the result before persisting.

To Reproduce

The bug can be reproduced with the following code:

package.json

{
  "name": "orama-pilot",
  "private": true,
  "version": "1.0.0",
  "type": "module",
  "dependencies": {
    "@orama/orama": "^2.0.15",
    "@orama/plugin-data-persistence": "^2.0.15",
    "@orama/stemmers": "^2.0.15",
    "@orama/stopwords": "^2.0.15"
  }
}

index.js

import { create, insert, search } from '@orama/orama';
import { persist, restore } from '@orama/plugin-data-persistence';
import { stopwords as hungarianStopwords } from '@orama/stopwords/hungarian';
import {
  stemmer,
  language as hungarianLanguage,
} from '@orama/stemmers/hungarian';

// Database
const originalDatabaseInstance = await create({
  schema: {
    type: 'string',
    name: 'string',
  },
  components: {
    tokenizer: {
      stopWords: hungarianStopwords,
      stemming: true,
      stemmerSkipProperties: ['type'],
      language: hungarianLanguage,
      stemmer,
    },
  },
});

// Insert record
await insert(originalDatabaseInstance, {
  type: 'infantry',
  name: 'Piski ütközet',
});

const searchOptions = { term: 'Piski' };

// Search from original database index
const searchResultFromOriginalDatabaseInstance = await search(
  originalDatabaseInstance,
  searchOptions
);
console.log('Count:', searchResultFromOriginalDatabaseInstance.count);  // Count: 1

// Persist database index
const databaseIndex = await persist(originalDatabaseInstance, 'json');
// Restore database index
const restoredDatabaseInstance = await restore('json', databaseIndex);

// Search from restored database index
const searchResultFromRestoredDatabaseInstance = await search(
  restoredDatabaseInstance,
  searchOptions
);
console.log('Count:', searchResultFromRestoredDatabaseInstance.count); // Count: 0

Expected behavior

After restoring the database, I expected the same search results as before persistence.

Environment Info

OS: Windows 11 Pro
Node: v20.2.0
@orama/orama: 2.0.15
@orama/plugin-data-persistence: 2.0.15
@orama/stemmers: 2.0.15
@orama/stopwords: 2.0.15

Affected areas

Search

Additional context

No response

micheleriva commented 2 months ago

Hi @gdeak-monguz, I fear this is because you lose the stemmer when you persist the database (you can't save functions to disk). So I recommend creating a new database with the stemmer, then using it to restore the data.
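
The point about functions can be checked without Orama at all. The following is a minimal sketch using a made-up tokenizer config: the `toyStemmer` function and the shape of `tokenizerConfig` are hypothetical and are not Orama's internals. It only shows that a JSON round trip silently drops function-valued properties.

```javascript
// Hypothetical stand-in for a stemmer; real stemmers are more involved.
const toyStemmer = (word) => word.toLowerCase();

// Hypothetical config object, NOT Orama's internal representation.
const tokenizerConfig = {
  language: 'hungarian',
  stemming: true,
  stemmer: toyStemmer,
};

// JSON.stringify omits properties whose value is a function,
// so the stemmer does not survive serialization.
const roundTripped = JSON.parse(JSON.stringify(tokenizerConfig));

console.log(typeof tokenizerConfig.stemmer); // function
console.log(Object.keys(roundTripped)); // [ 'language', 'stemming' ]
```

Any serialized format has the same limitation, which is why the stemmer has to be supplied again in code when the index is loaded.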

gdeak-monguz commented 2 months ago

How can I do this? I tried creating a new database instance with the same schema and components (tokenizer -> stemmer and stopwords) and calling the insertMultiple function on this new instance with the persisted database index, but it still does not work.