fergiemcdowall / search-index

A persistent, network resilient, full text search library for the browser and Node.js
MIT License
1.38k stars 149 forks

How to persist index on disk? #620

Closed phil294 closed 2 weeks ago

phil294 commented 3 months ago

It seems the default strategy is to keep the thing in memory but not save it.

That's weird because in #39 you said

LevelDB files will always go in a directory structure called 'si' in the directory that search-index was invoked from.

So is that outdated or...?

And in #120

Yes, in fact, on-disk storage with instant startup/shutdown should be the default. Are you experiencing problems with startup times?

I'm trying now to enable saving by passing a { db: require('leveldown')('/my/location') } option, but now it fails with

rejected promise not handled within 1 second: Error: get() requires a callback argument
extensionHostProcess.js:147
stack trace: Error: get() requires a callback argument
    at AbstractLevelDOWN.get (/home/phi/b/search++/node_modules/abstract-leveldown/abstract-leveldown.js:78:11)
    at Object.TIMESTAMP_CREATED (/home/phi/b/search++/node_modules/fergies-inverted-index/src/write.js:228:8)
    at makeAFii (/home/phi/b/search++/node_modules/fergies-inverted-index/src/main.js:58:12)

leveldown (once part of levelup, now living separately) relies on abstract-leveldown which is also linked in the docs:

use another backend by passing the appropriate abstract-leveldown when initialising.

but abstract-leveldown's get() function is callback-based (https://github.com/Level/abstract-leveldown/blob/master/abstract-leveldown.js), whereas fergies-inverted-index's write.js expects a promise-based calling convention: ops._db.get(['~CREATED'], levelOptions).then(...). So I don't understand how this could ever have worked in the first place. Is this library not abstract-leveldown-compatible after all?

phil294 commented 3 months ago

I tried to fix the signatures manually

let idx = await search_index({
    db: {
        open: (options) => new Promise(resolve => leveldown_db.open(options, resolve)),
        close: () => new Promise(resolve => leveldown_db.close(resolve)),
        get: (key, options) => new Promise(resolve => leveldown_db.get(key, options, resolve)), 
        put: (key, value, options) => new Promise(resolve => leveldown_db.put(key, value, options, resolve)),
        del: (key, options) => new Promise(resolve => leveldown_db.del(key, options, resolve)),
        batch: (array, options) => new Promise(resolve => leveldown_db.batch(array, options, resolve)),
        clear: (options) => new Promise(resolve => leveldown_db.clear(options, resolve)),
    },
})

This fixes the invocation, but apparently more is broken in the background: validateVersion now fails with "This index was created with Error: Not found: , you are running 4.0.0"

So apparently the get() passed on to the native binary (linux x86_64) somehow fails with kNotFound internally, at which point I stopped digging... some guidance would be much appreciated here!
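
A possible explanation for the odd version message, offered as a sketch rather than a confirmed diagnosis: in the wrappers above, resolve is passed directly as the Node-style callback, so each promise resolves with the callback's first argument (the error, or null on success) and the actual value is dropped. The example below reproduces that behaviour with a hypothetical in-memory store (fakeStore is a stand-in with the same (err, value) callback shape, not the real leveldown API) and contrasts it with util.promisify, which rejects on error and resolves with the value.

```javascript
const { promisify } = require('node:util');

// Hypothetical stand-in for a callback-style store such as leveldown.
// It only mimics the (err, value) callback convention; it is NOT leveldown.
const fakeStore = {
  data: new Map([['a', '1']]),
  get(key, options, callback) {
    if (!this.data.has(key)) {
      return process.nextTick(callback, new Error('Not found: ' + key));
    }
    process.nextTick(callback, null, this.data.get(key));
  },
};

// Wrapper in the style of the snippet above: the callback IS resolve, so the
// promise resolves with the first callback argument (the error, or null).
const naiveGet = (key, options) =>
  new Promise(resolve => fakeStore.get(key, options, resolve));

// Error-aware wrapper: rejects when err is set, resolves with the value.
const safeGet = promisify(fakeStore.get.bind(fakeStore));

async function demo() {
  const naiveHit = await naiveGet('a', {});    // value is lost
  const naiveMiss = await naiveGet('zzz', {}); // Error is resolved, not rejected
  const safeHit = await safeGet('a', {});
  const safeMiss = await safeGet('zzz', {}).catch(err => err.message);
  return { naiveHit, naiveMiss, safeHit, safeMiss };
}
```

If the same thing happens with the real backend, a caller that treats the resolved result as a stored value would end up interpolating an Error object into its message, which would match the "created with Error: Not found" text. Whether the remaining leveldown incompatibilities stop there is not established by this sketch.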
phil294 commented 3 months ago

Update: it works if you use ClassicLevel instead, even though the types are wrong.

fergiemcdowall commented 3 months ago

Yes, the documentation could do with an update. ClassicLevel should provide persistent storage. When you say the types are wrong, what do you mean?

phil294 commented 2 months ago

It results in a type error by TypeScript / JS tsserver:

let idx = await SI({ db: new ClassicLevel(path, { valueEncoding: 'json' }), name: 'asdf' })

->

Type 'ClassicLevel<string, string>' is not assignable to type 'AbstractLevelDOWNConstructor'.
  Type 'ClassicLevel<string, string>' provides no match for the signature '<K = any, V = any>(location: string): AbstractLevelDOWN<K, V>'.ts(2322)

But it works amazingly well! Still trying it out.

Also, the TS types for PUT_RAW are wrong: they expect one argument instead of the actual 2-3.

None of that is super important though


Edit: While I really like this library, I have now switched to using SQLite FTS5 as a WASM module for Node instead (https://github.com/tndrle/node-sqlite3-wasm), which appears to be faster by a factor of ~10 and equally portable.

fergiemcdowall commented 2 weeks ago

ClassicLevel is now the default backend in version 5.0.0 when the lib is used with Node, so all indexes persist to disk unless something else is specified.

For questions about the search-index TS definitions, these guys are the experts -> DefinitelyTyped/types/search-index