fergiemcdowall / search-index

A persistent, network resilient, full text search library for the browser and Node.js
MIT License
1.39k stars 149 forks source link

Delete operations extremely slow. level-transactions times out #223

Closed cleverplatypus closed 8 years ago

cleverplatypus commented 9 years ago

Hi,

I'm running SI within my ecommerce integration express app.

I created a deletable index to quickly search on 300 product docs, 3k per doc.

The ecommerce platform calls webhooks to inform my app that the product changed. When it's deleted or set invisible, I want to remove it from the index.

The se.del() method takes up to 2 minutes to complete and I only managed to see it finish because I modified this https://github.com/cshum/level-transactions/blob/master/index.js (line 26) multiplying the TTL by 10.... that is, in production the deletion fails and invisible/deleted products appear in search results.

Any clue? Nico

fergiemcdowall commented 9 years ago

Hi Nico

I have a vague feeling that this was fixed- what version of search-index are you running?

cleverplatypus commented 9 years ago

Hi Fergie, quick reply, thanks.

I'm on 0.6.11.

cleverplatypus commented 9 years ago

I installed 0.6.14 locally now. Delete takes a few seconds but if I do a delete+add operation to update the index, the node process hangs for a couple of minutes then I get

Tue, 17 Nov 2015 12:47:06 GMT uncaughtException Object #<Object> has no method 'log10'
TypeError: Object #<Object> has no method 'log10'
    at /Users/aekidna/Development/Projects/Heydo/HeydoEucalyptus/node_modules/search-index/lib/search/searcher.js:415:35
    at Array.forEach (native)
    at /Users/aekidna/Development/Projects/Heydo/HeydoEucalyptus/node_modules/search-index/lib/search/searcher.js:409:16
    at /Users/aekidna/Development/Projects/Heydo/HeydoEucalyptus/node_modules/search-index/node_modules/async/lib/async.js:52:16
    at /Users/aekidna/Development/Projects/Heydo/HeydoEucalyptus/node_modules/search-index/node_modules/async/lib/async.js:363:13
    at /Users/aekidna/Development/Projects/Heydo/HeydoEucalyptus/node_modules/search-index/node_modules/async/lib/async.js:52:16
    at /Users/aekidna/Development/Projects/Heydo/HeydoEucalyptus/node_modules/search-index/node_modules/async/lib/async.js:272:32
    at /Users/aekidna/Development/Projects/Heydo/HeydoEucalyptus/node_modules/search-index/node_modules/async/lib/async.js:44:16
    at /Users/aekidna/Development/Projects/Heydo/HeydoEucalyptus/node_modules/search-index/node_modules/async/lib/async.js:360:17
    at ReadStream.<anonymous> (/Users/aekidna/Development/Projects/Heydo/HeydoEucalyptus/node_modules/search-index/lib/search/searcher.js:382:18)
cleverplatypus commented 9 years ago

The above error seems to be intermittent. What's worrying is that the node process goes 100%+ cpu and the servers stops responding.

Are there open issues related to this problem?

fergiemcdowall commented 9 years ago

Hmmmm.

Thats an interesting error- I can see no obvious reason why it should be failing on 0.6.14. It should be easy to index 300 docs at 3k per doc.

If you want to you can set up a test case with your data and post it here, or send your data to me on fergusmcdowall@gmail.com , and I can try to make it work.

(BTW add automatically deletes a document if it has the same id, so there should be no need to do delete+add, you can just do an add. Not sure if this will make any practical difference to you problem, although it might be worth a try.)

fergiemcdowall commented 9 years ago

Just one thing- if you are running search-index over a network, maybe https://www.npmjs.com/package/norch might work for you? Its basically search-index over http

cleverplatypus commented 9 years ago

I did have a look at Norch back when I started to project but that would mean having one additional hosted service. My node server on openshift does quite a lot of stuff and synchronises data with search-index, bigcommerce and firebase...

cleverplatypus commented 9 years ago

I removed the double call (del+add) in favour of either del or add and it makes no difference in terms of the strain the add operation puts node under.

here's the transcript of the last operation

127.0.0.1 - - [17/Nov/2015:13:18:10 +0000] "POST /bc/webhooks/product_updated HTTP/1.1" 200 2 "-" "Apache-HttpClient/4.2.5 (java 1.5)"
Updated product 161 on koala after webhook call from BigCommerce
Updating search-index
Deleting product 161 from index
Adding/Updating product 161 again to index
http://cdn1.bigcommerce.com/server2300/5et4o2od/products/161/images/1749/LogoBD_WhitePrintingFluo_ITALIA_900px__51642.1446118312.60.90.jpg?c=2
{"name":"search-index","hostname":"McSmoothie.local","pid":37380,"component":"indexer","level":40,"msg":"161: keyword_filter field is null, SKIPPING","time":"2015-11-17T13:18:25.120Z","v":0}
{"name":"search-index","hostname":"McSmoothie.local","pid":37380,"component":"indexer","level":40,"msg":"161: is_visible field not string or array, SKIPPING","time":"2015-11-17T13:18:25.120Z","v":0}
{"name":"search-index","hostname":"McSmoothie.local","pid":37380,"component":"indexer","level":40,"msg":"161: is_featured field not string or array, SKIPPING","time":"2015-11-17T13:18:25.120Z","v":0}
{"name":"search-index","hostname":"McSmoothie.local","pid":37380,"component":"indexer","level":40,"msg":"161: is_free_shipping field not string or array, SKIPPING","time":"2015-11-17T13:18:25.120Z","v":0}
{"name":"search-index","hostname":"McSmoothie.local","pid":37380,"component":"indexer","level":40,"msg":"161: is_price_hidden field not string or array, SKIPPING","time":"2015-11-17T13:18:25.120Z","v":0}
{"name":"search-index","hostname":"McSmoothie.local","pid":37380,"component":"indexer","level":40,"msg":"161: is_condition_shown field not string or array, SKIPPING","time":"2015-11-17T13:18:25.121Z","v":0}
{"name":"search-index","hostname":"McSmoothie.local","pid":37380,"component":"indexer","level":40,"msg":"161: is_preorder_only field not string or array, SKIPPING","time":"2015-11-17T13:18:25.121Z","v":0}
{"name":"search-index","hostname":"McSmoothie.local","pid":37380,"component":"indexer","level":40,"msg":"161: is_open_graph_thumbnail field not string or array, SKIPPING","time":"2015-11-17T13:18:25.121Z","v":0}
{"name":"search-index","hostname":"McSmoothie.local","pid":37380,"component":"indexer","level":40,"msg":"161: primary_image field not string or array, SKIPPING","time":"2015-11-17T13:18:25.121Z","v":0}
{"name":"search-index","hostname":"McSmoothie.local","pid":37380,"component":"indexer","level":40,"msg":"161: option_set field not string or array, SKIPPING","time":"2015-11-17T13:18:25.121Z","v":0}
{"name":"search-index","hostname":"McSmoothie.local","pid":37380,"component":"indexer","level":30,"msg":"indexing 161","time":"2015-11-17T13:18:25.121Z","v":0}

{"name":"search-index","hostname":"McSmoothie.local","pid":37380,"component":"indexer","level":30,"msg":"batch indexed!","time":"2015-11-17T13:23:13.430Z","v":0}
Product 161 index updated

A good 5 minutes to add a document... doesn't sound quite right :)

cleverplatypus commented 9 years ago

Here are the settings, just in case (I'll send you the test case tomorrow, thanks for offering help)

var batchOptions = {
    batchName: 'default',
    fieldOptions: [{
        filter: true,
        fieldName: 'meta_description',
        weight: 4
    }, {
        filter: true,
        fieldName: 'description',
        weight: 3
    }, {
        filter: true,
        fieldName: 'meta_keywords',
        weight: 2
    }, {
        filter: true,
        fieldName: 'name',
        weight: 1
    },
{
        filter: true,
        fieldName: 'id',
        weight: 1000
    }
    ]
};

module.exports = require('search-index')({
    deletable: true,
    fieldedSearch: true,
    indexPath: process.env.OPENSHIFT_DATA_DIR + '/runtime/indexes/products',
    logLevel: 'info',
    stopwords: ['di', 'a', 'da', 'in', 'con', 'su', 'per', 'tra', 'fra', 'il', 'lo', 'la', 'i', 'gli', 'le', 'con'],
    fieldsToStore: ['id', 'name', 'description', 'metadata', 'meta_description', 'meta_keywords', 'thumbnail_url', 'url']
});

And a typical document (attached) which is around 100k but given that the fields to index are very few.. it's really just a few Kbs, unless I'm missing something.

heydokoala-376-export.json.txt

cleverplatypus commented 9 years ago

Hi Fergie, I sent you an email with a complete test case.

cleverplatypus commented 9 years ago

Update. I reverted search-index to 0.6.10. I'm now able to index my docs (it takes on average 7 seconds per doc). However, when I try to replace a document I get this:

Thu, 19 Nov 2015 03:38:44 GMT uncaughtException Cannot call method 'filter' of undefined
TypeError: Cannot call method 'filter' of undefined
    at Transaction.<anonymous> (/var/lib/openshift/552b950ce0b8cdc0a20000d6/app-root/runtime/repo/node_modules/search-index/lib/indexing/deleter.js:51:30)
    at next (/var/lib/openshift/552b950ce0b8cdc0a20000d6/app-root/runtime/repo/node_modules/search-index/node_modules/level-transactions/node_modules/ginga/index.js:123:32)
    at /var/lib/openshift/552b950ce0b8cdc0a20000d6/app-root/runtime/repo/node_modules/search-index/node_modules/level-transactions/index.js:93:25
    at Lock.acquire (/var/lib/openshift/552b950ce0b8cdc0a20000d6/app-root/runtime/repo/node_modules/search-index/node_modules/level-transactions/node_modules/2pl/index.js:33:30)
    at /var/lib/openshift/552b950ce0b8cdc0a20000d6/app-root/runtime/repo/node_modules/search-index/node_modules/level-transactions/index.js:90:18
    at /var/lib/openshift/552b950ce0b8cdc0a20000d6/app-root/runtime/repo/node_modules/search-index/node_modules/level-transactions/index.js:207:5
    at Object._onImmediate (/var/lib/openshift/552b950ce0b8cdc0a20000d6/app-root/runtime/repo/node_modules/search-index/node_modules/level-transactions/node_modules/async-depth-first/index.js:17:5)
    at processImmediate [as _immediateCallback] (timers.js:354:15)
fergiemcdowall commented 8 years ago

There have been a few issues opened about performance in specific situations. Im going to close all of them. What is needed are some open performance benchmarks that can be hammered on by anybody that is interested in the issue.