amyxzhang / lucy.js

A full-text search engine in the browser

Use inverted index on smaller dataset for now #3

Open · amyxzhang opened this issue 9 years ago

amyxzhang commented 9 years ago

It is currently very slow to create the index from the full tweets.json, so I would try it with a subset for now. Also, the index takes a while to show up in the Chrome DevTools console, even after refreshing.
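A minimal sketch of what an inverted index over a subset of the tweets could look like. It assumes, hypothetically, that tweets.json parses to an array of objects with `id` and `text` fields; the real schema and the code in invindex.js may differ. The `limit` parameter (also hypothetical) caps how many tweets get indexed, so the build can be checked on a small slice first.

```javascript
// Hypothetical tweet shape: { id, text }. The real tweets.json may differ.

// Split text into lowercase tokens made of letters, digits, '#', '@' and '_'.
function tokenize(text) {
  return text.toLowerCase().match(/[\p{L}\p{N}#@_]+/gu) || [];
}

// Build a Map of term -> Set of tweet ids from the first `limit` tweets only.
function buildInvertedIndex(tweets, limit = tweets.length) {
  const index = new Map();
  for (const tweet of tweets.slice(0, limit)) {
    for (const term of tokenize(tweet.text)) {
      if (!index.has(term)) index.set(term, new Set());
      index.get(term).add(tweet.id);
    }
  }
  return index;
}

// Return ids of tweets that contain every term in the query (AND semantics).
function search(index, query) {
  const terms = tokenize(query);
  if (terms.length === 0) return [];
  let result = new Set(index.get(terms[0]) || []);
  for (const term of terms.slice(1)) {
    const postings = index.get(term) || new Set();
    result = new Set([...result].filter((id) => postings.has(id)));
  }
  return [...result];
}

// Usage: index only the first 2 of 3 sample tweets.
const sampleTweets = [
  { id: 1, text: "Full-text search in the browser" },
  { id: 2, text: "Search engines need an index" },
  { id: 3, text: "Browser DevTools console" },
];
const sampleIndex = buildInvertedIndex(sampleTweets, 2);
console.log(search(sampleIndex, "search"));  // [ 1, 2 ]
console.log(search(sampleIndex, "browser")); // [ 1 ]  (tweet 3 was not indexed)
```

Because each term maps to a `Set` of ids, a multi-word query is just the intersection of the posting sets, and raising `limit` later needs no other change.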

amyxzhang commented 9 years ago

Ah, for some reason, refreshing while the console is open doesn't actually refresh the view, but when I closed the console and reopened it, the index showed up immediately.

LeaVerou commented 9 years ago

Keep in mind that console.log()-ing every single item is probably making this much slower than it should be.
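One way to keep some visibility while debugging, without paying for a `console.log()` call per item, is to log progress only every N items. This is a sketch: `processWithProgress` and its `processItem` callback are hypothetical stand-ins for whatever per-tweet indexing step is being run, and `logEvery` defaults to an arbitrary 1000.

```javascript
// Run `processItem` on every item, but log progress only every `logEvery`
// items plus one summary line, instead of one console.log() per item.
// `processItem` is a hypothetical stand-in for the real per-tweet work.
function processWithProgress(
  items,
  processItem,
  { logEvery = 1000, log = (msg) => console.log(msg) } = {}
) {
  const start = Date.now();
  let count = 0;
  for (const item of items) {
    processItem(item);
    count += 1;
    if (count % logEvery === 0) {
      log(`processed ${count}/${items.length} items`);
    }
  }
  log(`done: ${count} items in ${Date.now() - start} ms`);
  return count;
}

// Usage: 5 items with progress every 2, so 3 log lines in total.
const seen = [];
processWithProgress([10, 20, 30, 40, 50], (x) => seen.push(x), { logEvery: 2 });
```

Passing a custom `log` function also makes it easy to silence logging entirely when timing the full dataset.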

amyxzhang commented 9 years ago

Oh yeah, that was just for debugging. When I'm finished, I'll try again with the big dataset.

LeaVerou commented 9 years ago

Oh, you’re making changes to invindex.js right now?

amyxzhang commented 9 years ago

Oh no, I was done for the night :) I'll work on it tomorrow.

(Reply sent by email to Lea Verou's comment of Wed, Nov 26, 2014 at 2:13 AM.)

Amy X. Zhang | http://amyxz.com | @amyxzh

LeaVerou commented 9 years ago

@manalinaik Is it normal that generating the prefix tree took 14 minutes here for a third of the dataset in the repo? (I forget how long it used to take, but I don't recall it taking that long before.)
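For reference, a minimal in-memory prefix tree (trie) sketch, which could serve as a baseline when timing construction. The thread doesn't show how lucy.js actually stores its prefix tree, so the `PrefixTree` class and its `insert` / `idsWithPrefix` methods here are hypothetical.

```javascript
// One trie node: child nodes keyed by character, plus the ids of tweets
// containing a word that ends exactly at this node.
class TrieNode {
  constructor() {
    this.children = new Map();
    this.ids = new Set();
  }
}

class PrefixTree {
  constructor() {
    this.root = new TrieNode();
  }

  // Insert `word`, creating missing nodes, and record tweet `id` at its end.
  insert(word, id) {
    let node = this.root;
    for (const ch of word) {
      if (!node.children.has(ch)) node.children.set(ch, new TrieNode());
      node = node.children.get(ch);
    }
    node.ids.add(id);
  }

  // Collect ids of all tweets with at least one word starting with `prefix`
  // (depth-first walk of the subtree below the prefix node).
  idsWithPrefix(prefix) {
    let node = this.root;
    for (const ch of prefix) {
      node = node.children.get(ch);
      if (!node) return [];
    }
    const result = new Set();
    const stack = [node];
    while (stack.length > 0) {
      const current = stack.pop();
      for (const id of current.ids) result.add(id);
      for (const child of current.children.values()) stack.push(child);
    }
    return [...result];
  }
}

// Usage
const tree = new PrefixTree();
tree.insert("search", 1);
tree.insert("sea", 2);
tree.insert("seal", 3);
tree.insert("browser", 1);
console.log(tree.idsWithPrefix("sea").sort((a, b) => a - b)); // [ 1, 2, 3 ]
console.log(tree.idsWithPrefix("x")); // []
```

Inserting a word touches one node per character, so building this in memory scales with the total length of the indexed words; any storage layer underneath adds its own per-node cost on top.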

manalinaik commented 9 years ago

Yeah, it's gotten significantly slower. Inserting a large number of tweets asynchronously caused many insertion errors, because different asynchronous operations would try to insert the same node into the prefix tree at the same time. Worse, after an insertion error on a given key, subsequent get calls on that key would also fail. I had to change the code to insert tweets one at a time, which is really slow, but I couldn't find another way to avoid the insertion errors.
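One possible alternative to inserting every tweet one at a time, sketched under assumptions: serialize only the operations that touch the same key (a per-key lock built from promise chains) and let operations on different keys overlap. `FakeAsyncStore` is a hypothetical stand-in for the real storage; its `add()` rejects on an existing key, similar in spirit to IndexedDB's `add()`. `KeyedMutex` and `addPosting` are illustrative names, not part of lucy.js. This only coordinates calls made from a single JavaScript context (e.g. one page); it would not help if the inserts really ran in separate workers.

```javascript
// Hypothetical async store standing in for the real backend. add() rejects
// if the key already exists; every call yields once to mimic async I/O.
class FakeAsyncStore {
  constructor() {
    this.data = new Map();
  }
  async get(key) {
    await Promise.resolve();
    return this.data.get(key);
  }
  async add(key, value) {
    await Promise.resolve();
    if (this.data.has(key)) throw new Error(`key already exists: ${key}`);
    this.data.set(key, value);
  }
  async put(key, value) {
    await Promise.resolve();
    this.data.set(key, value);
  }
}

// Per-key lock: tasks for the same key run one after another, while tasks
// for different keys are not blocked by each other.
class KeyedMutex {
  constructor() {
    this.tails = new Map(); // key -> promise settling after the last queued task
  }

  run(key, task) {
    const previous = this.tails.get(key) || Promise.resolve();
    const result = previous.then(() => task());
    // The stored tail never rejects, so one failed task doesn't block the key.
    const tail = result.catch(() => {});
    this.tails.set(key, tail);
    // Drop the entry once nothing newer has been queued for this key.
    tail.then(() => {
      if (this.tails.get(key) === tail) this.tails.delete(key);
    });
    return result;
  }
}

// Append `id` to the posting list stored under `key`, creating the list if
// needed. The get-then-add/put sequence runs inside that key's lock.
function addPosting(store, mutex, key, id) {
  return mutex.run(key, async () => {
    const existing = await store.get(key);
    if (existing === undefined) {
      await store.add(key, [id]);
    } else {
      await store.put(key, [...existing, id]);
    }
  });
}

// Usage: start all insertions at once; overlapping keys no longer collide.
const store = new FakeAsyncStore();
const mutex = new KeyedMutex();
const inserts = [["sea", 1], ["sea", 2], ["sear", 1], ["sea", 3]];
Promise.all(inserts.map(([key, id]) => addPosting(store, mutex, key, id)))
  .then(() => console.log(store.data.get("sea"))); // [ 1, 2, 3 ]
```

Because the read-modify-write for a key runs inside that key's lock, two tweets sharing a prefix can no longer both see the key as missing and both call `add()`, and a failed task does not leave later operations on that key stuck.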