speedup search - Githubissues

azimut / newspod

fully searchable db of podcasts that I am interested on

https://azimut.github.io/newspod/

MIT License

0 stars 0 forks source link

speedup search #28

Open azimut opened 5 months ago

azimut commented 5 months ago

[x] ~try thread pool provided by wasm library~ If you use only a single connection, there is no difference between the two versions.
[ ] see if I can return a list of promises, instead of a promise of a list, so I can maybe show search results as I receive them
[ ] add pagination

azimut commented 5 months ago

Searching "Haskell" today ends up in 2.5Mb downloaded. Which includes in a LOT of non requested fields (content? description?). I guess it does scanning.

azimut commented 5 months ago

sqlite> SELECT entries.feedid,
       search.entriesid,
       search.title,
       entries.url,
       entries.datemillis
FROM search
JOIN entries ON search.entriesid=entries.id
WHERE search.content MATCH "Haskell"
ORDER BY entries.datemillis ASC;
QUERY PLAN
|--SCAN search VIRTUAL TABLE INDEX 0:M3
|--SEARCH entries USING INTEGER PRIMARY KEY (rowid=?)
`--USE TEMP B-TREE FOR ORDER BY
...

azimut commented 5 months ago

Dropping the description column from entries reduces a bit the size.

Before

Ocaml 514.05 kB / 594.38 kB transferred Haskell 2.17 MB / 2.33 MB transferred

After

Ocaml 505.86 kB / 584.33 kB transferred Haskell 2.07 MB / 2.22 MB transferred

azimut commented 5 months ago

Split entries.content into a different table should help.

Almost for sure it would help on the load time of "list episodes". Because right now is also fetching the whole records. But, I don't know how it will behave with the multiple index lookup would work.1

Another alternative would be add an index to entries.title. This would create a covering index2 which will limit the search to only the index. Which is what happens when I list the feeds and count the number of entries per feed.

sqlite> SELECT feeds.id, feeds.title, count(*)
            FROM feeds JOIN entries ON feeds.id=entries.feedid
        GROUP BY entries.feedid
          HAVING count(*) > 0;
QUERY PLAN
|--SCAN entries USING COVERING INDEX entriesindex
`--SEARCH feeds USING INTEGER PRIMARY KEY (rowid=?)

azimut commented 5 months ago

Tried adding 3+ ~indexes~ columns to the index of entries. Makes it download 1MB just to show the home page (aka getFeeds). 🙃 No idea why. I see .mp3 urls....oh, might be he gets the whole index. Meaning, it gets the 3 fields.

azimut commented 5 months ago

well, I think this is done for now. It still reads the content....but size seems to have decreased

along with a for sure speed up of display of entries of a feed

azimut commented 5 months ago

leaving open for 2nd task of list of promises