Open jonathan-s opened 5 years ago
Thanks @jonathan-s for collecting those search improvements and bundling them here.
Looking at the source code for theoretically visible text, it picked up "jump to search".
We could indeed work on better ranking. We already collect some interaction metadata like visit frequency, time stayed and scroll %, and could already improve some things based on that.
Further, we are working on getting storex, the underlying storage layer, into the local system so we can make use of more sophisticated search tech, maybe not even based on JS.
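As a rough illustration of how the interaction metadata mentioned above could feed into ranking, here is a minimal sketch. The `PageStats` fields, the weights and the assumption that a text relevance score in [0, 1] already exists are all hypothetical and not taken from the Memex code.

```python
import math
from dataclasses import dataclass


@dataclass
class PageStats:
    url: str
    text_score: float       # relevance of the text match, assumed to be in [0, 1]
    visits: int             # visit frequency
    seconds_on_page: float  # time stayed
    scroll_fraction: float  # scroll %, expressed as a value in [0, 1]


def interaction_boost(page: PageStats) -> float:
    # log1p dampens large counts so heavy usage cannot swamp the text match.
    # The 0.5 / 0.3 / 0.2 weights are made up for illustration.
    return (
        0.5 * math.log1p(page.visits)
        + 0.3 * math.log1p(page.seconds_on_page / 60)
        + 0.2 * page.scroll_fraction
    )


def rank(pages: list[PageStats]) -> list[PageStats]:
    # Text relevance stays the base signal; interaction only scales it.
    return sorted(
        pages,
        key=lambda p: p.text_score * (1 + interaction_boost(p)),
        reverse=True,
    )


pages = [
    PageStats("https://example.com/skimmed", 0.8, 1, 5, 0.1),
    PageStats("https://example.com/read", 0.7, 12, 600, 0.9),
]
ranked = rank(pages)
```

With these made-up numbers the page that was visited often and read to the end ranks above the one that was only skimmed, even though its text score is slightly lower.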
Do you have any ideas for quick improvements that we could add that would drastically improve the result for your search queries?
I haven't taken a deep dive into the code, but I wouldn't be too surprised if you could drastically improve accuracy by implementing some weighting in the search. A keyword that is mentioned more often in a document could, for instance, signify that it is more relevant to that document.
Also, it does not seem to exclude documents that don't contain a keyword when you search for "term1 term2". I.e. document 1 contains both terms and document 2 contains only one of them, so document 2 should be excluded.
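A minimal sketch of the kind of weighting meant here, assuming nothing about how Memex actually scores results: count how often each query term occurs in a document and rank by that count. The helper names and the sample documents are made up.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and split on anything that is not a letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())


def tf_score(query: str, text: str) -> int:
    # Sum of raw occurrence counts of every query term in the document.
    counts = Counter(tokenize(text))
    return sum(counts[term] for term in tokenize(query))


docs = {
    "a": "Weighting schemes: weighting by frequency, weighting by position.",
    "b": "A page that mentions weighting once.",
}
ranked = sorted(docs, key=lambda name: tf_score("weighting", docs[name]), reverse=True)
```

Here document "a" mentions the term three times and document "b" once, so "a" is ranked first.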
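The expected AND behaviour can be sketched as a simple set check: a document is a hit only if every query term occurs in it. This is an illustration with hypothetical helper names, not the existing search code.

```python
import re


def terms(text: str) -> set[str]:
    # Unique lowercase terms in the text.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def matches_all(query: str, text: str) -> bool:
    # AND semantics: the query terms must be a subset of the document terms.
    return terms(query) <= terms(text)


docs = {
    "doc1": "contains term1 and term2",
    "doc2": "contains only term1",
}
hits = [name for name, text in docs.items() if matches_all("term1 term2", text)]
```

Only "doc1" ends up in `hits`; "doc2" is excluded because it lacks "term2".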
It should, though. In cases like the one in the OP, it's because the keyword is in the HTML but hidden in the initial view. Such terms therefore get indexed too and cause some noise. Or do you have other instances where that is not the case?
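One way to reduce that noise would be to skip hidden elements when extracting text for indexing. The sketch below uses Python's standard `html.parser` and only catches the `hidden` attribute, inline `display:none` and non-content tags. It is an assumption-laden illustration: text hidden through stylesheet classes (which may well be the case for Wikipedia's "jump to search" link) is not detected, and pages with unclosed tags would confuse the simple stack.

```python
from html.parser import HTMLParser

VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}
SKIP_TAGS = {"script", "style", "noscript", "template"}


class VisibleText(HTMLParser):
    def __init__(self):
        super().__init__()
        self.hidden_stack = []  # one bool per open element: is it (or an ancestor) hidden?
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return  # void elements never get an end tag, so do not track them
        attr = dict(attrs)
        style = (attr.get("style") or "").replace(" ", "").lower()
        hidden = tag in SKIP_TAGS or "hidden" in attr or "display:none" in style
        inherited = bool(self.hidden_stack) and self.hidden_stack[-1]
        self.hidden_stack.append(hidden or inherited)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.hidden_stack:
            self.hidden_stack.pop()

    def handle_data(self, data):
        # Keep text only when no enclosing element is hidden.
        if not (self.hidden_stack and self.hidden_stack[-1]):
            self.chunks.append(data)


def visible_text(html: str) -> str:
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


sample = ('<p>Visible text</p>'
          '<div style="display: none">jump to search</div>'
          '<span hidden>cow</span>')
text = visible_text(sample)
```

For the sample markup, only the paragraph text survives; the two hidden elements contribute nothing to what would be indexed.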
Another example of weird indexing that then seems to get reflected in search: https://twitter.com/41Strange/status/1070040073482067969 For some reason "cow" is being indexed for that link. I can't even find it in the source.
On the same theme -> https://news.ycombinator.com/item?id=18128477 Lots of content. The word "cow" occurs exactly once in this text. Perhaps it should not be indexed? Either way, not all words on that page carry equal weight. "cow" is certainly not at the top of the list of words I would associate with that page.
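To make the "not all words carry equal weight" point concrete, one option is to normalise a term's count by the length of the page, so that a single mention in a long thread scores far lower than the same word on a short page about that topic. This is a hypothetical sketch; the function name and the sample texts are made up.

```python
import re
from collections import Counter


def term_share(term: str, text: str) -> float:
    # Fraction of all tokens on the page that are the given term.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    if not tokens:
        return 0.0
    return Counter(tokens)[term.lower()] / len(tokens)


long_page = "cow " + "discussion " * 999  # one mention among 1000 words
short_page = "the cow is a cow"           # two mentions among 5 words

long_share = term_share("cow", long_page)
short_share = term_share("cow", short_page)
```

A ranking or indexing step could then ignore or down-weight terms whose share falls below some cutoff, though where to put that cutoff would need testing against real pages.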
Just wondering, have there been any updates to the ranking/searching/indexing accuracy?
No, actually we recently decided to pivot on this feature a bit to postpone the need for structural improvements. Search is quite a heavy lift for the application and makes so many things more difficult: backup and sync size/performance/reliability, search performance, running costs.
We realised that search is not the main value proposition with which we can really move the needle for the company to become profitable/sustainable. That is sharing/collaborating and integrating into existing workflows with tools like Roam, Notion, Evernote etc. Search will serve that purpose and will still exist, but in its current form it has consumed too much of our resources in the past 2 years to be deemed viable as the main value proposition.
Our decision therefore was, for the time being until we have the resources, to limit search only to pages that have been actively bookmarked, tagged, listed or annotated.
List of issues related to search.
Would say improving the accuracy of search is really important. As an example I've got this wikipedia article in the db.
https://en.wikipedia.org/wiki/B_Corporation_(certification)
In Memex I search for "weighting search". That article shows up. There are three mentions of "weighting" and no mention of "search" (does it pick up "search wikipedia"?). Nonetheless, "weighting search" implies AND, not OR, so the article shouldn't show up in the first place.