alerque / stack-verse-mapper

Index Bible verse references in Stack Exchange data dumps.
https://alerque.github.io/stack-verse-mapper
GNU Lesser General Public License v3.0

Search web service #18

Closed: curiousdannii closed this issue 8 years ago

curiousdannii commented 8 years ago

If I understood correctly, the plan for the standalone website was for it to run searches against a copy of the index that the user has downloaded?

Well, the size of all six indexes gzipped is 1.4 MB, which is rather large. Not so large that it couldn't work, but we'd definitely want to lazy-load it so that random visitors to the page don't have to download it.

But I also expect it would be very easy to design a simple Node server to run searches over the web.
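As a rough sketch of how small such a server could be, using only Node's built-in http module; the index filename, its shape, and the query parameter are all hypothetical here:

```js
// Minimal sketch of a search endpoint over a pre-built index.
// Assumes a (hypothetical) index shaped like { "John.3.16": [postId, ...] }.
var http = require('http');
var url = require('url');
var index = require('./index.json'); // hypothetical pre-built index file

http.createServer(function (req, res) {
  var query = url.parse(req.url, true).query; // e.g. /search?ref=John.3.16
  var hits = index[query.ref] || [];
  res.writeHead(200, { 'Content-Type': 'application/json' });
  res.end(JSON.stringify(hits));
}).listen(8080);
```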

alerque commented 8 years ago

Yes, I would expect usage on the standalone site to require a copy of the JSON downloaded to the browser. I figured whatever search code we come up with on the server side would be packaged up (using browserify?) and sent over as a bundle. Each index could be lazy-loaded the first time the user runs a search, or based on which sites are selected for the search set.
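For what it's worth, a sketch of that packaging step using browserify's Node API; the entry and output filenames are placeholders:

```js
// Sketch: roll the shared search code into a single browser bundle.
var browserify = require('browserify');
var fs = require('fs');

browserify()
  .add('./lib/search.js') // hypothetical shared search module
  .bundle()
  .pipe(fs.createWriteStream('./site/bundle.js'));
```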

Basically the static site would function the same way we expect the userscript to, except you wouldn't need to install anything to use it, just hit up the URL. The userscript would be for the more avid and regular users, so they wouldn't have to leave the site they were on.

I don't think 1.4 MB is too much to expect users of the site to download up front. Even with all the JS bundled up, I think the site footprint over the wire will be in the ~2 MB range, which is heavy if you're trying to sell something but not too bad for a full app.

alerque commented 8 years ago

P.S. Of course it should load the UI up front and the data in the background, so with any luck, by the time they type in a query or figure out what to click, we'll be ready for them. And we'll want to honor the conditional-request (Last-Modified) headers, because it looks like GitHub serves them up correctly (and gzips over the wire too, so we don't have to).
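Something like this, say, where the page renders first and the data fetch kicks off once it has loaded; the URL and the stash variable are placeholders, and the browser's HTTP cache handles the conditional requests on its own:

```js
// Sketch: render the UI immediately, fetch index data in the background.
window.addEventListener('load', function () {
  fetch('data/indexes.json') // hypothetical URL on the static site
    .then(function (res) { return res.json(); })
    .then(function (indexes) {
      window.verseIndexes = indexes; // stash for when the user searches
    });
});
```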

curiousdannii commented 8 years ago

Do any of browserify/webpack/rollup support lazy loading? I have only passing knowledge of them.

I was going to suggest that we should add a cache manifest, but apparently that is now deprecated!

We should still filter out the index properties that the search doesn't rely on; that would reduce the size noticeably.

alerque commented 8 years ago

Yes, filtering the index down to just the keys that get used in the actual search will compress it a lot. The numbers we're looking at are for the whole pretty-printed version, bloated with unused fields. I think it's important that we supply ready-made builds of that data too, for other people to play with in their own projects, but when we actually roll up a version for our own search to consume we can pack it down a lot.
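A sketch of what that pack-down build step could look like; the field names here are hypothetical, since the actual index schema isn't spelled out in this thread:

```js
// Sketch: strip fields the search never reads and drop the pretty printing.
var fs = require('fs');

var full = JSON.parse(fs.readFileSync('index-full.json', 'utf8'));
var packed = full.map(function (entry) {
  return {
    v: entry.verse, // hypothetical: OSIS reference, e.g. "John.3.16"
    p: entry.post   // hypothetical: post id to link back to
  };
});
// JSON.stringify with no indent argument emits compact output.
fs.writeFileSync('index-packed.json', JSON.stringify(packed));
```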

By "lazy loading" do you mean asynchronous loading of JS module resources or just whether we'll be able to load the json data in the background after the page renders? The latter is yes, but it doesn't depend on any of the above. The former is yes, I think all of them do but I'm not sure it's important. Keeping in mind that the dev path here is Static Site → Userscript → Browser Extensions, the most important thing is that we can bundle the whole script into a single resource that is pre-loaded into browsers. Async loading isn't very important for the eventual target here and, as far as JS resources go on the static site, I doubt a modular approach will to loading the page rather th an a roll-up of JS resources would make for any appreciable difference in performance. The heaviest part of the whole thing is likely to be the BCV parser, but we need that pretty much out of the gate to parse search input.

curiousdannii commented 8 years ago

It would make sense to only load the data files once the user has interacted with something on the site (started typing, selected a site to search, etc.), because there's no need to waste the bandwidth of the people who will inevitably click on a link by accident, especially if they're on a mobile connection. Loading the data is easy with vanilla JS, but I was concerned that doing it by hand might fall outside the bundlers' usual workflow if they provide a more integrated system for it. Now that I think about it, though, that's a silly concern, because most apps would need to load data.
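A vanilla-JS sketch of that interaction-triggered load; the element id and URL are placeholders:

```js
// Sketch: defer fetching the index until the visitor shows intent,
// so an accidental click on a link to the site costs nothing extra.
var indexPromise = null;

function loadIndexOnce() {
  if (!indexPromise) {
    indexPromise = fetch('data/indexes.json') // hypothetical URL
      .then(function (res) { return res.json(); });
  }
  return indexPromise;
}

document.getElementById('search-box') // hypothetical input element
  .addEventListener('focus', loadIndexOnce, { once: true });
```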

The rest of the client code should be bundled into one script (except jQuery, which we can load from a CDN). The BCV parser is only 25 KB gzipped, which is tiny.