KalobTaulien / wagtail-algolia-example-code

Sample files for your Wagtail setup using Algolia search.

Problem with large pages #1

Open yhoiseth opened 4 years ago

yhoiseth commented 4 years ago

Hi,

Thanks a lot for sharing. I encountered an issue that I suspect others will encounter, too.

Algolia has (at least to me) surprisingly small limits on record sizes:

  • 10 KB for Pro, Starter, or Free accounts
  • 20 KB for legacy (Essential and Plus)

It appears that for Enterprise accounts, you can have larger limits.

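In practice, this means that if a page's body, title, search_description, etc. add up to more than roughly 10 000 characters (which is quite common), you'll encounter an error when indexing. The limit applies to the size of the record rather than its character count, so text with many non-ASCII characters can hit it even sooner.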
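To check whether a given page is likely to hit the limit, you can measure it before indexing. This is a minimal sketch, not anything from Algolia's tooling: it assumes the limit is counted on the UTF-8 JSON form of the record and that 10 KB means 10 000 bytes, so verify both against your plan. The names `record_size_bytes`, `fits_in_limit` and `RECORD_LIMIT_BYTES` are made up for illustration.

```python
import json

# Assumption: "10 KB" is taken as 10 000 bytes; check the exact value for your plan.
RECORD_LIMIT_BYTES = 10_000


def record_size_bytes(record: dict) -> int:
    """Return the size of the record when serialized as UTF-8 JSON."""
    return len(json.dumps(record, ensure_ascii=False).encode("utf-8"))


def fits_in_limit(record: dict, limit: int = RECORD_LIMIT_BYTES) -> bool:
    """Return True if the serialized record is no larger than the limit."""
    return record_size_bytes(record) <= limit


if __name__ == "__main__":
    page = {
        "title": "Example page",
        "search_description": "A short description",
        "body": "x" * 9_000,
    }
    print(record_size_bytes(page), fits_in_limit(page))
```

Running a check like this over existing pages gives a quick count of how many would be rejected.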

Algolia's official solution to this problem is splitting records, e.g. by paragraph.

I inquired with Algolia's support team how to do this with the Django integration. They answered that they don't think it's possible.

My assessment is that, in order to index large pages in Wagtail/Django, we would need to split records and build the index using Algolia's generic Python client. That, however, seems like a hack and more trouble than it's worth to us. (I'd be very interested in knowing if anyone has done this or has found a different solution.)
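For anyone who wants to try it, the splitting step itself could look roughly like the sketch below. It only builds the records; sending them to Algolia with the generic Python client is left out. Everything here is an assumption for illustration: the function name `split_page_into_records`, the `page_id` and `content` attribute names, splitting on blank lines, and the 10 000-byte limit. `objectID` is Algolia's record identifier attribute.

```python
import json


def split_page_into_records(page_id, title, url, body, limit=10_000):
    """Split a page body on blank lines into one record per paragraph.

    Raises ValueError if a single paragraph is still too large on its own.
    """
    paragraphs = [p.strip() for p in body.split("\n\n") if p.strip()]
    records = []
    for index, paragraph in enumerate(paragraphs):
        record = {
            "objectID": f"{page_id}-{index}",  # unique per paragraph
            "page_id": page_id,  # shared by all records of one page
            "title": title,
            "url": url,
            "content": paragraph,
        }
        # Assumption: size is measured on the UTF-8 JSON form of the record.
        size = len(json.dumps(record, ensure_ascii=False).encode("utf-8"))
        if size > limit:
            raise ValueError(f"Paragraph {index} of page {page_id} is {size} bytes")
        records.append(record)
    return records


if __name__ == "__main__":
    body = "First paragraph.\n\nSecond paragraph.\n\nThird paragraph."
    for record in split_page_into_records(42, "Example", "/example/", body):
        print(record["objectID"], record["content"])
```

Because one page now produces several records, the same page can show up more than once in the results. As far as I understand, Algolia's `distinct` and `attributeForDistinct` index settings are meant for that, pointed at a shared attribute such as `page_id` here.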

Also, regarding the blog post Using Algolia Search with Wagtail, I think it would be useful to add a warning about Algolia's record size limits. With such a warning, I probably wouldn't have spent any time trying to implement Algolia. It could, for example, say something like:

Warning: Be aware that Algolia at the time of writing has a 10 KB limit on record sizes for all new accounts except Enterprise. This means that indexing any page with more than 10 000 characters of text will fail. There is an official solution to this issue, but it doesn't work with the Django integration.

TomKlotzPro commented 4 years ago

Hello @yhoiseth! I'm Tom and I work at Algolia. I used to work on the Laravel integration, where we have a Splitter that helps us split large records. I don't know if it's possible to do the same with Django, but I'll dig into it and try to implement it.

This is the documentation for our splitter: https://www.algolia.com/doc/framework-integration/laravel/advanced-use-cases/split-large-records/?language=php

And this is the repo of our Laravel integration: https://github.com/algolia/scout-extended

Cheers

yhoiseth commented 4 years ago

That's great, I really appreciate it ❤

Let me know if you need to talk things through or something. (Just be aware that I wouldn't consider myself a Django expert 😉)

yhoiseth commented 4 years ago

For now, we have worked around this issue by making a similar solution using Elasticsearch and Bootstrap Autocomplete. I'm sharing how to achieve it here in case anyone else runs into a similar problem.

Caveats and prerequisites

Demo

See the search box in the navbar on https://www.entrepedia.com/.

How

Set up Elasticsearch

See Backends — Wagtail Documentation. (The other backends don't work as well because they don't return results until words are almost written out. Elasticsearch can return results when the query is as little as one character.)

Set up search endpoint

When you start a Wagtail project, it sets up a default search endpoint.

To make it work with Bootstrap Autocomplete, you need to rename the query parameter from query to q. Do this in the view, and also in the template if you plan on having a graceful fallback in case the JavaScript breaks.

Next, you need to return the search results as JSON if the autocomplete is doing the searching. There are many ways to do this. A slightly ugly but functional way is to add the following guard above the existing return statement:

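    from django.http import JsonResponse
    # …
    # request.is_ajax() was removed in Django 4.0, so check the header
    # that jQuery sets on its XHR requests instead.
    if request.headers.get("x-requested-with") == "XMLHttpRequest":
        # One {"url", "text"} entry per hit, as the frontend below expects
        data = [{"url": hit.url, "text": hit.title} for hit in search_results]
        # safe=False is required because the payload is a list, not a dict
        return JsonResponse(data, safe=False)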

Set up frontend

For the frontend, we need a search field and some JavaScript. These are the relevant parts:

    <input
      aria-label="Search"
      autocomplete="off"
      class="form-control"
      id="search-input"
      name="q"
      type="search"
    >
    <script
      src="https://cdn.jsdelivr.net/gh/xcash/bootstrap-autocomplete@v2.2.2/dist/latest/bootstrap-autocomplete.min.js"
    ></script>
    <script>
      $(document).ready(function () {
        var $searchInput = $("#search-input");
        $searchInput.autoComplete({
          minLength: 1,
          resolverSettings: {
            url: "{% url "search" %}"
          }
        });
        $searchInput.on("autocomplete.select", function (event, item) {
          window.location.pathname = item.url;
        });
        $searchInput.on("keydown", function (event) {
          if (event.keyCode === 13) {
            return false; // Do not submit form on ENTER
          }
        });
      });
    </script>

CC https://github.com/algolia/algoliasearch-django/issues/285

yhoiseth commented 4 years ago

Hi @KalobTaulien,

Just a heads-up in case you didn't notice this 🙂

TLDR: You might want to add some info about the size limit to your blog post as a courtesy to readers.