elastic / go-elasticsearch

The official Go client for Elasticsearch
https://github.com/elastic/go-elasticsearch#go-elasticsearch
Apache License 2.0

Bad scrolling performance #881

Open hotrush opened 2 weeks ago

hotrush commented 2 weeks ago

Hello. We recently upgraded our cluster from v7 to v8 and, for one of our services, had to migrate from the olivere/elastic package to this official client.

Since then we have seen a serious performance degradation: the service takes 6-8 times longer to respond than with the client we used before. Reverting to olivere/elastic removes the degradation, so the client change is confirmed as the cause.

Our code is pretty simple: we scroll through search results in a goroutine and push the data to a channel. See the code below.

This is how the client is created:

c, err := elasticsearch.NewTypedClient(elasticsearch.Config{
    Addresses: []string{makeEsURL()},
    Username:  cfg.EsUser,
    Password:  cfg.EsPass,
})

This is how we build the initial query:

esResults := make(chan esResult, 1)
...
scroll := es.Search().
    Index(r.GetIndex()).
    Raw(strings.NewReader(r.GetQuery())).
    Size(cfg.EsScrollSize).
    Scroll("15s")

err := scrollEsQuery(ctx, scroll, esResults)

This is how it is processed:

func scrollEsQuery(ctx context.Context, scroll *search.Search, esResults chan<- esResult) error {
    var scrollID string
    res, err := scroll.Do(ctx)
    if err != nil {
        return err
    }
    if len(res.Hits.Hits) == 0 {
        return nil
    }

    esResults <- newEsResult(res.Hits)

    if res.ScrollId_ != nil {
        scrollID = *res.ScrollId_
    }

    defer func() {
        if scrollID != "" {
            es.ClearScroll().ScrollId(scrollID).Do(gCtx)
        }
    }()

    for {
        res, err := es.Scroll().ScrollId(scrollID).Scroll("15s").Do(ctx)
        if err != nil {
            return err // something went wrong
        }
        if res.ScrollId_ != nil {
            scrollID = *res.ScrollId_
        }
        if len(res.Hits.Hits) == 0 {
            return nil
        }

        select {
        case <-ctx.Done():
            return ctx.Err()
        case esResults <- newEsResult(res.Hits):
        }
    }
}

The code is pretty simple and I can't understand what could cause such a big performance difference. Am I doing anything wrong?

Anaethelion commented 3 days ago

Hi @hotrush

I've tested your snippet and one thing stands out: you are passing a raw request and setting the size at the same time. Size tries to set its value in the request body, but fails because Raw takes precedence. As a result, the size is not propagated to the body of the request, and you are effectively returning the full match of your request on the first call.

Can you check this assumption? If it holds, I'll work on a way of preventing that.
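
One way to check this assumption is to put the size into the raw body itself, so it does not depend on the Size call being applied. The sketch below is only an illustration: withSize is a hypothetical helper, not part of go-elasticsearch, and it assumes the raw query is a JSON object. It uses only the standard library.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// withSize is a hypothetical helper (not part of go-elasticsearch).
// It parses a raw JSON query body, sets the top-level "size" field
// (overwriting any existing value), and returns the re-encoded body.
func withSize(rawQuery string, size int) (string, error) {
	// Keep every other field as raw JSON so the query itself is untouched.
	var body map[string]json.RawMessage
	if err := json.Unmarshal([]byte(rawQuery), &body); err != nil {
		return "", fmt.Errorf("parse raw query: %w", err)
	}
	if body == nil {
		// The input was the JSON literal null, not an object.
		return "", fmt.Errorf("raw query must be a JSON object")
	}

	sizeJSON, err := json.Marshal(size)
	if err != nil {
		return "", fmt.Errorf("encode size: %w", err)
	}
	body["size"] = sizeJSON

	out, err := json.Marshal(body)
	if err != nil {
		return "", fmt.Errorf("encode body: %w", err)
	}
	return string(out), nil
}
```

With the snippet from the report, this would mean calling withSize(r.GetQuery(), cfg.EsScrollSize), passing the result to Raw(strings.NewReader(...)), and dropping the separate Size(cfg.EsScrollSize) call. If the first page then returns cfg.EsScrollSize hits and the timings improve, the assumption holds.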