elastic / elasticsearch-perl

Official Perl low-level client for Elasticsearch.
https://metacpan.org/pod/Search::Elasticsearch
Apache License 2.0
4 stars 56 forks source link

Scroll contexts not deleted for searches with no resuts #172

Open Daniel314 opened 5 years ago

Daniel314 commented 5 years ago

The perl client doesn't delete scroll contexts when a scroll search is empty (no results returned). However, if data is returned, then a call to finish() does delete the scroll context.

The lack of a DELETE issued for the scroll context at the end of the code below was verified using ngrep, and by checking the status of ES with a variant of the following: curl -q -sS -X GET http://localhost:9200/_nodes/stats/indices/search?pretty | egrep '(name|scroll_current)' I have the following modules installed: Search::Elasticsearch (6.00) Search::Elasticsearch::Client::6_0 (6.00) Search::Elasticsearch::Client::5_0 (5.02) Search::Elasticsearch::Client::Direct (1.20)

Tested with the following code:

use strict;

use Search::Elasticsearch;

my $es = Search::Elasticsearch->new( nodes => 'mysearchnode:9200' ) || die;

my $scroll = $es->scroll_helper (
  'index' => 'logstash-2019.07.03',
  'body' => {
    'query' => {
      'query_string' => {
        'query' => 'IpAddress: SomethingThatNeverMatches',
        'analyze_wildcard' => 'false'
      }
    }
  }
);

my $doc = $scroll->next;
printf STDERR "Doc returned? %s\n", (defined $doc) ? 'yes' : 'no';
$scroll->finish();
ezimuel commented 4 years ago

@Daniel314 first of all, sorry for the delay. I'm the new maintainer of this project, thanks for reporting this I'll looking into it soon.

fy2 commented 3 years ago

Hi, I observe this problem as well. Has anyone found out why this is happening?

ES version: 7.6 Search::Elasticsearch version 6.80

fy2 commented 3 years ago

Hi @Daniel314 & @ezimuel ,

as stated in my comment above, I was having this problem as well. It does happen when no hits are found. However, I realised that this also happens after I loop through all the hits in a scrolled search that returns small number of hits (3 hits in my case). After looping through the hits via with while (my $hit = $scroll->next) {} and calling $scroll->finish, the scroll context would still not be deleted immediately, they were still open for a few minutes (default time).

I have looked at your perl libraries and made these changes, and it seems to have fixed both issues for me:

diff --git a/lib/Search/Elasticsearch/Client/6_0/Role/Scroll.pm b/lib/Search/Elasticsearch/Client/6_0/Role/Scroll.pm
index ce013e6..7425adf 100644
--- a/lib/Search/Elasticsearch/Client/6_0/Role/Scroll.pm
+++ b/lib/Search/Elasticsearch/Client/6_0/Role/Scroll.pm
@@ -19,6 +19,7 @@ has 'search_params' => ( is => 'ro' );
 has 'is_finished'   => ( is => 'rwp', default => '' );
 has '_pid'          => ( is => 'ro', default => sub {$$} );
 has '_scroll_id'    => ( is => 'rwp', clearer => 1, predicate => 1 );
+has '_prime_scroll_id' => ( is => 'ro' );

 #===================================
 sub scroll_request {
diff --git a/lib/Search/Elasticsearch/Client/6_0/Scroll.pm b/lib/Search/Elasticsearch/Client/6_0/Scroll.pm
index f35e71f..707af46 100644
--- a/lib/Search/Elasticsearch/Client/6_0/Scroll.pm
+++ b/lib/Search/Elasticsearch/Client/6_0/Scroll.pm
@@ -42,7 +42,11 @@ sub BUILDARGS {
         _buffer      => $results->{hits}{hits},
         $total
         ? ( _scroll_id => $results->{_scroll_id} )
-        : ( is_finished => 1 )
+        : ( is_finished => 1 ),
+
+        # Store the very first scroll id here so that even if $total == 0, we
+        # can still keep track of it:
+        _prime_scroll_id => $results->{_scroll_id},
     };
 }

@@ -51,6 +55,17 @@ sub next {
 #===================================
     my ( $self, $n ) = @_;
     $n ||= 1;
+
+    # Looks like there were no search results and this is the first call to
+    # 'next' (because '$self->total == 0'), We set the _scroll_id to the value
+    # of prime scroll id. This will help us delete the scroll context via the
+    # call to finish():
+    if ($self->total == 0) {
+        $self->_set__scroll_id($self->_prime_scroll_id);
+        $self->finish();
+        return;
+    }
+
     while ( $self->_has_scroll_id and $self->buffer_size < $n ) {
         $self->refill_buffer;
     }
@@ -86,10 +101,7 @@ sub refill_buffer {
     my $hits = $results->{hits}{hits};
     $self->_set_total_took( $self->total_took + $results->{took} );

-    if ( @$hits == 0 ) {
-        $self->_clear_scroll_id;
-    }
-    else {
+    if ( @$hits ) {
         $self->_set__scroll_id( $results->{_scroll_id} );
         push @$buffer, @$hits;
     }
@@ -101,7 +113,7 @@ sub refill_buffer {
 sub finish {
 #===================================
     my $self = shift;
-    return if $self->is_finished || $self->_pid != $$;
+    return if ($self->is_finished && !$self->_scroll_id) || ($self->_pid != $$);

     $self->_set_is_finished(1);
     @{ $self->_buffer } = ();

I hope this helps you or anyone else having this issue. If I am making a mistake in the proposed fix, please let me know as my initial, rudimentary checks do not show an obvious adverse effects.