ViaQ / integration-tests


Missing records from project logs #6

Open lukas-vlcek opened 8 years ago

lukas-vlcek commented 8 years ago

Summary

Missing log messages in Elasticsearch indices when running the openshift-test.sh script.

Details

Note: the fix from PR #5 needs to be applied/merged first.

Assume the following ENV variables:

export USE_FLUENTD=true
export NMESSAGES=110
export NPROJECTS=10

When the openshift-test.sh test is executed, it fails (times out) while verifying the expected records in the index for the last project (i.e. project-09). Specifically:

error: 110 messages not found in this-is-project-09

Further investigation reveals that this index is missing some records (note that the index this-is-project-09.1.2016.06.21 contains only 88 documents instead of 110):

# curl http://10.40.2.198:9200/_cat/indices?v
health status index                           pri rep docs.count docs.deleted store.size pri.store.size 
yellow open   this-is-project-07.1.2016.06.21   5   1        110            0     52.6kb         52.6kb 
yellow open   this-is-project-09.1.2016.06.21   5   1         88            0     42.6kb         42.6kb 
yellow open   .operations.2016.06.21            5   1        110            0     37.9kb         37.9kb 
yellow open   this-is-project-10.1.2016.06.21   5   1        110            0     53.2kb         53.2kb 
yellow open   this-is-project-06.1.2016.06.21   5   1        110            0     52.9kb         52.9kb 
yellow open   this-is-project-08.1.2016.06.21   5   1        110            0     53.1kb         53.1kb 
yellow open   this-is-project-04.1.2016.06.21   5   1        110            0     53.1kb         53.1kb 
yellow open   this-is-project-03.1.2016.06.21   5   1        110            0     52.6kb         52.6kb 
yellow open   this-is-project-02.1.2016.06.21   5   1        110            0     52.6kb         52.6kb 
yellow open   this-is-project-01.1.2016.06.21   5   1        110            0     53.1kb         53.1kb 
yellow open   this-is-project-05.1.2016.06.21   5   1        110            0     52.6kb         52.6kb 

Note the sizes of the source log files in the /tmp/tmp.pKiGhuIsLw/data/docker folder below; they are all equal, which is expected and means they all contain the same number of log messages:

+ ls -R -alrtF /tmp/tmp.pKiGhuIsLw
/tmp/tmp.pKiGhuIsLw:
total 240
drwx------.  4 root root    100 Jun 21 13:47 ./
-rw-r--r--.  1 root root 243210 Jun 21 13:47 orig
drwxr-xr-x.  2 root root     80 Jun 21 13:47 config/
drwxr-xr-x.  3 root root    120 Jun 21 13:47 data/
drwxrwxrwt. 13 root root    280 Jun 21 13:50 ../

/tmp/tmp.pKiGhuIsLw/config:
total 12
drwx------. 4 root root  100 Jun 21 13:47 ../
-rw-r--r--. 1 root root  583 Jun 21 13:47 syslog-input.conf
-rw-r--r--. 1 root root 4582 Jun 21 13:47 fluent.conf
drwxr-xr-x. 2 root root   80 Jun 21 13:47 ./

/tmp/tmp.pKiGhuIsLw/data:
total 40
drwx------. 4 root root   100 Jun 21 13:47 ../
drwxr-xr-x. 2 root root   240 Jun 21 13:47 docker/
-rw-r--r--. 1 root root 30030 Jun 21 13:47 messages
-rw-r--r--. 1 root root    52 Jun 21 13:47 system.pos
drwxr-xr-x. 3 root root   120 Jun 21 13:47 ./
-rw-r--r--. 1 root root  1740 Jun 21 13:47 es-containers.log.pos

/tmp/tmp.pKiGhuIsLw/data/docker:
total 320
drwxr-xr-x. 2 root root   240 Jun 21 13:47 ./
-rw-r--r--. 1 root root 29810 Jun 21 13:47 this-is-pod-01_this-is-project-01_this-is-container-01-4355a46b19d348dc2f57c046f8ef63d4538ebb936000f3c9ee954a27460dd865.log
-rw-r--r--. 1 root root 29810 Jun 21 13:47 this-is-pod-02_this-is-project-02_this-is-container-02-53c234e5e8472b6ac51c1ae1cab3fe06fad053beb8ebfd8977b010655bfdd3c3.log
-rw-r--r--. 1 root root 29810 Jun 21 13:47 this-is-pod-03_this-is-project-03_this-is-container-03-1121cfccd5913f0a63fec40a6ffd44ea64f9dc135c66634ba001d10bcf4302a2.log
-rw-r--r--. 1 root root 29810 Jun 21 13:47 this-is-pod-04_this-is-project-04_this-is-container-04-7de1555df0c2700329e815b93b32c571c3ea54dc967b89e81ab73b9972b72d1d.log
-rw-r--r--. 1 root root 29810 Jun 21 13:47 this-is-pod-05_this-is-project-05_this-is-container-05-f0b5c2c2211c8d67ed15e75e656c7862d086e9245420892a7de62cd9ec582a06.log
-rw-r--r--. 1 root root 29810 Jun 21 13:47 this-is-pod-06_this-is-project-06_this-is-container-06-06e9d52c1720fca412803e3b07c4b228ff113e303f4c7ab94665319d832bbfb7.log
-rw-r--r--. 1 root root 29810 Jun 21 13:47 this-is-pod-07_this-is-project-07_this-is-container-07-10159baf262b43a92d95db59dae1f72c645127301661e0a3ce4e38b295a97c58.log
-rw-r--r--. 1 root root 29810 Jun 21 13:47 this-is-pod-08_this-is-project-08_this-is-container-08-aa67a169b0bba217aa0aa88a65346920c84c42447c36ba5f7ea65f422c1fe5d8.log
-rw-r--r--. 1 root root 29810 Jun 21 13:47 this-is-pod-09_this-is-project-09_this-is-container-09-2e6d31a5983a91251bfae5aefa1c0a19d8ba3cf601d0e8a706b4cfa9661a6b8a.log
-rw-r--r--. 1 root root 29810 Jun 21 13:47 this-is-pod-10_this-is-project-10_this-is-container-10-917df3320d778ddbaa5c5c7742bc4046bf803c36ed2b050f30844ed206783469.log
drwxr-xr-x. 3 root root   120 Jun 21 13:47 ../

Further, we can see that there was a possible issue pushing the data to Elasticsearch (there was 1 rejected bulk request - see bulk.rejected):

# curl http://10.40.2.198:9200/_cat/thread_pool?v
host         ip         bulk.active bulk.queue bulk.rejected index.active index.queue index.rejected search.active search.queue search.rejected 
15dfd31fb381 172.17.0.2           0          0             1            0           0              0             0            0               0 

However, the Elasticsearch log does not show any errors. We can see that after the cluster started, the expected indices were created and the mappings were updated as documents were indexed. That is all:

# docker logs --tail=2000 15dfd31fb381
CLUSTER_NAME not set - using viaq
+ set -e
+ PORT=9200
+ HOST=localhost
+ LOG_FILE=elasticsearch_connect_log.txt
+ RETRY_COUNT=30
+ RETRY_INTERVAL=1
+ retry=30
+ max_time=30
+ timeouted=false
+ '[' -z '' ']'
+ echo CLUSTER_NAME not set - using viaq
+ export CLUSTER_NAME=viaq
+ CLUSTER_NAME=viaq
+ mkdir -p /elasticsearch/viaq
+ '[' -n '' ']'
+ regex='^([[:digit:]]+)([GgMm])$'
+ [[ 8G =~ ^([[:digit:]]+)([GgMm])$ ]]
+ num=8
+ unit=G
+ [[ G =~ [Gg] ]]
+ (( num = num * 1024 ))
+ [[ 8192 -lt 512 ]]
+ ES_JAVA_OPTS=' -Xms256M -Xmx4096m'
+ /usr/share/elasticsearch/bin/elasticsearch
+ add_index_template
+ wait_for_port_open
+ rm -f elasticsearch_connect_log.txt
+ echo -n 'Checking if Elasticsearch is ready on localhost:9200 '
+ curl -s --max-time 30 -o elasticsearch_connect_log.txt localhost:9200
+ '[' false == false ']'
+ echo -n .
+ sleep 1
Checking if Elasticsearch is ready on localhost:9200 .[2016-06-21 11:47:32,270][INFO ][node                     ] [Mister Sensitive] version[1.5.2], pid[11], build[62ff986/2015-04-27T09:21:06Z]
[2016-06-21 11:47:32,271][INFO ][node                     ] [Mister Sensitive] initializing ...
[2016-06-21 11:47:32,275][INFO ][plugins                  ] [Mister Sensitive] loaded [], sites []
+ ((  retry -= 1  ))
+ ((  retry == 0  ))
+ curl -s --max-time 30 -o elasticsearch_connect_log.txt localhost:9200
+ '[' false == false ']'
+ echo -n .
+ sleep 1
+ ((  retry -= 1  ))
+ ((  retry == 0  ))
+ curl -s --max-time 30 -o elasticsearch_connect_log.txt localhost:9200
+ '[' false == false ']'
+ echo -n .
+ sleep 1
..[2016-06-21 11:47:34,351][INFO ][node                     ] [Mister Sensitive] initialized
[2016-06-21 11:47:34,353][INFO ][node                     ] [Mister Sensitive] starting ...
[2016-06-21 11:47:34,487][INFO ][transport                ] [Mister Sensitive] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/172.17.0.2:9300]}
[2016-06-21 11:47:34,499][INFO ][discovery                ] [Mister Sensitive] elasticsearch/ABy2trHPR2-4ve1jp0SiEQ
+ ((  retry -= 1  ))
+ ((  retry == 0  ))
+ curl -s --max-time 30 -o elasticsearch_connect_log.txt localhost:9200
+ '[' false == false ']'
+ echo -n .
+ sleep 1
+ ((  retry -= 1  ))
+ ((  retry == 0  ))
+ curl -s --max-time 30 -o elasticsearch_connect_log.txt localhost:9200
+ '[' false == false ']'
+ echo -n .
+ sleep 1
+ ((  retry -= 1  ))
+ ((  retry == 0  ))
+ curl -s --max-time 30 -o elasticsearch_connect_log.txt localhost:9200
+ '[' false == false ']'
+ echo -n .
+ sleep 1
...[2016-06-21 11:47:37,520][INFO ][cluster.service          ] [Mister Sensitive] new_master [Mister Sensitive][ABy2trHPR2-4ve1jp0SiEQ][15dfd31fb381][inet[/172.17.0.2:9300]], reason: zen-disco-join (elected_as_master)
[2016-06-21 11:47:37,545][INFO ][http                     ] [Mister Sensitive] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/172.17.0.2:9200]}
[2016-06-21 11:47:37,545][INFO ][node                     ] [Mister Sensitive] started
[2016-06-21 11:47:37,595][INFO ][gateway                  ] [Mister Sensitive] recovered [0] indices into cluster_state
+ ((  retry -= 1  ))
+ ((  retry == 0  ))
+ curl -s --max-time 30 -o elasticsearch_connect_log.txt localhost:9200
+ '[' -f elasticsearch_connect_log.txt ']'
+ grep -q 200 elasticsearch_connect_log.txt
+ echo '- connection successful'
- connection successful
+ curl -v -X GET 'http://localhost:9200/_cluster/health?wait_for_status=yellow&timeout=30s'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0* About to connect() to localhost port 9200 (#0)
*   Trying ::1...
* Connected to localhost (::1) port 9200 (#0)
> GET /_cluster/health?wait_for_status=yellow&timeout=30s HTTP/1.1
> User-Agent: curl/7.29.0
> Host: localhost:9200
> Accept: */*
> 
< HTTP/1.1 200 OK
< Content-Type: application/json; charset=UTF-8
< Content-Length: 252
< 
{ [data not shown]
100   252  100   252    0     0  15247      0 --:--:-- --:--:-- --:--:-- 14823
* Connection #0 to host localhost left intact
+ curl -v -X PUT -d@/usr/share/elasticsearch/config/com.redhat.viaq-template.json http://localhost:9200/_template/viaq
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0* About to connect() to localhost port 9200 (#0)
*   Trying ::1...
* Connected to localhost (::1) port 9200 (#0)
> PUT /_template/viaq HTTP/1.1
> User-Agent: curl/7.29.0
> Host: localhost:9200
> Accept: */*
> Content-Length: 7320
> Content-Type: application/x-www-form-urlencoded
> Expect: 100-continue
> 
< HTTP/1.1 100 Continue
} [data not shown]
< HTTP/1.1 200 OK
< Content-Type: application/json; charset=UTF-8
< Content-Length: 21
< 
{ [data not shown]
100  7341  100    21  100  7320    246  86075 --:--:-- --:--:-- --:--:-- 87142
* Connection #0 to host localhost left intact
{"cluster_name":"elasticsearch","status":"green","timed_out":false,"number_of_nodes":1,"number_of_data_nodes":1,"active_primary_shards":0,"active_shards":0,"relocating_shards":0,"initializing_shards":0,"unassigned_shards":0,"number_of_pending_tasks":0}{"acknowledged":true}[2016-06-21 11:47:49,686][INFO ][cluster.metadata         ] [Mister Sensitive] [.operations.2016.06.21] creating index, cause [auto(bulk api)], templates [], shards [5]/[1], mappings [fluentd]
[2016-06-21 11:47:49,881][INFO ][cluster.metadata         ] [Mister Sensitive] [this-is-project-02.1.2016.06.21] creating index, cause [auto(bulk api)], templates [], shards [5]/[1], mappings [fluentd]
[2016-06-21 11:47:49,963][INFO ][cluster.metadata         ] [Mister Sensitive] [this-is-project-07.1.2016.06.21] creating index, cause [auto(bulk api)], templates [], shards [5]/[1], mappings [fluentd]
[2016-06-21 11:47:50,030][INFO ][cluster.metadata         ] [Mister Sensitive] [this-is-project-01.1.2016.06.21] creating index, cause [auto(bulk api)], templates [], shards [5]/[1], mappings [fluentd]
[2016-06-21 11:47:50,088][INFO ][cluster.metadata         ] [Mister Sensitive] [this-is-project-10.1.2016.06.21] creating index, cause [auto(bulk api)], templates [], shards [5]/[1], mappings [fluentd]
[2016-06-21 11:47:50,134][INFO ][cluster.metadata         ] [Mister Sensitive] [this-is-project-05.1.2016.06.21] creating index, cause [auto(bulk api)], templates [], shards [5]/[1], mappings [fluentd]
[2016-06-21 11:47:50,184][INFO ][cluster.metadata         ] [Mister Sensitive] [this-is-project-06.1.2016.06.21] creating index, cause [auto(bulk api)], templates [], shards [5]/[1], mappings [fluentd]
[2016-06-21 11:47:50,243][INFO ][cluster.metadata         ] [Mister Sensitive] [this-is-project-03.1.2016.06.21] creating index, cause [auto(bulk api)], templates [], shards [5]/[1], mappings [fluentd]
[2016-06-21 11:47:50,293][INFO ][cluster.metadata         ] [Mister Sensitive] [this-is-project-08.1.2016.06.21] creating index, cause [auto(bulk api)], templates [], shards [5]/[1], mappings [fluentd]
[2016-06-21 11:47:50,374][INFO ][cluster.metadata         ] [Mister Sensitive] [this-is-project-09.1.2016.06.21] creating index, cause [auto(bulk api)], templates [], shards [5]/[1], mappings [fluentd]
[2016-06-21 11:47:50,432][INFO ][cluster.metadata         ] [Mister Sensitive] [this-is-project-04.1.2016.06.21] creating index, cause [auto(bulk api)], templates [], shards [5]/[1], mappings [fluentd]
[2016-06-21 11:47:53,609][INFO ][cluster.metadata         ] [Mister Sensitive] [.operations.2016.06.21] update_mapping [fluentd] (dynamic)
[2016-06-21 11:47:53,695][INFO ][cluster.metadata         ] [Mister Sensitive] [this-is-project-05.1.2016.06.21] update_mapping [fluentd] (dynamic)
[2016-06-21 11:47:53,789][INFO ][cluster.metadata         ] [Mister Sensitive] [this-is-project-01.1.2016.06.21] update_mapping [fluentd] (dynamic)
[2016-06-21 11:47:53,868][INFO ][cluster.metadata         ] [Mister Sensitive] [this-is-project-06.1.2016.06.21] update_mapping [fluentd] (dynamic)
[2016-06-21 11:47:53,933][INFO ][cluster.metadata         ] [Mister Sensitive] [this-is-project-02.1.2016.06.21] update_mapping [fluentd] (dynamic)
[2016-06-21 11:47:53,981][INFO ][cluster.metadata         ] [Mister Sensitive] [this-is-project-07.1.2016.06.21] update_mapping [fluentd] (dynamic)
[2016-06-21 11:47:54,035][INFO ][cluster.metadata         ] [Mister Sensitive] [this-is-project-08.1.2016.06.21] update_mapping [fluentd] (dynamic)
[2016-06-21 11:47:54,084][INFO ][cluster.metadata         ] [Mister Sensitive] [this-is-project-03.1.2016.06.21] update_mapping [fluentd] (dynamic)
[2016-06-21 11:47:54,146][INFO ][cluster.metadata         ] [Mister Sensitive] [this-is-project-10.1.2016.06.21] update_mapping [fluentd] (dynamic)
[2016-06-21 11:47:54,148][INFO ][cluster.metadata         ] [Mister Sensitive] [this-is-project-04.1.2016.06.21] update_mapping [fluentd] (dynamic)
[2016-06-21 11:47:54,252][INFO ][cluster.metadata         ] [Mister Sensitive] [this-is-project-09.1.2016.06.21] update_mapping [fluentd] (dynamic)
lukas-vlcek commented 8 years ago

The situation seems to get much worse as the number of files Fluentd needs to tail increases.

For example, if we use the following ENV variables for the test:

export NMESSAGES=20
export NPROJECTS=99

(^^ this means Fluentd will be tailing 100 log files, each containing only 20 records)

We end up with the following index state:

# curl http://10.40.2.198:9200/_cat/indices?v
health status index                           pri rep docs.count docs.deleted store.size pri.store.size 
yellow open   this-is-project-84.1.2016.06.21   5   1          4            0      6.9kb          6.9kb 
yellow open   this-is-project-93.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-02.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-95.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-29.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-27.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-83.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-19.1.2016.06.21   5   1          3            0      6.7kb          6.7kb 
yellow open   this-is-project-97.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-20.1.2016.06.21   5   1          4            0      6.9kb          6.9kb 
yellow open   this-is-project-86.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-82.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-35.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-45.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-78.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-96.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-30.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-64.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-34.1.2016.06.21   5   1         20            0     32.6kb         32.6kb 
yellow open   this-is-project-36.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-17.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-04.1.2016.06.21   5   1         12            0     19.7kb         19.7kb 
yellow open   this-is-project-01.1.2016.06.21   5   1         13            0       20kb           20kb 
yellow open   this-is-project-03.1.2016.06.21   5   1          4            0      6.9kb          6.9kb 
yellow open   this-is-project-18.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-31.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-49.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-74.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-32.1.2016.06.21   5   1          8            0     13.3kb         13.3kb 
yellow open   this-is-project-81.1.2016.06.21   5   1         20            0     32.5kb         32.5kb 
yellow open   this-is-project-89.1.2016.06.21   5   1          3            0      6.7kb          6.7kb 
yellow open   this-is-project-55.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-14.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-54.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-24.1.2016.06.21   5   1         20            0     32.5kb         32.5kb 
yellow open   this-is-project-63.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-13.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-58.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-47.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-79.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-50.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-65.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-91.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-11.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-40.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-61.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-08.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-06.1.2016.06.21   5   1         20            0     32.5kb         32.5kb 
yellow open   this-is-project-57.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-51.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-37.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-66.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-69.1.2016.06.21   5   1          4            0      6.9kb          6.9kb 
yellow open   this-is-project-12.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-41.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-77.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-33.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-15.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-09.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-23.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-44.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-72.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-21.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-07.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-25.1.2016.06.21   5   1         20            0     32.5kb         32.5kb 
yellow open   this-is-project-48.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-98.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-68.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-43.1.2016.06.21   5   1         20            0     32.5kb         32.5kb 
yellow open   this-is-project-22.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-85.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-76.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-94.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-59.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-70.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-05.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-38.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-90.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-60.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   .operations.2016.06.21            5   1         20            0     23.8kb         23.8kb 
yellow open   this-is-project-28.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-16.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-88.1.2016.06.21   5   1          5            0      7.1kb          7.1kb 
yellow open   this-is-project-75.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-53.1.2016.06.21   5   1         20            0     32.5kb         32.5kb 
yellow open   this-is-project-73.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-71.1.2016.06.21   5   1         20            0     32.6kb         32.6kb 
yellow open   this-is-project-92.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-42.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-67.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-52.1.2016.06.21   5   1         16            0     26.1kb         26.1kb 
yellow open   this-is-project-26.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-46.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-80.1.2016.06.21   5   1          9            0     13.6kb         13.6kb 
yellow open   this-is-project-39.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-99.1.2016.06.21   5   1          4            0      6.9kb          6.9kb 
yellow open   this-is-project-56.1.2016.06.21   5   1          4            0      6.9kb          6.9kb 
yellow open   this-is-project-87.1.2016.06.21   5   1          0            0       575b           575b 
yellow open   this-is-project-62.1.2016.06.21   5   1         11            0     19.5kb         19.5kb 
yellow open   this-is-project-10.1.2016.06.21   5   1          0            0       575b           575b 

Only a few indices have any documents at all, and just a few contain the expected 20 documents.

We are getting 429 (!) rejected bulk requests (bulk.rejected):

# curl http://10.40.2.198:9200/_cat/thread_pool?v
host         ip         bulk.active bulk.queue bulk.rejected index.active index.queue index.rejected search.active search.queue search.rejected 
c681d5122401 172.17.0.2           0          0           429            0           0              0             0            0               0          

Just for the record, if we investigate the Fluentd plugin status (via the REST API, which is the subject of PR #4), we can see that the elasticsearch output does not hold any data in its buffers:

      {  
         "plugin_id":"object:1099830",
         "plugin_category":"output",
         "type":"elasticsearch_dynamic",
         "config":{  
            "@type":"elasticsearch_dynamic",
            "host":"viaq-elasticsearch",
            "port":"9200",
            "scheme":"http",
            "index_name":"${record['kubernetes_namespace_name']}.${record['kubernetes_namespace_id']}.${Time.at(time).getutc.strftime(@logstash_dateformat)}",
            "client_key":"",
            "client_cert":"",
            "ca_file":"",
            "flush_interval":"5s",
            "max_retry_wait":"300",
            "disable_retry_limit":""
         },
         "output_plugin":true,
         "buffer_queue_length":0,
         "buffer_total_queued_size":0,
         "retry_count":0
      }
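
For reference, this plugin status JSON is the kind of output Fluentd's in_monitor_agent input exposes over HTTP; whether PR #4 uses exactly that plugin and its default port (24220) is an assumption here, so the query below is only a sketch:

# query the Fluentd plugin status endpoint (assuming the default monitor_agent port)
curl -s 'http://localhost:24220/api/plugins.json' | python -m json.tool
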
lukas-vlcek commented 8 years ago

We might be suffering from the default thread pool queue size settings: https://www.elastic.co/guide/en/elasticsearch/reference/1.5/modules-threadpool.html
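
As a minimal sketch (assuming the bulk thread pool settings are dynamically updatable in this ES version, and using 500 only as an example value), the queue could be enlarged at runtime like this:

# check the current rejection counters
curl 'http://10.40.2.198:9200/_cat/thread_pool?v'
# enlarge the bulk queue beyond its small default
curl -X PUT 'http://10.40.2.198:9200/_cluster/settings' -d '{
  "transient": { "threadpool.bulk.queue_size": 500 }
}'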

lukas-vlcek commented 8 years ago

We are running into the same issue when using rsyslog instead of Fluentd.

For the case:

export NMESSAGES=20
export NPROJECTS=99

We are getting bulk.rejected = 264.

lukas-vlcek commented 8 years ago

Actually, as explained in the official guide, if a bulk request fails due to rejection it is not considered an error on the ES side, but it should be a signal to the client:

"Rejections are not errors: they just mean you should try again later."

This means we need to check that all clients (fluentd and rsyslog at the moment) handle the relevant HTTP response codes correctly. See here, which could be a good starting point. A sketch of such a check is shown below.
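
As a minimal sketch of what that handling could look like: the _bulk response carries a per-item status field, and a 429 there means the item should be resubmitted later rather than dropped (bulk-payload.json is a hypothetical payload file used only for illustration):

# count per-item rejections in a bulk response; 429 means "retry later", not "drop"
curl -s -X POST 'http://10.40.2.198:9200/_bulk' --data-binary @bulk-payload.json |
  python -c '
import json, sys
r = json.load(sys.stdin)
rejected = sum(1 for item in r.get("items", [])
                 for action in item.values() if action.get("status") == 429)
print(rejected, "items rejected - these should be resubmitted")
'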

lukas-vlcek commented 8 years ago

In the case of rsyslog and omelasticsearch, we need to configure errorfile (which is optional at the moment). rsyslog should write all errored requests to this file (which should also cover rejected bulk requests - but I am not sure whether this specific use case is tested in the rsyslog code!). My understanding is that there is no mechanism for resubmitting data from this error file; it is the user's responsibility to set up some process that periodically checks this file and investigates/decides what to do. A configuration sketch follows.
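
A hedged sketch of what such an action could look like (the server name and file path are made up for illustration; the parameter names follow the omelasticsearch documentation linked below):

action(type="omelasticsearch"
       server="viaq-elasticsearch"
       serverport="9200"
       bulkmode="on"
       errorfile="/var/log/omelasticsearch-bulk-errors.log")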

See http://www.rsyslog.com/doc/v8-stable/configuration/modules/omelasticsearch.html and https://github.com/rsyslog/rsyslog/issues/104. Also check https://github.com/rsyslog/rsyslog/pull/246 for possible improvements/changes in newer versions of rsyslog (i.e. depending on the rsyslog version we use, we should be able to get better support for resubmitting errored requests).

lukas-vlcek commented 8 years ago

Note that starting with ES 2.2 there should be a configurable out-of-the-box bulk error retry mechanism that can self-heal from some of the issues we see today; see https://github.com/elastic/elasticsearch/issues/14620 and https://github.com/elastic/elasticsearch/pull/14829 (see also the "Java API" enhancements in https://www.elastic.co/guide/en/elasticsearch/reference/2.3/release-notes-2.2.0.html#enhancement-2.2.0). However, this seems to be an improvement only for short peaks; we still need to watch for cases where ES cannot keep up with high bulk indexing traffic.

lukas-vlcek commented 8 years ago

In the case of the fluentd elasticsearch plugin, it seems that any sent data associated with an HTTP response code other than 200 is discarded; see https://github.com/uken/fluent-plugin-elasticsearch/issues/105

cosmo0920 commented 6 years ago

Is this discarding issue still present in fluent-plugin-elasticsearch v1.11.1 and v2.1.1?

richm commented 6 years ago

Is this discarding issue still present in fluent-plugin-elasticsearch v1.11.1 and v2.1.1?

Probably not - we haven't tested with those versions.