elasticio / sailor-jvm

The official elastic.io library for bootstrapping and executing for connectors built on JVM.
0 stars 2 forks source link

Publish exceptions under load #63

Closed zuker closed 3 years ago

zuker commented 3 years ago

Prerequisites: Webhook request-reply trigger: component.json:

{
  "title": "Request Reply Trigger (Java)",
  "description": "elastic.io component for the Petstore API",
  "service": "request-reply",
  "buildType": "docker",
  "triggers": {
    "reply": {
      "main": "io.elastic.reply.triggers.Reply",
      "type": "webhook",
      "title": "Reply",
      "metadata": {}
    }
  }
}

io.elastic.reply.triggers.Reply:

package io.elastic.reply.triggers;

import io.elastic.api.ExecutionParameters;
import io.elastic.api.Function;
import io.elastic.api.Message;
import io.elastic.api.HttpReply;
import io.elastic.api.JSON;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import javax.json.Json;
import javax.json.JsonArray;
import javax.json.JsonObject;
import javax.json.JsonString;

import java.io.ByteArrayInputStream;

public class Reply implements Function {
    private static final Logger logger = LoggerFactory.getLogger(Reply.class);

    /**
     * @param parameters execution parameters
     */
    @Override
    public void execute(final ExecutionParameters parameters) {
        final JsonObject body = Json.createObjectBuilder()
                .add("echo", parameters.getMessage().getBody())
                .build();

        final Message data
                = new Message.Builder().body(body).build();

        final HttpReply httpReply = new HttpReply.Builder()
                .header("Content-type", "application/json")
                .status(HttpReply.Status.ACCEPTED)
                .content(new ByteArrayInputStream(JSON.stringify(body).getBytes()))
                .build();

        logger.info("Emitting data");

        // emitting the message to the platform
        parameters.getEventEmitter().emitData(data);

        logger.info("Emitting reply");
        parameters.getEventEmitter().emitHttpReply(httpReply);
    }
}

STR:

  1. Create realtime flow with just one trigger (code above) with "Parallel processing": 99
  2. Start it and run several times sequentially ab -k -c 50 -n 100 "FLOW_WEBHOOK_URI"

Expected result: all ab runs is successful

Actual result: flow becomes unresponsive, all requests timed out until flow restart Some logs:

12 Feb, 2021 14:33:53.402 Component execution failed
12 Feb, 2021 14:33:53.400 Adding message of step 'step_1' into pass-through
arrow_right
12 Feb, 2021 14:33:53.400 Message headers: {x-eio-meta-webhook-publish-timestamp=1613133233291, stepId=step_1, start=1613133233397, messageId=13559970-2808-4f0f-b25d-f5504194a5c7, userId=5773e8e26e05f10500000003, execId=21ca5891a0be497890ce35ac728061d3, threadId=0f4beeee-2918-4f86-9175-135d3cb69a2e, reply_to=request_reply_key_60266eb96778e200111f268c_775a1a13-ab8c-4c89-8bdb-25001d5e8fd4, compId=6026665ce9e0db001179b089, function=reply, parentMessageId=54930784-607b-44d3-8ab2-1ee5680afa9d, x-eio-meta-trace-id=0f4beeee-2918-4f86-9175-135d3cb69a2e, x-eio-meta-reply-tag=cf45926d3724dda447d840316d8db282deca16c0309521e271013fbfa60eeeb1, containerId=e269c9f2-e23d-411b-a40d-3483dbec8db3, taskId=60266eb96778e200111f268c, workspaceId=5c48724304c45b00104dcc7e}
arrow_right
12 Feb, 2021 14:33:53.400 Pushing to exchange=5c48724304c45b00104dcc7e_org, routingKey=5c48724304c45b00104dcc7e.60266eb96778e200111f268c/long_running.step_1.message
12 Feb, 2021 14:33:53.399 Step produced data to be published
12 Feb, 2021 14:33:53.397 Emitting data

Actually under the "Component execution failed" there's following exception:

java.lang.NullPointerException: null
    at io.elastic.sailor.impl.MessagePublisherImpl.createPublishChannel(MessagePublisherImpl.java:142)
    at io.elastic.sailor.impl.MessagePublisherImpl.getPublishChannel(MessagePublisherImpl.java:132)
    at io.elastic.sailor.impl.MessagePublisherImpl.publish(MessagePublisherImpl.java:53)
    at io.elastic.sailor.impl.ErrorPublisherImpl.publish(ErrorPublisherImpl.java:59)
    at io.elastic.sailor.impl.ErrorCallback.receiveData(ErrorCallback.java:23)
    at io.elastic.sailor.impl.CountingCallbackImpl.receive(CountingCallbackImpl.java:13)
    at io.elastic.api.EventEmitter.emit(EventEmitter.java:119)
    at io.elastic.api.EventEmitter.emitException(EventEmitter.java:53)
    at io.elastic.sailor.impl.MessageProcessorImpl.processMessage(MessageProcessorImpl.java:71)
    at io.elastic.sailor.impl.MessageConsumer.handleDelivery(MessageConsumer.java:63)
    at com.rabbitmq.client.impl.ConsumerDispatcher$5.run(ConsumerDispatcher.java:149)
    at com.rabbitmq.client.impl.ConsumerWorkService$WorkPoolRunnable.run(ConsumerWorkService.java:104)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

Suggestion: Logs of successful processings are full of Opened publish channel, so actually channel open (https://github.com/elasticio/sailor-jvm/blob/06a842c8ebc671fd7cd9f2441c9f68cac274b5dc/src/main/java/io/elastic/sailor/impl/MessagePublisherImpl.java#L142) happens everytime here https://github.com/elasticio/sailor-jvm/blob/06a842c8ebc671fd7cd9f2441c9f68cac274b5dc/src/main/java/io/elastic/sailor/impl/MessagePublisherImpl.java#L131 for every message publish and connection fails with "Error: No channels left to allocate" (experienced this in platform services)

A3a3e1 commented 3 years ago

I've just run this component with Sailor 3.3.2 for higher load:

ab -k -c 99 -n 1000 https://in.elastic.io/hook/602695db133512001172b02f

This is ApacheBench, Version 2.3 <$Revision: 1843412 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking in.elastic.io (be patient)
Completed 100 requests
Completed 200 requests
Completed 300 requests
Completed 400 requests
Completed 500 requests
Completed 600 requests
Completed 700 requests
Completed 800 requests
Completed 900 requests
Completed 1000 requests
Finished 1000 requests

Server Software:        
Server Hostname:        in.elastic.io
Server Port:            443
SSL/TLS Protocol:       TLSv1.2,ECDHE-RSA-AES256-GCM-SHA384,2048,256
Server Temp Key:        ECDH X25519 253 bits
TLS Server Name:        in.elastic.io

Document Path:          /hook/602695db133512001172b02f
Document Length:        11 bytes

Concurrency Level:      99
Time taken for tests:   20.794 seconds
Complete requests:      1000
Failed requests:        0
Keep-Alive requests:    1000
Total transferred:      389000 bytes
HTML transferred:       11000 bytes
Requests per second:    48.09 [#/sec] (mean)
Time per request:       2058.601 [ms] (mean)
Time per request:       20.794 [ms] (mean, across all concurrent requests)
Transfer rate:          18.27 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0   70 217.2      0     924
Processing:   134 1875 616.4   1619    3827
Waiting:      134 1875 616.4   1619    3827
Total:        137 1945 686.3   1642    4660

Percentage of the requests served within a certain time (ms)
  50%   1642
  66%   1912
  75%   2056
  80%   2218
  90%   3030
  95%   3538
  98%   4014
  99%   4412
 100%   4660 (longest request)
A3a3e1 commented 3 years ago

Ok. The next run of the same command produced errors you mentioned above, while the first one worked fine

zuker commented 3 years ago

@A3a3e1 just reproduced the same

A3a3e1 commented 3 years ago

1023 is the hard limit currently. The flow is able to process exactly 1023 messages after flow restart. Then it stucks image

zuker commented 3 years ago

@A3a3e1 looks like amqp connection channel limit

A3a3e1 commented 3 years ago

Confirm, 3.0.0 Sailor has no this bug:

ab -k -c 99 -n 3000 https://in.elastic.io/hook/602695db133512001172b02f

This is ApacheBench, Version 2.3 <$Revision: 1843412 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking in.elastic.io (be patient)
Completed 300 requests
Completed 600 requests
Completed 900 requests
Completed 1200 requests
Completed 1500 requests
Completed 1800 requests
Completed 2100 requests
Completed 2400 requests
Completed 2700 requests
Completed 3000 requests
Finished 3000 requests

Server Software:        
Server Hostname:        in.elastic.io
Server Port:            443
SSL/TLS Protocol:       TLSv1.2,ECDHE-RSA-AES256-GCM-SHA384,2048,256
Server Temp Key:        ECDH X25519 253 bits
TLS Server Name:        in.elastic.io

Document Path:          /hook/602695db133512001172b02f
Document Length:        11 bytes

Concurrency Level:      99
Time taken for tests:   13.324 seconds
Complete requests:      3000
Failed requests:        0
Keep-Alive requests:    3000
Total transferred:      1167000 bytes
HTML transferred:       33000 bytes
Requests per second:    225.16 [#/sec] (mean)
Time per request:       439.683 [ms] (mean)
Time per request:       4.441 [ms] (mean, across all concurrent requests)
Transfer rate:          85.54 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0   21 116.6      0     957
Processing:    45  405 132.4    385     723
Waiting:       45  405 132.4    385     723
Total:         45  425 183.6    389    1611

Percentage of the requests served within a certain time (ms)
  50%    389
  66%    479
  75%    520
  80%    573
  90%    604
  95%    645
  98%   1028
  99%   1252
 100%   1611 (longest request)
ShkarupaNick commented 3 years ago

The task is ready for testing: Sailor version: 3.3.3-SNAPSHOT