confluentinc / ksql

The database purpose-built for stream processing applications.
https://ksqldb.io

Implement Server-Sent Events (SSE), Cross-Origin Resource Sharing (CORS) #283

Open xmlking opened 7 years ago

xmlking commented 7 years ago

It would be easy to implement a web client for KSQL Server if SSE and CORS were implemented. SSE provides a better abstraction for representing streaming structured data (JSON) in the browser or on the server side (WebClient). CORS makes it possible to deploy KSQL Server and web clients on separate hosts.

hjafarpour commented 7 years ago

@xmlking indeed we do have a very simple web client implemented in quickstart.html. You can connect to the KSQL server from your browser and submit queries through a form. Check it out! It would be great if you could submit a simple implementation of what you have in mind as a PR so we can discuss it further!

xmlking commented 7 years ago

@hjafarpour I am planning to implement a centralized logging solution and use an Angular UI app (hosted separately) that connects to KSQL Servers. I want to allow users to enter a hostname and port via a settings page, so they can connect to their own KSQL Server from the UI app. If things work out, I can turn it into a generic UI app for KSQL Server and share it with the community.

rmoff commented 6 years ago

@xmlking just catching up on some old issues. Is this still relevant for the latest version (5.0.0-SNAPSHOT)?

xmlking commented 6 years ago

Yes. Streaming over HTTP is a better fit for real-time dashboard use cases than request-response style RPC.

rmoff commented 6 years ago

Does the new REST API help this at all? https://docs.confluent.io/current/ksql/docs/api.html

xmlking commented 6 years ago

I see "Transfer-Encoding: chunked" in the KSQL docs. I guess KSQL is using the HTTP Streaming option for streaming.
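
With plain chunked streaming, the client has to reassemble records itself, because chunk boundaries do not have to line up with record boundaries. A minimal sketch of that buffering, assuming each non-empty line of the response body is one JSON document (an assumption about the payload, not something confirmed in this thread); `LineBuffer` is a hypothetical helper name:

```typescript
// Buffers arbitrary text chunks and returns the rows completed by each chunk.
// Assumption: every non-empty line in the stream is one complete JSON document.
class LineBuffer {
  private pending = "";

  // Feed one decoded text chunk; returns the rows that became complete.
  push(chunk: string): unknown[] {
    this.pending += chunk;
    const lines = this.pending.split("\n");
    // The last element is an unfinished line (or "" if the chunk ended on a newline).
    this.pending = lines.pop() ?? "";
    return lines
      .map((line) => line.trim())
      .filter((line) => line.length > 0)
      .map((line) => JSON.parse(line) as unknown);
  }
}
```

Each call to `push` only returns rows whose closing newline has arrived, so a row split across two chunks is held back until the second chunk completes it.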

That is great, but there are a couple of alternative options for streaming responses over HTTP, i.e., HTTP Long Polling, HTTP Streaming, and Server-Sent Events, and each has some advantages and disadvantages.

SSE has some advantages for KSQL use cases over the other methods.

  1. SSE is non-blocking for clients. Clients can use RxJS or RxJava and don't waste resources while waiting for the next data chunk.
  2. JavaScript has the EventSource API for SSE, and clients can render/process data incrementally as it arrives. With HTTP Streaming, even though data arrives at the browser in a streaming fashion, browsers are blocked until all data is received before they can use it.
  3. Since HTTP Streaming is hop-by-hop instead of end-to-end, a proxy in between might try to consolidate the chunks before forwarding the response to the client. This is not the case for SSE.
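
For reference, the framing that makes SSE convenient is small: events are blocks of `field: value` lines separated by a blank line, and lines starting with `:` are comments. The following is a simplified sketch of that framing, not a full implementation of the specification (it ignores `retry:` and does not carry the last event ID across events); `parseSse` and `SseEvent` are hypothetical names. In a browser, `EventSource` does this parsing natively.

```typescript
interface SseEvent {
  event: string; // "message" when the block has no event: field
  data: string; // multiple data: lines joined with "\n"
  id: string | null;
}

// Parses a complete text/event-stream payload into events (simplified).
function parseSse(payload: string): SseEvent[] {
  const events: SseEvent[] = [];
  // Events are separated by a blank line.
  for (const block of payload.split(/\r?\n\r?\n/)) {
    let event = "message";
    let id: string | null = null;
    const data: string[] = [];
    for (const line of block.split(/\r?\n/)) {
      // Skip empty lines and comment lines (often used as keep-alives).
      if (line === "" || line.charAt(0) === ":") continue;
      const colon = line.indexOf(":");
      const field = colon === -1 ? line : line.slice(0, colon);
      let value = colon === -1 ? "" : line.slice(colon + 1);
      // A single leading space after the colon is not part of the value.
      if (value.charAt(0) === " ") value = value.slice(1);
      if (field === "event") event = value;
      else if (field === "data") data.push(value);
      else if (field === "id") id = value;
    }
    // A block without any data field does not produce an event.
    if (data.length > 0) events.push({ event, data: data.join("\n"), id });
  }
  return events;
}
```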

The ask is: implement an SSE-based streaming solution so that frontend developers can build streaming dashboards easily.

Ref: https://tools.ietf.org/html/rfc6202 https://stackoverflow.com/questions/39889052/why-use-server-sent-events-instead-of-simple-http-chunked-streaming

apurvam commented 5 years ago

cc @derekjn @MichaelDrogalis this may be something we want to keep in our minds for the future.

MichaelDrogalis commented 5 years ago

👍 This seems good to tackle when we start to officially cut over to making the REST API the main API.

epinadev commented 5 years ago

Any updates on how to enable CORS so the KSQL REST API can be used from the browser? For now, all I can do is start Chrome with the --disable-web-security flag. I also tried setting up a reverse proxy with Nginx using Docker, but it is super slow.

MichaelDrogalis commented 5 years ago

@epinadev Hey, nothing yet on our end. I know that @purplefox is doing some experiments which might touch the front-end. Will let him chime in if he bumps into it.

epinadev commented 5 years ago

Thanks @MichaelDrogalis, I finally fixed my Docker setup and it is now working super well with Nginx in front of the ksql-server API, so CORS is not an issue anymore for now.

MichaelDrogalis commented 5 years ago

That's great!

roneel-kumar commented 4 years ago

Hi @epinadev - we are also working with Nginx in front of the ksql-server API. We can happily curl the Nginx endpoint, but currently we are hitting timeout errors from the browser (React.js). Was there anything specific you needed to configure on Nginx to make it work for you?

epinadev commented 4 years ago

Hi @roneel-kumar Sorry for the one-month-late response. I get tons of emails and the notification for this thread got lost in my inbox. Here is my docker-compose yml file.

It runs the full package (zookeeper, kafka, schema-registry, the data generators, ksql and nginx)

version: '2'
services:
  zookeeper:
    image: "confluentinc/cp-zookeeper:4.1.0"
    container_name: ksql-zookeeper
    hostname: zookeeper
    ports:
      - '32181:32181'
    environment:
      ZOOKEEPER_CLIENT_PORT: 32181
      ZOOKEEPER_TICK_TIME: 2000

  kafka:
    image: "confluentinc/cp-enterprise-kafka:4.1.0"
    hostname: kafka
    container_name: ksql-kafka
    ports:
      - '9092:9092'
      - '29092:29092'
    depends_on:
      - zookeeper
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:32181
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
      KAFKA_INTER_BROKER_LISTENER_NAME: PLAINTEXT
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:29092,PLAINTEXT_HOST://localhost:9092
      KAFKA_AUTO_CREATE_TOPICS_ENABLE: "true"
      KAFKA_METRIC_REPORTERS: io.confluent.metrics.reporter.ConfluentMetricsReporter
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
      CONFLUENT_METRICS_REPORTER_BOOTSTRAP_SERVERS: kafka:29092
      CONFLUENT_METRICS_REPORTER_ZOOKEEPER_CONNECT: zookeeper:32181
      CONFLUENT_METRICS_REPORTER_TOPIC_REPLICAS: 1
      CONFLUENT_METRICS_ENABLE: 'true'
      CONFLUENT_SUPPORT_CUSTOMER_ID: 'anonymous'

  schema-registry:
    image: "confluentinc/cp-schema-registry:4.1.0"
    hostname: schema-registry
    container_name: ksql-schema-registry
    depends_on:
      - zookeeper
      - kafka
    ports:
      - '8081:8081'
    environment:
      SCHEMA_REGISTRY_HOST_NAME: schema-registry
      SCHEMA_REGISTRY_KAFKASTORE_CONNECTION_URL: zookeeper:32181

  # Runs the Kafka KSQL data generator for topic called "pageviews"
  ksql-datagen-pageviews:
    image: "confluentinc/ksql-examples:4.1.0"
    hostname: ksql-datagen-pageviews
    container_name: ksql-datagen-pageviews
    depends_on:
      - kafka
      - schema-registry
    # Note: The container's `run` script will perform the same readiness checks
    # for Kafka and Confluent Schema Registry, but that's ok because they complete fast.
    # The reason we check for readiness here is that we can insert a sleep time
    # for topic creation before we start the application.
    command: "bash -c 'echo Waiting for Kafka to be ready... && \
                       cub kafka-ready -b kafka:29092 1 20 && \
                       echo Waiting for Confluent Schema Registry to be ready... && \
                       cub sr-ready schema-registry 8081 20 && \
                       echo Waiting a few seconds for topic creation to finish... && \
                       sleep 2 && \
                       java -jar /usr/share/java/ksql-examples/ksql-examples-4.1.1-SNAPSHOT-standalone.jar
                       quickstart=pageviews format=delimited topic=pageviews bootstrap-server=kafka:29092 maxInterval=100'"
    environment:
      KSQL_CONFIG_DIR: "/etc/ksql"
      KSQL_LOG4J_OPTS: "-Dlog4j.configuration=file:/etc/ksql/log4j-rolling.properties"
      STREAMS_BOOTSTRAP_SERVERS: kafka:29092
      STREAMS_SCHEMA_REGISTRY_HOST: schema-registry
      STREAMS_SCHEMA_REGISTRY_PORT: 8081

  # Runs the Kafka KSQL data generator for topic called "users"
  ksql-datagen-users:
    image: "confluentinc/ksql-examples:4.1.0"
    hostname: ksql-datagen-users
    container_name: ksql-datagen-users
    depends_on:
      - kafka
      - schema-registry
    # Note: The container's `run` script will perform the same readiness checks
    # for Kafka and Confluent Schema Registry, but that's ok because they complete fast.
    # The reason we check for readiness here is that we can insert a sleep time
    # for topic creation before we start the application.
    command: "bash -c 'echo Waiting for Kafka to be ready... && \
                       cub kafka-ready -b kafka:29092 1 20 && \
                       echo Waiting for Confluent Schema Registry to be ready... && \
                       cub sr-ready schema-registry 8081 20 && \
                       echo Waiting a few seconds for topic creation to finish... && \
                       sleep 2 && \
                       java -jar /usr/share/java/ksql-examples/ksql-examples-4.1.1-SNAPSHOT-standalone.jar
                       quickstart=users format=json topic=users bootstrap-server=kafka:29092 maxInterval=100'"
    environment:
      KSQL_CONFIG_DIR: "/etc/ksql"
      KSQL_LOG4J_OPTS: "-Dlog4j.configuration=file:/etc/ksql/log4j-rolling.properties"
      STREAMS_BOOTSTRAP_SERVERS: kafka:29092
      STREAMS_SCHEMA_REGISTRY_HOST: schema-registry
      STREAMS_SCHEMA_REGISTRY_PORT: 8081

  # Runs the Kafka KSQL Server
  ksql-server:
    image: "confluentinc/ksql-cli:4.1.0"
    hostname: ksql-server
    container_name: ksql-server
    ports:
      - '8088:8088'
    depends_on:
      - kafka
      - schema-registry
    # Note: The container's `run` script will perform the same readiness checks
    # for Kafka and Confluent Schema Registry, but that's ok because they complete fast.
    # The reason we check for readiness here is that we can insert a sleep time
    # for topic creation before we start the application.
    command: "bash -c 'echo Waiting for Kafka to be ready... && \
                       cub kafka-ready -b kafka:29092 1 20 && \
                       echo Waiting for Confluent Schema Registry to be ready... && \
                       cub sr-ready schema-registry 8081 20 && \
                       echo Waiting a few seconds for topic creation to finish... && \
                       sleep 2 && \
                       /usr/bin/ksql-server-start /etc/ksql/ksql-server.properties'"
    environment:
      KSQL_CONFIG_DIR: "/etc/ksql"
      KSQL_OPTS: "-Dbootstrap.servers=kafka:29092 -Dksql.schema.registry.url=http://schema-registry:8081 -Dlisteners=http://0.0.0.0:8088"
      KSQL_LOG4J_OPTS: "-Dlog4j.configuration=file:/etc/ksql/log4j-rolling.properties"

  ksql-nginx:
    image: nginx:latest
    container_name: ksql-nginx
    ports:
      - '9098:80'
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf
    depends_on:
      - ksql-server
    restart: always

And this is the nginx.conf

worker_processes auto;

events { 
    worker_connections 1024; 
}

http {

    log_format compression '$remote_addr - $remote_user [$time_local] '
        '"$request" $status $upstream_addr '
        '"$http_referer" "$http_user_agent" "$gzip_ratio"';

    upstream ksql {
        server ksql-server:8088;
    }

    server {
        listen 0.0.0.0:80 reuseport;
        server_name localhost;

        location / {
         if ($request_method = 'OPTIONS') {
            add_header 'Access-Control-Allow-Origin' '*';
            add_header 'Access-Control-Allow-Methods' 'GET, POST, OPTIONS';
            #
            # Custom headers and headers various browsers *should* be OK with but aren't
            #
            add_header 'Access-Control-Allow-Headers' 'Authorization,DNT,X-CustomHeader,Keep-Alive,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type,Content-Range,Range';
            #
            # Tell client that this pre-flight info is valid for 20 days
            #
            add_header 'Access-Control-Max-Age' 1728000;
            add_header 'Content-Type' 'text/plain; charset=UTF-8';
            add_header 'Content-Length' 0;
            return 204;
         }
         if ($request_method = 'POST') {
            add_header 'Access-Control-Allow-Origin' '*';
            add_header 'Access-Control-Allow-Methods' 'GET, POST, OPTIONS';
            add_header 'Access-Control-Allow-Headers' 'Authorization,DNT,X-CustomHeader,Keep-Alive,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type,Content-Range,Range';
            add_header 'Access-Control-Expose-Headers' 'Authorization,DNT,X-CustomHeader,Keep-Alive,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type,Content-Range,Range';
         }
         if ($request_method = 'GET') {
            add_header 'Access-Control-Allow-Origin' '*';
            add_header 'Access-Control-Allow-Methods' 'GET, POST, OPTIONS';
            add_header 'Access-Control-Allow-Headers' 'Authorization,DNT,X-CustomHeader,Keep-Alive,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type,Content-Range,Range';
            add_header 'Access-Control-Expose-Headers' 'Authorization,DNT,X-CustomHeader,Keep-Alive,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type,Content-Range,Range';
         }
            proxy_pass         http://ksql;
            proxy_connect_timeout   2;
            proxy_redirect off;
            proxy_buffering off;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        }
    }
}
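
The `OPTIONS` branch in the config above answers the browser's CORS preflight request. When a browser request fails while curl succeeds, it can help to check the preflight response headers against what the browser is about to send. The following is a rough, simplified model of that check (real browsers apply more rules, for example around credentials and wildcards); `preflightProblems` and `HeaderMap` are hypothetical names, not part of any library:

```typescript
type HeaderMap = { [name: string]: string };

// Splits a comma-separated header value into lower-cased tokens.
function tokens(value: string | undefined): string[] {
  return (value ?? "")
    .split(",")
    .map((t) => t.trim().toLowerCase())
    .filter((t) => t.length > 0);
}

// Returns the reasons this simplified model would reject the preflight;
// an empty array means the actual request should be allowed to proceed.
function preflightProblems(
  responseHeaders: HeaderMap,
  origin: string,
  method: string,
  requestHeaders: string[],
): string[] {
  // Header names are case-insensitive, so normalise them first.
  const h: HeaderMap = {};
  for (const name of Object.keys(responseHeaders)) {
    h[name.toLowerCase()] = responseHeaders[name] ?? "";
  }
  const problems: string[] = [];

  const allowOrigin = h["access-control-allow-origin"];
  if (allowOrigin !== "*" && allowOrigin !== origin) {
    problems.push("origin not allowed: " + origin);
  }

  // GET, HEAD and POST are CORS-safelisted methods and need no explicit listing.
  // Methods are compared case-insensitively here for simplicity.
  const safeMethods = ["get", "head", "post"];
  const allowedMethods = tokens(h["access-control-allow-methods"]);
  const m = method.toLowerCase();
  if (
    safeMethods.indexOf(m) === -1 &&
    allowedMethods.indexOf(m) === -1 &&
    allowedMethods.indexOf("*") === -1
  ) {
    problems.push("method not allowed: " + method);
  }

  // Every non-safelisted request header must appear in Access-Control-Allow-Headers.
  const allowedHeaders = tokens(h["access-control-allow-headers"]);
  for (const name of requestHeaders) {
    const n = name.toLowerCase();
    if (allowedHeaders.indexOf(n) === -1 && allowedHeaders.indexOf("*") === -1) {
      problems.push("header not allowed: " + name);
    }
  }
  return problems;
}
```

With the headers from the config above, a `POST` that sends `Content-Type` passes this check, while a request carrying a header missing from the allow list is flagged.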

roneel-kumar commented 4 years ago

Thank you so much.

ShanonJackson commented 4 years ago

Thought I'd share some information here, because I doubt anyone can achieve a real-time dashboard using KSQL REST behind a CORS-avoiding proxy.

Here's my own experience after going down this path:

Chrome caps concurrent HTTP connections to a specific "host" at 6. This restriction alone prevents anyone from setting up a proxy with CORS to KSQL REST directly.

Because KSQL REST uses 'transfer-encoding: chunked' to send data, you can only have 6 of these connections open at the same time before all further queries go to pending and the server appears to seize up (even though it's just Chrome blocking).

You're going to need to throw a WebSocket somewhere into the mix so you can stream all data through a single connection.
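
One way to picture the single-connection approach is a small demultiplexer on the client: every message on the shared socket carries the ID of the query it belongs to, and the client routes rows to per-query handlers. This is only a sketch under assumptions. The `{ queryId, row }` envelope and the `QueryDemultiplexer` class are hypothetical, and the server-side piece that fans several KSQL queries into one socket is not shown and is not something KSQL provides according to this thread.

```typescript
// Hypothetical envelope: each message on the shared socket is tagged with its query.
interface Envelope {
  queryId: string;
  row: unknown;
}

type RowHandler = (row: unknown) => void;

class QueryDemultiplexer {
  private handlers: { [queryId: string]: RowHandler[] } = {};

  // Registers a handler for one query; returns a function that removes it again.
  subscribe(queryId: string, handler: RowHandler): () => void {
    const list = this.handlers[queryId] ?? [];
    list.push(handler);
    this.handlers[queryId] = list;
    return () => {
      const current = this.handlers[queryId] ?? [];
      this.handlers[queryId] = current.filter((h) => h !== handler);
    };
  }

  // Call this with each raw text message received on the shared connection.
  dispatch(rawMessage: string): void {
    const envelope = JSON.parse(rawMessage) as Envelope;
    for (const handler of this.handlers[envelope.queryId] ?? []) {
      handler(envelope.row);
    }
  }
}
```

In a browser this would typically be wired to a single `WebSocket` by passing each received message's data to `dispatch`, so any number of dashboard widgets share one connection instead of one chunked HTTP request each.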