RedPillAnalytics / gradle-confluent

Apache License 2.0
22 stars 10 forks source link

JsonSyntaxException for drop and terminate statements. #108

Closed bhanurz closed 3 years ago

bhanurz commented 3 years ago

Describe the bug When drop and terminate statements are being executed, pipeline fails with JsonSyntaxException. This is more frequent when there are large number of queries running.

To Reproduce Steps to reproduce the behavior:

  1. Setup local ksqlDB with memory around 2 GB and 2 CPU cores.
  2. Deploy 100+ queries.
  3. Drop these queries using --drop-only flag.
  4. The plugin has a good chance of failing with JsonSyntaxException in this situation. If not, please increase query count a bit and try again.

Expected behavior The plugin should execute the drop and terminate statement without throwing JsonSyntaxException.

Additional context ksqlDB version: 0.14.0

What I found: When the plugin tries to get command status after executing a statement, it is possible that ksqlDB returns following: Response: \<html>\<body>\<h1>Resource not found\<\/h1>\<\/body>\<\/html> In this case it is not able to parse the response body as a JSON and fails at this point:

This is an invalid response from ksqlDB and we can also raise the issue with them. Raising the issue here as well so the plugin can work for current versions of ksqlDB.

Stack trace:

Terminating query [REDACTED]... unirest response: <kong.unirest.StringResponse@762a3c52 body=[{"@type":"currentStatus","statementText":"TERMINATE [REDACTED];","commandId":"terminate/[REDACTED]/execute","commandStatus":{"status":"EXECUTING","message":"Executing statement","queryId":null},"commandSequenceNumber":1568,"warnings":[]}] headers=content-type: application/json content-length: 277 statusText=OK statusCode=200 parsingerror=Optional.empty config=kong.unirest.Config@12d9671c cookies=null> DROP TABLE IF EXISTS "[REDACTED]"; unirest response: <kong.unirest.StringResponse@3347a01e body={"@type":"statement_error","error_code":40001,"message":"Cannot drop [REDACTED].\nThe following queries read from this source: [].\nThe following queries write into this source: [[REDACTED]].\nYou need to terminate them before dropping [REDACTED].","statementText":"DROP TABLE IF EXISTS \"[REDACTED]\";","entities":[]} headers=content-type: application/json content-length: 364 statusText=Bad Request statusCode=400 parsingerror=Optional.empty config=kong.unirest.Config@12d9671c cookies=null> Watching 82 directories to track changes

Task :pipelineExecute FAILED :pipelineExecute (Thread[Execution worker for ':',5,main]) completed. Took 1 mins 45.801 secs.

FAILURE: Build failed with an exception.

stewartbryson commented 3 years ago

Is it just a timing issue... the response will eventually come back if we WHILE and continuing checking for it?

bhanurz commented 3 years ago

@stewartbryson I had the same thoughts. I am trying to do add a 'while' in getCommandStatus. Will raise a PR if this works.

sebrouss commented 3 years ago

Yes I agree. We are facing the same issue, and the underlying error that we have when querying KSQL is this : [status:500, statusText:Internal Server Error, body:[@type:statement_error, error_code:50000, message:Could not write the statement 'DROP TABLE IF EXISTS CMAS_ROLEPART_MERGED_PART_BY_NOPAR;' into the command topic. Caused by: Timeout while waiting for command topic consumer to process command

Here KSQL is quite busy and fails to process the request fast enough, a controlled retry mechanism would fix it .

bhanurz commented 3 years ago

Looks like in this case, it is not able to get the command ID. I logged the command ID, it shows it was null:

Command id: null.

That's why KSQL returns 'Resource not found' as html because the API call looks like: /status/null

I will try to add a retry mechanism where if the command id is null, the query is retried until we get a valid command id.

stewartbryson commented 3 years ago

Looks like in this case, it is not able to get the command ID. I logged the command ID, it shows it was null:

Command id: null.

That's why KSQL returns 'Resource not found' as html because the API call looks like: /status/null

I will try to add a retry mechanism where if the command id is null, the query is retried until we get a valid command id.

Unfortunately, I won't be able to look at this until Friday, but I will then. Let me know how you get on in the meantime.

bhanurz commented 3 years ago

I was able to add a retry mechanism, this is working. Can be improved by adding maxRetries.

def dropKsql(String ksql, Map properties) {
    def result

    do {
       result = execKsql(ksql, properties)
       log.debug "result: ${result}"

       if (result.status == 400 && result.body.message.contains('Incompatible data source type is STREAM')) {
          log.info "Type is now STREAM. Issuing DROP STREAM..."
          result = execKsql(ksql.replace('TABLE', 'STREAM'), properties)
       }

       if (result.status == 400 && result.body.message.contains('Incompatible data source type is TABLE')) {
          log.info "Type is now TABLE. Issuing DROP TABLE..."
          result = execKsql(ksql.replace('STREAM', 'TABLE'), properties)
       }

       if (result.body.commandId == null) {
          log.info "Command id is null. Pausing for $COMMAND_ID_WAIT_SECONDS seconds before retrying."
          sleep(COMMAND_ID_WAIT_SECONDS * 1000)
       }
    } while (result.body.commandId == null)

    while (['QUEUED', 'PARSING', 'EXECUTING'].contains(getCommandStatus(result.body.commandId))) {
       log.info "Command ${result.body.commandId} still pending..."
    }
    log.debug "final result: ${result}"
    return result
 }
sebrouss commented 3 years ago

Looks good ! I would add this maxRetries to avoid looping indefinetly on that

sebrouss commented 3 years ago

Any update on the solution provided by @bhanurz ? Do you have any plan to create a PR to provide this capability in a 1.3.1 version ?

bhanurz commented 3 years ago

@sebrouss We have added the changes to our system. Will use this for a week and check if there are no issues and raise a PR after that.

stewartbryson commented 3 years ago

Great. Go ahead and raise he PR if you don’t mind so I can have a look at it. Thanks.

bhanurz commented 3 years ago

@stewartbryson Created the PR, please take a look when possible: https://github.com/RedPillAnalytics/gradle-confluent/pull/113

stewartbryson commented 3 years ago

@bhanurz Your changes are in release 1.3.3. Please test and, if successful, close this ticket. Thank you SO much for your contribution.

stewartbryson commented 3 years ago

@bhanurz Let me know when you have tested the latest release, please. Thanks.

stewartbryson commented 3 years ago

@sebrouss We have added the changes to our system. Will use this for a week and check if there are no issues and raise a PR after that.

@sebrouss Can you test the latest release as well?

bhanurz commented 3 years ago

@stewartbryson Tested release 1.3.3. Working as expected, closing the issue. @sebrouss Please reopen if you are still facing the issue.

sebrouss commented 3 years ago

hey Guyz,

I tested it, and it works fine ! Cheers